Knowing the detailed metrics behind each document you submit for translation is an important step in the translation process. A document's word count affects project pricing and timelines, and it helps supply chain teams determine which linguistic resources a project requires. Accurate word counts are therefore essential to the translation process.
If word counts are so important, how do translation vendors figure out the metrics of the documents they receive? And how can you, as a requestor, be sure that the word counts are accurate and verifiable? These questions are exactly why YBD has implemented the Global Information Management Metrics eXchange - Volume (GMX-V) standard as part of its software platform. Today we will break down what GMX-V is, how it reports the metrics of documents submitted for translation, and which metrics can be collected from the elements within the standard.
What is GMX-V?
GMX-V stands for Global Information Management Metrics eXchange - Volume. In basic terms, it is a localization industry standard that defines how to count the words and characters in content for translation and how to record those counts in XML in a non-proprietary, verifiable way. Beyond a single total, it gives a detailed breakdown of word count categories, whether they represent text, numbers, formatting tags or punctuation. This makes for a very detailed representation of the content in a localization project. It is this representation (think of it as a summary of the metrics that make up the document) that holds the word count information linguists and vendors use to understand pricing, project timelines and resource needs. Since we are discussing verifiable accuracy, the non-proprietary nature of the GMX-V metrics is what interests us most.
Agreeing upon word counts of documents for translation
As anyone involved in translation knows, establishing and agreeing upon word counts can be a tricky, if not contentious, process. Each of the major industry tools has its own take on these metrics, and their results rarely match one-to-one. Even tools from the same provider count words differently across versions of their core product! This can lead to disputes over the makeup of the content, which become particularly problematic once project costs are factored in. Project metrics are the cornerstone of accurate financial estimation, and GMX-V aims to support that accuracy by offering a transparent, shared approach to counting alongside the other industry standards.
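To see how easily two counting rules can disagree, here is a minimal Python sketch. Both rules and the sample sentence are hypothetical and are not taken from any real tool or from GMX-V: rule A treats every whitespace-separated token as a word, while rule B counts only runs of ASCII letters, so the hyphen in "e-mail" splits it into two words.

```python
import re

# Hypothetical sample segment, used only for illustration.
SEGMENT = "Back up the e-mail settings."


def count_whitespace_tokens(text: str) -> int:
    """Rule A: every run of non-whitespace characters is one word."""
    return len(text.split())


def count_alphabetic_words(text: str) -> int:
    """Rule B: only runs of ASCII letters count as words.

    Digits are ignored, and hyphens or apostrophes split a word in two.
    """
    return len(re.findall(r"[A-Za-z]+", text))


if __name__ == "__main__":
    print("Rule A:", count_whitespace_tokens(SEGMENT))
    print("Rule B:", count_alphabetic_words(SEGMENT))
```

For this one sentence, rule A counts 5 words and rule B counts 6. Across a large project, small rule differences like this add up, which is why a shared, published counting rule matters.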
What is the basic structure of GMX-V?
GMX-V is based on XML. XML, or eXtensible Markup Language, is a common format for storing structured information in a file that is easy for both humans and computers to read. Many of the file formats submitted for translation are built on XML. Take XHTML, for example, the XML-based form of HTML used to write web pages (we wish plain HTML followed the XML rules a little more closely ourselves). GMX-V is likewise an XML vocabulary and, as such, has a clearly defined specification and purpose. Here is the purpose of GMX-V as defined in the specification:
- To provide an unambiguous specification for counting words and characters for translation related tasks.
- To provide a rich set of qualifiers to help accurately define the actual translation workload for translation related tasks.
- To provide an XML notation for exchanging Global Information Management metrics for any Global Information Management task whether it entails translation activity or not.
Ultimately, what makes GMX-V so necessary is that, as long as a proper GMX-V analysis is generated during the localization process, anyone familiar with the GMX-V specification should be able to understand the metrics counted in their project, whichever system produced them. Without such a universal standard, you would have to learn each localization tool's metrics format separately.
What does GMX-V actually look like?
Once you see the metrics in place, a GMX-V document is actually pretty simple. Here is an example for a single file, or resource. If it covered an entire project, the resource element would sit inside a project element, which allows a project to contain more than one file or resource. To keep things simple, we have used the results from a single file.
An example of GMX-V for a single file/resource.
There are a couple of things to note in this example. Past the first element is the "stage" element, which allows word counts to be kept for more than one phase of the localization process. For example, the GMX-V specification defines an initial translation phase and a final translation phase, and it also allows a user-defined value. This can be very useful if the word count requirements for translation and translation review differ, or if the content changed between project steps.
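The nesting described above can be sketched with Python's standard xml.etree.ElementTree module. This is an illustration only, not a schema-valid GMX-V file: the project, resource, stage and count-group element names come from this article, while the count element, the name, phase, type and value attributes, the phase labels and the TotalWordCount type are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET


def build_metrics(stage_counts: dict[str, dict[str, int]]) -> ET.Element:
    """Build a GMX-V-style tree for one resource, with one <stage> per phase.

    stage_counts maps a (hypothetical) phase label to {count type: value}.
    """
    # Element nesting follows the article; attribute names are assumptions.
    project = ET.Element("project", name="demo-project")
    resource = ET.SubElement(project, "resource", name="manual.xlf")
    for phase, counts in stage_counts.items():
        stage = ET.SubElement(resource, "stage", phase=phase)
        group = ET.SubElement(stage, "count-group", name="verifiable")
        for count_type, value in counts.items():
            ET.SubElement(group, "count", type=count_type, value=str(value))
    return project


if __name__ == "__main__":
    tree = build_metrics({
        "translation": {"TotalWordCount": 120},
        "review": {"TotalWordCount": 126},
    })
    ET.indent(tree)  # pretty-printing; requires Python 3.9+
    print(ET.tostring(tree, encoding="unicode"))
```

Keeping a separate stage per phase means a change in content between translation and review shows up as two distinct, comparable counts rather than one overwritten number.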
Moving into the second count-group, we can see it has a name: "verifiable". The GMX-V spec defines two important count-groups: verifiable and non-verifiable. Verifiable metrics are predefined metrics that can be pulled from an XML Localization Interchange File Format (XLIFF) document. Non-verifiable metrics, on the other hand, require manual counting to ensure accuracy. Take words in an image, for example. Since a computer cannot easily count the words in an image, a person needs to verify them, so they belong in the non-verifiable group.
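Because verifiable and non-verifiable counts sit in separate count-groups, a consuming tool can total them independently. The fragment below is hand-written: the stage and count-group elements come from this article, but the count element, its attributes and the count type names are assumptions. The sketch also assumes the categories within each group do not overlap, so summing them is meaningful.

```python
import xml.etree.ElementTree as ET

# Hand-written sample. <stage> and <count-group> come from the article;
# the <count> element, its attributes and the type names are assumptions.
SAMPLE = """<stage phase="translation">
  <count-group name="verifiable">
    <count type="TextWordCount" value="240"/>
    <count type="NumericWordCount" value="12"/>
  </count-group>
  <count-group name="non-verifiable">
    <count type="ImageWordCount" value="35"/>
  </count-group>
</stage>"""


def totals_by_group(xml_text: str) -> dict[str, int]:
    """Sum the <count> values inside each named <count-group>.

    Assumes the count types within a group do not overlap.
    """
    stage = ET.fromstring(xml_text)
    totals: dict[str, int] = {}
    for group in stage.iter("count-group"):
        name = group.get("name", "unnamed")
        totals[name] = totals.get(name, 0) + sum(
            int(count.get("value", "0")) for count in group.iter("count")
        )
    return totals


if __name__ == "__main__":
    print(totals_by_group(SAMPLE))
```

For the sample, this gives 252 verifiable words and 35 non-verifiable words, so a reviewer can see at a glance which part of the total was counted automatically and which part depends on a person's count.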
Inside the count-group is the most interesting section of the XML, called the "count type". It is the bread and butter of GMX-V, holding the values for all the different count types (word counts, character counts, etc.). The example above contains three types of counts. These are the metrics GMX-V reports, pulled from the XLIFF data of the documents for translation. Because they are obtained using a standard, verifiable method (GMX-V), anyone can recount them from the same XLIFF data and confirm that they are accurate.
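As a rough illustration of pulling counts from XLIFF data, this sketch reads the source text of each trans-unit in a small XLIFF 1.2 document and computes three simple count types. The sample document is invented, and the counting rules (whitespace-separated words, non-whitespace characters, ASCII punctuation) are simplifying assumptions, not the rules defined in the GMX-V specification.

```python
import string
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace, mapped to a prefix for ElementTree path queries.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Minimal, invented XLIFF 1.2 document with two translation units.
XLIFF = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file source-language="en" target-language="fr" datatype="plaintext" original="demo.txt">
    <body>
      <trans-unit id="1"><source>Open the file.</source></trans-unit>
      <trans-unit id="2"><source>Save changes, then exit!</source></trans-unit>
    </body>
  </file>
</xliff>"""


def count_types(xliff_text: str) -> dict[str, int]:
    """Compute three simplified count types over all <source> segments."""
    root = ET.fromstring(xliff_text)
    words = chars = punct = 0
    for source in root.iterfind(".//x:trans-unit/x:source", XLIFF_NS):
        text = "".join(source.itertext())
        words += len(text.split())                               # whitespace tokens
        chars += sum(1 for ch in text if not ch.isspace())       # non-space characters
        punct += sum(1 for ch in text if ch in string.punctuation)  # ASCII punctuation
    return {"words": words, "characters": chars, "punctuation": punct}


if __name__ == "__main__":
    print(count_types(XLIFF))
```

For the sample document this yields 7 words, 33 characters and 3 punctuation characters. Because the input is a standard XLIFF file, anyone with the same file and the same published rules can rerun the count and check the numbers.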
A call for a universal standard in localization metrics
Since YBD uses the GMX-V standard, you can be sure that the word counts generated for your submitted documents are accurate, whatever the source and target languages. How does the GMX-V standard work in a real-world scenario? In part two of this blog, we will break down a real-life example of a segment and its word count, derived using GMX-V. In that post you will also learn how this process works for logographic languages such as Chinese and Japanese. We hope this post gives you peace of mind that the word counts and other metrics from your translation projects are accurate, verifiable and backed by a universal standard in the localization industry.