Bureau Works TMill

Abstract:

Bureau Works TMill is a state-of-the-art semantic Translation Memory (TM) clean-up mechanism that can make your legacy Translation Memories as good as new.

Scenario:

Translation memories degrade and depreciate over time. Large translation memories tend to have a wide array of problems including but not limited to:

[Image: a visual metaphor for a "sick" translation memory, shown as a digital memory bank with a sad face on its screen]

Major Severity

- discrepancy between the declared language/locale and the actual content of a Translation Unit Variant (TUV), e.g. the TUV is labeled PT-BR but the translated text is in Korean

- incorrect translations in TUVs that deviate significantly from source meaning

Minor Severity

- tag mismatches between source and target Translation Unit Variants (TUVs)

- terminology mismatches between TUV and glossary

- spelling mistakes

- grammar mistakes

- out-of-date tone, e.g. formal vs. informal

- cultural inappropriateness (could be major depending on the case)
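Of the issues listed above, tag mismatches are among the few that lend themselves to a simple mechanical check. The sketch below is only an illustration, not TMill's actual implementation: it compares the inline elements found in a source and a target `<seg>`. The function names and sample segments are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def inline_tag_counts(seg: ET.Element) -> Counter:
    """Count inline elements (e.g. bpt, ept, ph) nested anywhere inside a <seg>."""
    return Counter(child.tag for child in seg.iter() if child is not seg)

def has_tag_mismatch(source_seg_xml: str, target_seg_xml: str) -> bool:
    """Return True when source and target carry different sets of inline tags."""
    source = ET.fromstring(source_seg_xml)
    target = ET.fromstring(target_seg_xml)
    return inline_tag_counts(source) != inline_tag_counts(target)

# Hypothetical sample segments: the second target has lost its closing <ept>.
src = '<seg>Click <bpt i="1">&lt;b&gt;</bpt>Save<ept i="1">&lt;/b&gt;</ept>.</seg>'
tgt_ok = '<seg>Clique em <bpt i="1">&lt;b&gt;</bpt>Salvar<ept i="1">&lt;/b&gt;</ept>.</seg>'
tgt_bad = '<seg>Clique em <bpt i="1">&lt;b&gt;</bpt>Salvar.</seg>'

print(has_tag_mismatch(src, tgt_ok))
print(has_tag_mismatch(src, tgt_bad))
```

A count-based comparison like this ignores tag order and tag content, so it only flags the most obvious mismatches.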

The Challenge:

Most of these issues cannot be caught by purely syntactic verification; they require semantic analysis, which used to be prohibitively expensive. As a result, as Translation Memories degraded with age and scale, the only options were to apply TM-wide penalties, which had a negative economic impact through lost leverage, or to live with the endless propagation of errors, which hurt translation quality.

The Solution: Bureau Works TMill

TMill is a semantic Translation Memory clean-up tool that leverages Bureau Works' ML tech stack. Using a variety of models, including GPT-3.5 and GPT-4, combined with our NLP tools and methodology, it can clean up TMs of any size, locale, and state of health.

Requirements: TMX files

Minimum processing unit per request: 1,000,000 TUVs

[Image: a futuristic robot scanning multiple floating digital pages]

The TMill Methodology

Stage 1: Preparation of Assets

Once we have received client assets in TMX format, we perform file integrity checks to ensure the content is ingestion-ready. This means that simple checks reveal no major discrepancies between locales, no significant number of empty TUVs, no file structure issues, and no other red flags.
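As a rough illustration of what such simple checks could look like (a sketch under our own assumptions, not the actual TMill preflight), the snippet below parses a TMX document and counts TUVs and empty TUVs per locale. The `preflight` function and the sample TMX are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree exposes the xml:lang attribute under the XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def preflight(tmx_text: str) -> dict:
    """Count TUVs and empty TUVs per locale; a parse error signals a structure issue."""
    root = ET.fromstring(tmx_text)  # raises ET.ParseError on malformed XML
    per_locale, empty = Counter(), Counter()
    for tuv in root.iter("tuv"):
        # Older TMX versions used a plain "lang" attribute instead of xml:lang.
        locale = tuv.get(XML_LANG) or tuv.get("lang") or "unknown"
        per_locale[locale] += 1
        seg = tuv.find("seg")
        text = "".join(seg.itertext()).strip() if seg is not None else ""
        if not text:
            empty[locale] += 1
    return {"tuvs": dict(per_locale), "empty": dict(empty)}

# Hypothetical two-unit sample; the second pt-BR segment is blank.
SAMPLE = """<tmx version="1.4"><header/><body>
<tu><tuv xml:lang="en-US"><seg>Save</seg></tuv><tuv xml:lang="pt-BR"><seg>Salvar</seg></tuv></tu>
<tu><tuv xml:lang="en-US"><seg>Cancel</seg></tuv><tuv xml:lang="pt-BR"><seg>  </seg></tuv></tu>
</body></tmx>"""

report = preflight(SAMPLE)
print(report)
```

Comparing the per-locale totals against each other gives a quick signal of locale discrepancies, and the empty counts show whether blank TUVs are a significant share of the file.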

Stage 2: Initial Analysis

Ingestion-ready assets are processed through our TMill engine. Our engine will read and check all TUVs for the error types identified above and produce a detailed report. This report will outline per locale:

- the total number of TUVs analyzed

- the number of TUVs in each error category

Note: If a TUV falls into multiple error categories, we count it under the category of highest severity.
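The counting rule in the note can be sketched as follows. This is only an illustration: the category names and the ordering within the major and minor groups are our assumptions, and cultural issues may rank higher depending on the case.

```python
from collections import Counter, defaultdict

# Hypothetical category names, ordered from most to least severe.
# The order inside each severity group is an assumption made for this example.
SEVERITY_ORDER = [
    "locale_mismatch", "incorrect_translation",                         # major
    "tag_mismatch", "terminology", "spelling", "grammar", "tone", "cultural",  # minor
]
RANK = {name: i for i, name in enumerate(SEVERITY_ORDER)}

def build_report(findings):
    """findings: iterable of (locale, set_of_categories) pairs, one per TUV."""
    report = defaultdict(Counter)
    for locale, categories in findings:
        report[locale]["total"] += 1
        if categories:
            # Count the TUV once, under its most severe category.
            worst = min(categories, key=RANK.__getitem__)
            report[locale][worst] += 1
    return {locale: dict(counts) for locale, counts in report.items()}

# Hypothetical findings for four TUVs.
findings = [
    ("pt-BR", {"spelling", "locale_mismatch"}),
    ("pt-BR", set()),
    ("pt-BR", {"grammar"}),
    ("ko-KR", {"tag_mismatch", "tone"}),
]
report = build_report(findings)
print(report)
```

Because each TUV is counted once, the category counts for a locale never add up to more than its total.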

Stage 3: Joint Strategy Definition

Based on the initial analysis, Bureau Works will make clean-up recommendations and take client input into account to arrive at a final desired scenario. For instance, certain error types may be considered too toxic and marked for removal, while others may be considered fixable. It's a matter of making use-case-sensitive decisions.

Stage 4: Fixing

At this stage, Bureau Works TMill re-ingests all of the segments, stratified by error type, and makes a best attempt at fixing the identified issues. For some issue types, such as proofreading, the best attempt is nearly flawless; others, such as tag fixing, depend more on file structure and locale.

Stage 5: Packaging and Deliverables

TMs will be reassembled according to the knowledge management hierarchy defined by the joint strategy. TMs can be regrouped into a single TM or divided by error kind, author, time frame, and other relevant metadata that can further guide the overall TM management strategy.
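To make the regrouping idea concrete, here is a minimal sketch that splits translation units into buckets by an arbitrary key. It assumes each unit carries TMX-style `creationid` and `creationdate` metadata plus a hypothetical `error` label assigned during analysis; the `regroup` helper and the sample data are illustrative only.

```python
from collections import defaultdict

def regroup(units, key):
    """Split units into buckets according to the value returned by key(unit)."""
    buckets = defaultdict(list)
    for unit in units:
        buckets[key(unit)].append(unit)
    return dict(buckets)

# Hypothetical units; creationdate follows the TMX YYYYMMDDThhmmssZ format.
units = [
    {"id": 1, "creationid": "alice", "creationdate": "20190312T101500Z", "error": "spelling"},
    {"id": 2, "creationid": "bob", "creationdate": "20210704T083000Z", "error": "tag_mismatch"},
    {"id": 3, "creationid": "alice", "creationdate": "20191120T170000Z", "error": "tag_mismatch"},
]

by_error = regroup(units, key=lambda u: u["error"])
by_author = regroup(units, key=lambda u: u["creationid"])
by_year = regroup(units, key=lambda u: u["creationdate"][:4])

print(sorted(by_error), sorted(by_author), sorted(by_year))
```

Each bucket could then be written out as its own TMX file, or all buckets merged back into one, depending on the agreed strategy.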
