Bureau Works TMill smartly cleans and updates your old Translation Memories, using semantic intelligence to improve consistency, accuracy, and usability across your projects.
The Scenario
Translation Memories (TMs) degrade and depreciate over time. Large TMs often suffer from a wide range of issues, including but not limited to:
Major Severity Issues
Mismatch between declared locale and actual language in the Translation Unit Variant (TUV).
Example: a TUV labeled as PT-BR contains Korean text.
Incorrect translations in TUVs that significantly deviate from the source meaning.
Minor Severity Issues
Tag mismatches between source and target TUVs.
Terminology mismatches between TUVs and the associated glossary.
Spelling errors.
Grammar mistakes.
Outdated tone (e.g., formal vs. informal).
Culturally inappropriate language (severity may vary depending on context).
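Tag mismatches are one of the few issues in this list that can be caught with a purely structural check. The sketch below is an illustration, not TMill's implementation: it assumes each segment is available as a TMX <seg> XML string and compares only the counts of inline elements (bpt, ept, ph, it, hi, sub, ut) between source and target. The helper names inline_tags and tag_mismatch are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Inline element names defined by TMX 1.4b.
INLINE_TAGS = {"bpt", "ept", "ph", "it", "hi", "sub", "ut"}


def inline_tags(seg_xml):
    """Count inline elements inside a single <seg> given as an XML string."""
    seg = ET.fromstring(seg_xml)
    return Counter(el.tag for el in seg.iter() if el.tag in INLINE_TAGS)


def tag_mismatch(source_seg, target_seg):
    """Return True when source and target carry different inline tag counts."""
    return inline_tags(source_seg) != inline_tags(target_seg)
```

A count comparison like this ignores tag order and pairing, so it flags only the most obvious mismatches; stricter checks would also compare the i and x attributes.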
The Challenge
Most of these problems can’t be detected through syntactic validation alone. They require semantic analysis, which has traditionally been too costly to apply at scale. As a result, degraded TMs have typically been handled in one of two ways: applying TM-wide penalties, which reduces leverage and causes economic loss, or tolerating the continuous propagation of errors, which hurts translation quality.
The Solution: Bureau Works TMill
TMill is a semantic Translation Memory clean-up tool powered by the Bureau Works ML Tech Stack. Using a combination of models, including GPT-3.5 and GPT-4, alongside proprietary NLP tools and methodologies, TMill can clean TMs of any size, locale, or condition.
Requirements: TMX files
Minimum processing unit per request: 1,000,000 TUVs
The TMill Methodology
Stage 1: Asset Preparation
Upon receiving TMX files from the client, we perform integrity checks to ensure they are ready for ingestion. We verify:
Locale consistency
No excessive number of empty TUVs
Sound file structure
Any critical issues are flagged before proceeding.
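As a rough illustration of what such pre-ingestion checks could look like, the sketch below parses a TMX document with Python's standard library and reports the three kinds of issue listed above. It is not the TMill implementation: the function name check_tmx, the expected_locales argument, and the 5% max_empty_ratio threshold are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# ElementTree expands the xml:lang attribute to this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def check_tmx(tmx_text, expected_locales, max_empty_ratio=0.05):
    """Return a list of human-readable issues found in a TMX document."""
    # File structure: the document must parse and look like TMX.
    try:
        root = ET.fromstring(tmx_text)
    except ET.ParseError as exc:
        return [f"malformed XML: {exc}"]
    if root.tag != "tmx" or root.find("body") is None:
        return ["not a TMX document: missing <tmx> root or <body>"]

    locales = Counter()
    total = empty = 0
    for tuv in root.iter("tuv"):
        total += 1
        # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
        lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        locales[lang.lower()] += 1
        seg = tuv.find("seg")
        text = "".join(seg.itertext()).strip() if seg is not None else ""
        if not text:
            empty += 1

    issues = []
    # Locale consistency: every declared locale should be one we expect.
    expected = {loc.lower() for loc in expected_locales}
    unexpected = sorted(set(locales) - expected)
    if unexpected:
        issues.append(f"unexpected locales: {unexpected}")
    # Empty TUVs: flag the file when the share exceeds the assumed threshold.
    if total and empty / total > max_empty_ratio:
        issues.append(f"too many empty TUVs: {empty} of {total}")
    return issues
```

An empty list means the file passed these basic checks. Note that this only verifies the declared locale codes; confirming that the text itself is in the declared language would need language identification, which is outside this sketch.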
Stage 2: Initial Analysis
Once validated, assets are processed by our TMill engine. The engine scans all TUVs for the error types listed above and produces a detailed report, including:
Total number of TUVs analyzed
Number of TUVs per error category
Note: If a TUV falls into multiple error categories, it will be grouped under the highest-severity error.
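The grouping rule in the note can be sketched as follows. This is a minimal illustration under stated assumptions, not the engine's actual logic: the category names, the SEVERITY_RANK table, and the build_report function are hypothetical, and the ordering within the major and minor tiers is an arbitrary choice made for the example.

```python
from collections import Counter

# Hypothetical category names and ranks; a lower rank means higher severity.
# The first two are the major issues, the rest are minor.
SEVERITY_RANK = {
    "locale_mismatch": 0,
    "mistranslation": 1,
    "tag_mismatch": 2,
    "terminology": 3,
    "spelling": 4,
    "grammar": 5,
    "tone": 6,
    "cultural": 7,
}


def build_report(findings):
    """Summarize findings, a dict mapping TUV id -> detected category names.

    Each TUV is counted once, under its highest-severity category.
    """
    per_category = Counter()
    for categories in findings.values():
        cats = list(categories)
        if not cats:
            continue  # clean TUV: counted in the total only
        top = min(cats, key=SEVERITY_RANK.__getitem__)
        per_category[top] += 1
    return {"total_tuvs": len(findings), "per_category": dict(per_category)}
```

Because each TUV contributes to exactly one category, the per-category counts never sum to more than the total number of TUVs analyzed.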
Stage 3: Joint Strategy Definition
Based on the analysis, Bureau Works will provide clean-up recommendations and collaborate with the client to define the final strategy.
Some errors may be flagged as non-recoverable and removed, while others can be corrected. The final plan is shaped by specific use-case requirements.
Stage 4: Clean-Up Execution
The engine reprocesses all segments, categorized by error type, and applies best-attempt fixes.
Some fixes (e.g., spelling or grammar) are highly accurate, while others (e.g., tag corrections) may depend on file structure and locale specifics.
Stage 5: Packaging & Deliverables
The cleaned TMs are reassembled according to the knowledge management strategy defined in the joint planning phase.
TMs can be:
Combined into a master file.
Segmented by error type, author, time frame, or other relevant metadata.
These outputs serve as a foundation for a more robust, strategic TM management approach.
Want to learn how TMill can help clean up and optimize your Translation Memories? Schedule a conversation with our team and discover how semantic analysis can unlock greater value from your legacy assets.