All about Segmentation Rules

This article will help you learn about Bureau Works' default segmentation rules and how to change them.



1. Default segmentation rules

Bureau Works splits your content into segments for several reasons, including translator productivity, content analysis, and translation memory (TM) matching.

When you upload a document to our platform, it undergoes an automatic segmentation process, which considers various parameters. In most cases, the segmentation will follow a natural structure that makes sense from the reader's perspective. However, depending on the file extension and other parameters, Bureau Works may apply different rules for segmentation.

Below are some of the rules we will use to segment the text:

  • Paragraph separators, new line and tab characters

  • Punctuation characters used in various languages, such as periods, question marks, and exclamation points

  • Bureau Works will also consider the double-byte period "。"

Below are some of the rules that prevent a segment break:

  • Commonly used abbreviations such as Mr., Mrs., Prof., e.g., and Vol.

  • Month abbreviations in dates, such as Nov. 12, 2023 and Sept. 5, 2023

  • Numbers with dots between the digits, such as 2.55, 100.1 and 0.78
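
To make the interaction between the two lists concrete, here is a minimal Python sketch of rule-based segmentation. It is not the Bureau Works default rule file (which is not visible from the UI), and the `RULES` table and `segment` function are hypothetical names used only for illustration. It assumes the common SRX-style convention that rules are checked in order at every position and the first matching rule decides, so the "no break" exceptions are listed before the generic break rules.

```python
import re
from typing import List, Tuple

# (break?, pattern before the position, pattern after the position).
# Illustrative only: these are NOT the actual Bureau Works default rules.
RULES: List[Tuple[bool, str, str]] = [
    (False, r"\b(?:Mr|Mrs|Prof|Vol|e\.g)\.", r"\s"),  # common abbreviations
    (False, r"\b(?:Nov|Sept)\.", r"\s"),              # month abbreviations
    (False, r"\d\.", r"\d"),                          # numbers such as 2.55
    (True, r"[.?!]+", r"\s"),                         # sentence-ending punctuation
    (True, r"。", r""),                               # double-byte period
    (True, r"[\n\t]", r""),                           # new line and tab characters
]


def segment(text: str) -> List[str]:
    """Split text into segments; the first rule that matches a position wins."""
    segments: List[str] = []
    start = 0
    for pos in range(1, len(text)):
        for is_break, before, after in RULES:
            # A rule applies when `before` matches the text ending at pos
            # and `after` matches the text starting at pos.
            before_ok = re.search("(?:" + before + r")\Z", text[:pos])
            after_ok = re.match(after, text[pos:])
            if before_ok and after_ok:
                if is_break:
                    segments.append(text[start:pos].strip())
                    start = pos
                break  # stop at the first matching rule
    segments.append(text[start:].strip())
    return [s for s in segments if s]


if __name__ == "__main__":
    sample = "Prof. Lee paid 2.55 on Nov. 12, 2023. Was it enough? Yes!"
    for s in segment(sample):
        print(s)
```

In this sketch the sample sentence is split after "2023." and after "enough?", while "Prof.", "2.55" and "Nov." stay inside their segment because an exception rule matches first.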


2. How to change your segmentation rules

As mentioned earlier, Bureau Works covers the most common scenarios for text segmentation. However, if your use case requires specific handling for text segmentation, you can provide instructions to the platform to tailor it to your needs. For instance, if you have a product name that ends with a dot or use an abbreviation not covered by default, you can add a new rule to our default configuration, and Bureau Works will adapt accordingly.

Our default rule file is not visible from the UI. Feel free to contact our support team for guidance and testing. They will find the best way to configure the segmentation file for your needs.
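
As an illustration of the product-name case, the following Python sketch generates a small SRX 2.0 file with one "no break" rule placed ahead of a generic sentence-break rule. The product name "Acme Corp.", the rule name "Custom", and the `build_srx` helper are hypothetical. The sketch also assumes a standalone file, and that the pattern produced by Python's `re.escape` is acceptable to the regex engine that reads the file. How a custom file combines with the default configuration is something to confirm with the support team.

```python
import re
import xml.etree.ElementTree as ET

SRX_NS = "http://www.lisa.org/srx20"  # SRX 2.0 namespace


def _add_rule(parent, break_value, before, after):
    """Append one <rule> with its before/after break patterns."""
    rule = ET.SubElement(parent, "{%s}rule" % SRX_NS, {"break": break_value})
    ET.SubElement(rule, "{%s}beforebreak" % SRX_NS).text = before
    ET.SubElement(rule, "{%s}afterbreak" % SRX_NS).text = after


def build_srx(no_break_terms, rule_name="Custom"):
    """Return a minimal SRX 2.0 document as a string (hypothetical helper)."""
    ET.register_namespace("", SRX_NS)
    srx = ET.Element("{%s}srx" % SRX_NS, {"version": "2.0"})
    ET.SubElement(srx, "{%s}header" % SRX_NS,
                  {"segmentsubflows": "yes", "cascade": "no"})
    body = ET.SubElement(srx, "{%s}body" % SRX_NS)
    language_rules = ET.SubElement(body, "{%s}languagerules" % SRX_NS)
    language_rule = ET.SubElement(language_rules, "{%s}languagerule" % SRX_NS,
                                  {"languagerulename": rule_name})
    # Exceptions first: never break right after a protected term.
    for term in no_break_terms:
        _add_rule(language_rule, "no", re.escape(term), r"\s")
    # Generic rule: break after sentence-ending punctuation plus whitespace.
    _add_rule(language_rule, "yes", r"[.?!]+", r"\s")
    map_rules = ET.SubElement(body, "{%s}maprules" % SRX_NS)
    ET.SubElement(map_rules, "{%s}languagemap" % SRX_NS,
                  {"languagepattern": ".*", "languagerulename": rule_name})
    xml_body = ET.tostring(srx, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + xml_body


if __name__ == "__main__":
    # "Acme Corp." is a made-up product name that ends with a dot.
    print(build_srx(["Acme Corp."]))
```

Saving the printed output with a .srx extension gives a file that can be tested before it is activated for real projects.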

2.1. Account level

To change the segmentation rules for your entire account, click on "Settings", "My Account", "Segmentation Settings" and then "Add new segmentation file":

All about Segmentation Rules - 1.png

Once you've chosen your .SRX file, you'll need to select the source language you want to apply it to, activate the rules, and then click on "Save".

All about Segmentation Rules - 2.png

2.2. Organizational Unit level

You can configure your segmentation rules at the Organizational Unit level as well. It's important to note that this configuration will override any settings made at the Account level.
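
The override can be pictured as a simple lookup order. The sketch below is only a mental model, not the platform's implementation: the `resolve_srx` function and the file names are hypothetical, and it assumes that files are matched per source language and that the Account-level file (and then the defaults) applies when the Organizational Unit has none.

```python
from typing import Dict, Optional


def resolve_srx(source_lang: str,
                account_rules: Dict[str, str],
                unit_rules: Dict[str, str]) -> Optional[str]:
    """Pick the SRX file for a source language (hypothetical model).

    Returns None when neither level has a file, meaning the
    platform's default rules would apply.
    """
    # Organizational Unit settings take priority over Account settings.
    if source_lang in unit_rules:
        return unit_rules[source_lang]
    # Assumed fallback: Account-level file, if any.
    return account_rules.get(source_lang)


if __name__ == "__main__":
    account = {"en": "account-en.srx"}
    unit = {"en": "unit-en.srx"}
    print(resolve_srx("en", account, unit))
```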

To change the segmentation rules for an Organizational Unit, open the one you want to update, click on the "Segmentation Settings" tab and then click on "Add new segmentation file":

All about Segmentation Rules - 3.png

Once you've chosen your .SRX file, you'll need to select the source language you want to apply it to, activate the rules, and then click on "Save".

All about Segmentation Rules - 2.png