This page explains the classification mechanism.
The classification mechanism can be used to classify documents in the Xillio Insights database. By default Xillio Insights already uses the mechanism to classify documents for their category and sizeDistribution.
The classification configuration is stored in the classification.json file. The configuration can be extended for additional classifications.
Some examples of how the classification mechanism can be used:
- Department classification
- Special folder classification
- Privacy classification
How it works
The classification mechanism uses the idea of reverse querying. Instead of writing queries to retrieve different kinds of documents, you write and save the queries into the database and then send the documents to the database to find out which predefined queries match.
In Elasticsearch terminology this is called percolation.
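The reverse-querying idea can be illustrated with a minimal Python sketch. This is not how Xillio Insights or Elasticsearch implement it; the stored queries, the field names (name, size) and the percolate function are hypothetical and exist only to show the direction of the lookup: the queries are stored first, and each document is then tested against all of them.

```python
from typing import Callable, Dict, List

Document = Dict[str, object]
Query = Callable[[Document], bool]

# Hypothetical stored queries, keyed by the classification result they assign.
# In a real setup these would be saved in the database, not in a dict.
stored_queries: Dict[str, Query] = {
    "small": lambda doc: int(doc.get("size", 0)) < 1_000_000,
    "pdf": lambda doc: str(doc.get("name", "")).lower().endswith(".pdf"),
}


def percolate(doc: Document, queries: Dict[str, Query]) -> List[str]:
    """Return the names of all stored queries that match the document."""
    return sorted(name for name, matches in queries.items() if matches(doc))


if __name__ == "__main__":
    # The document is "sent" to the stored queries, not the other way around.
    print(percolate({"name": "Report.PDF", "size": 2048}, stored_queries))
```

Each matching query name corresponds to a classification result that would be attached to the document.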
Configuration
Manual configuration
You can add additional classifications or change the existing ones by editing the classification.json file.
After changing the configuration you have to re-run the init and analysis. After running the analysis you have to refresh the insights_default* index pattern.
Hierarchy classification generator
One common analysis scenario is grouping certain hierarchies under an organisational unit, for example a department. You can use the classification mechanism for this, and the hierarchy classification generator makes it even easier.
How to use
To use the hierarchy classification generator, populate the hierarchyMapping.xlsx file, which can be found in the configuration folder.
By default there is only one sheet, called classificationType. Change its name to the type of classification you are going to do; in this example, department. You can find more information on classification types here.
You can create more sheets if you need to do multiple classifications.
The hierarchy column needs to contain the hierarchy value found in Insights. Classification is automatically done for all the children as well. The resultValue column needs to contain the result of the classification; in this example, the department name.
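The effect of a mapping that also covers all children can be sketched in Python as follows. This is only an illustration of the intended outcome, not the generator's implementation. It assumes that hierarchy values are path-like strings separated by "/" and that the most specific (longest) mapped hierarchy wins when rows overlap; both are assumptions, and the example rows and the classify function are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical rows from a "department" sheet: hierarchy -> resultValue.
department_rows: Dict[str, str] = {
    "/shares/finance": "Finance",
    "/shares/hr": "Human Resources",
}


def classify(hierarchy: str, rows: Dict[str, str]) -> Optional[str]:
    """Return the resultValue of the longest mapped hierarchy that is
    equal to, or an ancestor of, the given hierarchy (assumed behaviour)."""
    best_prefix: Optional[str] = None
    best_result: Optional[str] = None
    for prefix, result in rows.items():
        prefix = prefix.rstrip("/")
        # Match the mapped hierarchy itself and anything below it,
        # but not siblings that merely share a name prefix.
        if hierarchy == prefix or hierarchy.startswith(prefix + "/"):
            if best_prefix is None or len(prefix) > len(best_prefix):
                best_prefix, best_result = prefix, result
    return best_result


if __name__ == "__main__":
    print(classify("/shares/finance/2023/report.pdf", department_rows))
```

A document below /shares/finance receives the Finance result without needing its own row, which is why only the top-level hierarchies have to be listed in the sheet.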
Running the generator
When you are done populating the Excel file, start the generator by running /classificationGenerator/hierarchy.xill.
The hierarchy classification generator adds one or more classification types to the classification.json file. If the classification type already exists, it will be overwritten.
After running the generator you have to re-run the init and analysis. After running the analysis you have to refresh the insights_default* index pattern.