Reputation: 3
I am new to UIMA Ruta and Eclipse. Maybe some of you have experience with creating annotations. Could you tell me what the word "dictionary" means in this context? Thanks in advance!
Upvotes: 0
Views: 101
Reputation: 1054
In UIMA Ruta, a dictionary is either a word list (WORDLIST) or a word table (WORDTABLE).
WORDLIST:
WORDLIST FirstNameList = 'FirstNames.txt';
DECLARE FirstName;
Document{-> MARKFAST(FirstName, FirstNameList, true, 2)};
This rule annotates every first name from the list 'FirstNameList' that occurs in the document. Case is ignored for words longer than 2 characters.
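To make the two extra arguments concrete, here is a rough Python emulation of the behaviour described above. It is only a sketch of the semantics, not how Ruta implements MARKFAST: the function name `mark_fast`, the simple `\w+` tokenization, and the restriction to single-token entries are all my own assumptions.

```python
import re

def mark_fast(text, word_list, ignore_case=False, ignore_length=0):
    """Return (begin, end, token) spans for list entries found in text.

    Hypothetical helper: emulates only the described case handling,
    for single-token entries.
    """
    exact = set(word_list)
    lowered = {w.lower() for w in word_list}
    spans = []
    for m in re.finditer(r"\w+", text):
        token = m.group()
        # Case is ignored only for tokens longer than ignore_length.
        if ignore_case and len(token) > ignore_length:
            hit = token.lower() in lowered
        else:
            hit = token in exact
        if hit:
            spans.append((m.start(), m.end(), token))
    return spans

names = ["Peter", "Al"]
spans = mark_fast("PETER met al and Al.", names, ignore_case=True, ignore_length=2)
print(spans)  # [(0, 5, 'PETER'), (17, 19, 'Al')]
```

"PETER" matches because it is longer than 2 characters, so case is ignored. The lowercase "al" does not match, because words of length 2 or less are still compared case-sensitively.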
WORDTABLE:
WORDTABLE TestTable = 'TestTable.csv';
DECLARE Annotation Struct(STRING first);
Document{-> MARKTABLE(Struct, 1, TestTable, true, 4, ".,-", 2, "first" = 2)};
In this example, the whole document is searched for all occurrences of the entries in the first column of the table 'TestTable'. For each occurrence, an annotation of the type Struct is created and its feature 'first' is filled with the entry from the second column. Case is ignored for words longer than 4 characters. Additionally, the characters '.', ',' and '-' are ignored, but at most two of them.
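The column-to-feature mapping can be pictured with another small Python sketch. Again, this is only an illustration under my own assumptions: `mark_table` is a hypothetical helper, the table is passed as a list of tuples instead of being read from 'TestTable.csv', and the ignored-characters handling is left out.

```python
import re

def mark_table(text, rows, key_col=1, feature_col=2,
               ignore_case=True, ignore_length=4):
    """Return one dict per match, with feature 'first' taken from feature_col.

    Hypothetical helper: columns are 1-based, as in the Ruta rule above.
    """
    exact = {row[key_col - 1]: row[feature_col - 1] for row in rows}
    lowered = {key.lower(): value for key, value in exact.items()}
    structs = []
    for m in re.finditer(r"\w+", text):
        token = m.group()
        # Case is ignored only for tokens longer than ignore_length.
        if ignore_case and len(token) > ignore_length:
            value = lowered.get(token.lower())
        else:
            value = exact.get(token)
        if value is not None:
            structs.append({"begin": m.start(), "end": m.end(), "first": value})
    return structs

rows = [("Peter", "P"), ("Marshall", "M")]
structs = mark_table("peter and MARSHALL", rows)
print(structs)
```

Each match keeps the span of the word from the first column, while the value stored in 'first' comes from the second column of the same row.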
When you need to use multiple word lists, use the TRIE action to improve performance.
Document{-> TRIE("FirstNames.txt" = FirstName, "Companies.txt" = Company, 'Dictionary.mtwl', true, 4, false, 0, ".,-/")};
Here, the dictionary 'Dictionary.mtwl', which contains the word lists for first names and companies, is used to annotate the document. The words previously contained in the file 'FirstNames.txt' are annotated with the type FirstName, and the words from the file 'Companies.txt' with the type Company. Case is ignored for words longer than 4 characters. The edit distance is deactivated. The cost of an edit operation cannot currently be configured by an argument. The last argument defines several characters that will be ignored.
Upvotes: 0
Reputation: 547
In the context of UIMA Ruta, a dictionary is nothing more than a word list: an external resource used to quickly annotate the text items listed in that resource. Here is an example:
WORDLIST FirstNameList = 'FirstNames.txt';
DECLARE FirstName;
Document{-> MARKFAST(FirstName, FirstNameList)};
For more information, please refer to the documentation.
Upvotes: 4
Reputation: 10797
I think that what you're actually referring to is UIMA's Dictionary Annotator. Basically, it annotates words in documents with their dictionary entries. For details, see the User Guide.
Upvotes: 0