Reputation: 81
We need to detect address data in unstructured documents using Apache UIMA. Addresses can come from any geography. Some sample UK addresses are below:
190 Stanley road Llanddoged Conwy LL26 6CM
227,Sankey street,Bourne,Lincolnshire,PE10 1LW
It would be helpful if you could share possible annotations for identifying address data in an unstructured document.
Upvotes: 1
Views: 171
Reputation: 3113
There are two approaches: rule-based extraction (for example with UIMA Ruta) and statistical models trained on annotated examples.
Which approach is best for you depends on your requirements. Many people consider statistical models superior to rule-based approaches in general. However, it is sometimes faster to write a few rules than to annotate enough training examples.
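To give a feel for how little a first rule can take, here is a minimal sketch in plain Java. It is not UIMA or Ruta code: the class name `AddressRuleSketch`, the method `findPostcodes`, and the simplified postcode pattern are all assumptions made for illustration. The pattern only covers the postcode shape seen in the two sample addresses from the question and will miss or over-match other formats. A real annotator would also need rules for street and town parts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of UIMA or UIMA Ruta.
final class AddressRuleSketch {

    // Simplified UK postcode shape (an assumption, not the full official format):
    // 1-2 uppercase letters, a digit, an optional letter or digit,
    // an optional space, then a digit and two uppercase letters.
    private static final Pattern POSTCODE =
            Pattern.compile("\\b[A-Z]{1,2}[0-9][A-Z0-9]? ?[0-9][A-Z]{2}\\b");

    // Returns every substring of the text that matches the postcode rule.
    // In a UIMA annotator, m.start() and m.end() would give the
    // begin and end offsets for an annotation instead.
    static List<String> findPostcodes(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = POSTCODE.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        String doc = "190 Stanley road Llanddoged Conwy LL26 6CM "
                + "227,Sankey street,Bourne,Lincolnshire,PE10 1LW";
        System.out.println(findPostcodes(doc));
    }
}
```

A postcode match like this can serve as an anchor: once it is found, further rules can look at the tokens just before it to decide where the address starts.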
(I am a developer of UIMA Ruta)
Upvotes: 1
Reputation: 16491
I recommend using the UIMA Ruta Workbench to write rules for extracting addresses. It will really speed up and ease your work with UIMA.
Upvotes: 1