Reputation: 491
I'm working on my graduation project, which is Named Entity Recognition for Turkish. The recognizer should catch Turkish words when I work with person names and locations. Locations can sometimes be in other languages (for example, Hilton Hotels in Taksim/Istanbul), so all I need to do is add "Hotel" to my dataset, which is full of location-specific tags such as Hotel, Restaurant, or Mall. The Organization Name tag is harder: I need a good dataset of bands, products, and company names, but I can't figure out how to find or collect one.
In the Stanford NLP tool (http://nlp.stanford.edu:8080/ner/process), when I type Facebook, Nike, Adidas, etc., it recognizes them as organizations. Is there any way to get an organization name dataset like that?
Upvotes: 1
Views: 5323
Reputation: 449
Try collecting them from Wikipedia; it's a massive source. You can write a parser that collects entities of specific types from the wiki dumps. Wikipedia categorizes people, locations, and organizations in a hierarchical structure.
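A minimal sketch of such a parser, assuming the dump is a MediaWiki XML export in which each `<page>` has a `<title>` and wikitext containing `[[Category:...]]` (or `[[Kategori:...]]` on Turkish Wikipedia) links. The function name `titles_in_categories`, the keyword list, and the inline sample are hypothetical, made up for illustration. It only matches a page's direct categories and does not walk the category hierarchy.

```python
import io
import re
import xml.etree.ElementTree as ET

# Matches [[Category:...]] and the Turkish [[Kategori:...]] links in wikitext.
CATEGORY_RE = re.compile(r"\[\[(?:Category|Kategori):([^\]|]+)")

def local(tag):
    """Strip the XML namespace prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]

def titles_in_categories(xml_stream, keywords):
    """Yield page titles whose category names contain any keyword."""
    keywords = [k.lower() for k in keywords]
    for _, elem in ET.iterparse(xml_stream, events=("end",)):
        if local(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text":
                text = child.text or ""
        cats = [c.strip().lower() for c in CATEGORY_RE.findall(text)]
        if title and any(k in c for c in cats for k in keywords):
            yield title
        elem.clear()  # drop the page's children once it has been processed

# Tiny made-up sample standing in for a real dump file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Nike</title><revision><text>[[Category:Clothing companies]]</text></revision></page>
  <page><title>Taksim</title><revision><text>[[Kategori:İstanbul]]</text></revision></page>
</mediawiki>""".encode("utf-8")

orgs = list(titles_in_categories(io.BytesIO(SAMPLE), ["companies"]))
print(orgs)
```

For a real dump you would pass an opened file (for example one decompressed with `bz2.open`) instead of the `BytesIO` sample, and use Turkish category keywords.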
Upvotes: 1
Reputation: 646
If you are interested in a data resource with these organization names, you can use one of the available knowledge bases (KBs), such as YAGO or BabelNet.
All of them contain the names of these organizations and more, so you will need some effort to extract only the organizations using their types. For example, YAGO has a downloadable file of possible entities and their types. You can filter it on the organization type and then use the hasMeaning data to get all possible names.
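The filtering step could look roughly like this. It is only a sketch under stated assumptions: the file layouts (tab-separated `entity<TAB>type` rows and `name<TAB>entity` rows), the type label `organization`, and all entity ids are invented for illustration and are not the actual YAGO format, so the column handling would need adapting to the real files.

```python
import io

def read_pairs(lines):
    """Yield (first, second) from tab-separated lines, skipping malformed rows."""
    for line in lines:
        parts = line.rstrip("\n").split("\t")
        if len(parts) == 2:
            yield parts[0], parts[1]

def entities_of_type(type_lines, wanted_type):
    """Collect the entity ids whose type equals wanted_type."""
    return {entity for entity, etype in read_pairs(type_lines)
            if etype == wanted_type}

def names_for(name_lines, entities):
    """Map each kept entity to every surface name listed for it."""
    names = {}
    for name, entity in read_pairs(name_lines):
        if entity in entities:
            names.setdefault(entity, []).append(name)
    return names

# Hypothetical stand-ins for the types file and the names file.
TYPES = "Nike_Inc\torganization\nIstanbul\tcity\nAdidas\torganization\n"
NAMES = "Nike\tNike_Inc\nNike, Inc.\tNike_Inc\nAdidas AG\tAdidas\nStamboul\tIstanbul\n"

orgs = entities_of_type(io.StringIO(TYPES), "organization")
org_names = names_for(io.StringIO(NAMES), orgs)
print(org_names)
```

The resulting name lists can then be used as a gazetteer for the Organization tag.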
YAGO and BabelNet have been used for NER and Named Entity Disambiguation in the AIDA and Babelfy systems, respectively.
AIDA offers a robust dataset of possible entity names that can be used for NER.
Upvotes: 4