Reputation: 113
I have more than 1000 documents in JSON format (tweets), and I have to extract the hashtags from them. I am reading the documents through the MongoDB Java driver.
entities=Document{
{
urls=[
],
hashtags=[
Document{
{
indices=[
89,
104
],
text=Hungry4Science
}
},
Document{
{
indices=[
105,
112
],
text=ASCO16
}
}
]}}
I have to get the text from this structure and then insert it into my Mongo collection. Each tweet has a hashtags entity, but I can't read the lower-level objects.
Document hash = (Document)old_status.get("entities");
new_status.append("hashtags", hash.get("hashtags"));
Instead of getting the text, I got the whole document as my output:
hashtags=[
Document{
{
indices=[
73,
80
],
text=cancer
}
},
Document{
{
indices=[
81,
90
],
text=moonshot
}
},
Document{
{
indices=[
125,
133
],
text=pallonc
}
}
]
I tried it like this, but no luck. Any help, please?
Upvotes: 0
Views: 710
Reputation: 113
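// Read the nested "entities" sub-document from the original tweet.
Document entity = (Document) old_status.get("entities");
// "hashtags" is an array of sub-documents, each with a "text" field.
ArrayList<Document> hashlist = (ArrayList<Document>) entity.get("hashtags");
ArrayList<String> hashtaglist = new ArrayList<String>();
for (Document hashtag : hashlist) {
    // Keep only the hashtag text and drop the indices.
    hashtaglist.add(hashtag.getString("text"));
}
// Store the plain list of strings instead of the list of sub-documents.
new_status.append("hashtags", hashtaglist);
collection.insertOne(new_status);
This code reads the text field of every hashtag and saves the values into an ArrayList, which is then stored under "hashtags" in the new document.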
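For anyone who wants to try the traversal without a running MongoDB instance, here is a minimal sketch of the same idea using only java.util collections. It relies on the fact that org.bson.Document implements Map<String, Object>, so plain maps and lists stand in for the driver types here. The class name HashtagTextSketch and the helpers extractHashtagTexts, hashtag and sampleStatus are hypothetical names made up for this illustration, and the sample data is copied from the question. The type checks are an extra precaution, not something the original code does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HashtagTextSketch {

    // Hypothetical helper: collects the "text" value of every entry
    // under entities.hashtags. Returns an empty list if a level is missing.
    static List<String> extractHashtagTexts(Map<String, Object> status) {
        List<String> texts = new ArrayList<>();
        Object entities = status.get("entities");
        if (!(entities instanceof Map)) {
            return texts; // no "entities" sub-document
        }
        Object hashtags = ((Map<?, ?>) entities).get("hashtags");
        if (!(hashtags instanceof List)) {
            return texts; // no "hashtags" array
        }
        for (Object item : (List<?>) hashtags) {
            if (item instanceof Map) {
                Object text = ((Map<?, ?>) item).get("text");
                if (text instanceof String) {
                    texts.add((String) text);
                }
            }
        }
        return texts;
    }

    // Builds one hashtag entry shaped like the ones in the question.
    static Map<String, Object> hashtag(String text, int start, int end) {
        Map<String, Object> entry = new LinkedHashMap<>();
        entry.put("indices", Arrays.asList(start, end));
        entry.put("text", text);
        return entry;
    }

    // Builds a tweet-like map with the two hashtags from the question.
    static Map<String, Object> sampleStatus() {
        Map<String, Object> entities = new LinkedHashMap<>();
        entities.put("urls", new ArrayList<Object>());
        entities.put("hashtags", Arrays.asList(
                hashtag("Hungry4Science", 89, 104),
                hashtag("ASCO16", 105, 112)));
        Map<String, Object> status = new LinkedHashMap<>();
        status.put("entities", entities);
        return status;
    }

    public static void main(String[] args) {
        // prints [Hungry4Science, ASCO16]
        System.out.println(extractHashtagTexts(sampleStatus()));
    }
}
```

With the real driver, the same loop should work on a Document directly, since each nested level is read through the Map interface; the instanceof checks just avoid a ClassCastException or NullPointerException on tweets that lack a level.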
Upvotes: 1