Reputation: 13
I am trying to stream Twitter data to Elasticsearch. I have no problems if I do not create an index before streaming, but then I can't filter by date or create timelines. I tried to use this mapping:
https://gist.github.com/christinabo/ca99793a5d160fe12fd9a31827e74444
which allegedly allows "date" to be picked up correctly by ES, but I receive this error when creating the index:
"type": "illegal_argument_exception", "reason": "unknown setting [index.twitter.mappings._doc.properties.coordinates.properties.coordinates.type] please check that any required plugins are installed, or check the breaking changes documentation for removed settings"
What's wrong?
Thanks
Upvotes: 0
Views: 365
Reputation: 113
Judging from the comment on the date field, the date format of the tweets is not one of the default formats that Elasticsearch supports.
For Elasticsearch to treat it as a date field, you should specify a custom mapping for the twitter_stream
index that tells it which date format to expect in the tweets' date field. The syntax for customizable date formats is described in the Elasticsearch date format documentation.
So, if you are using Elasticsearch 7.x, you can specify the custom format this way:
PUT twitter_stream
{
  "mappings": {
    "properties": {
      "YOUR_TWEETS_DATE_FIELD": {
        "type": "date",
        "format": "EEE LLL dd HH:mm:ss Z yyyy"
      }
    }
  }
}
You can copy the above configuration into the Kibana Dev Tools console and execute it there. Then, try running the Python script again. To explain the letters used in the format:
Symbol  Meaning             Presentation  Examples
E       day-of-week         text          Tue; Tuesday; T
M/L     month-of-year       number/text   7; 07; Jul; July; J
d       day-of-month        number        10
H       hour-of-day (0-23)  number        0
m       minute-of-hour      number        30
s       second-of-minute    number        55
Z       zone-offset         offset-Z      +0000; -0800; -08:00
y       year-of-era         year          2004; 04
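As a quick sanity check of the pattern, the same layout can be parsed in plain Python before touching the index. This is only a sketch: the strptime directives are a hand translation of the Elasticsearch pattern, the function name parse_tweet_date is made up for illustration, and the sample timestamp is an invented value in the shape Twitter is assumed to use for its date field.

```python
from datetime import datetime, timezone

# Hand-translated strptime equivalent of "EEE LLL dd HH:mm:ss Z yyyy".
# Assumption: day and month names are in English (C/English locale).
TWITTER_STRPTIME = "%a %b %d %H:%M:%S %z %Y"


def parse_tweet_date(text: str) -> datetime:
    """Parse a timestamp shaped like 'Wed Oct 10 20:19:24 +0000 2018'."""
    return datetime.strptime(text, TWITTER_STRPTIME)


# Illustrative sample value, not taken from a real tweet.
sample = "Wed Oct 10 20:19:24 +0000 2018"
parsed = parse_tweet_date(sample)
print(parsed.isoformat())
```

If a value from your own stream fails to parse here, the Elasticsearch pattern probably needs adjusting as well.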
Also, there is no need to define anything else in the mapping. Elasticsearch will define the rest of the fields and their types through dynamic mapping.
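If you would rather create the index from the Python script than from Dev Tools, the same request can be sent through the client. A minimal sketch, assuming the 7.x elasticsearch Python client (where indices.create accepts a body argument) and assuming the tweets' date field is called created_at; build_index_body and create_tweet_index are hypothetical helper names, so adjust them and the field name to your data.

```python
# Same pattern as in the PUT request above.
TWEET_DATE_FORMAT = "EEE LLL dd HH:mm:ss Z yyyy"


def build_index_body(date_field: str) -> dict:
    """Build the index body: only the date field is mapped explicitly."""
    return {
        "mappings": {
            "properties": {
                date_field: {"type": "date", "format": TWEET_DATE_FORMAT}
            }
        }
    }


def create_tweet_index(es, index: str = "twitter_stream",
                       date_field: str = "created_at"):
    """Create the index with the custom date mapping.

    `es` is expected to be an elasticsearch.Elasticsearch client (7.x assumed).
    `created_at` is an assumed field name; change it if yours differs.
    """
    return es.indices.create(index=index, body=build_index_body(date_field))
```

Calling create_tweet_index(es) once before starting the stream should be enough. Leaving out ignore=400 in this call means a rejected mapping raises an error instead of passing silently.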
Upvotes: 0
Reputation: 13
I am running this Python script. Before running it, I tried to set the mapping from Dev Tools with PUT twitter_stream. Sorry for the terrible indentation!
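from textwrap import TextWrapper

import tweepy
from elasticsearch import Elasticsearch

es = Elasticsearch("https://admin:admin@localhost:9200",
                   verify_certs=False)
# ignore=400 keeps this call from failing when the index already exists
es.indices.create(index='twitter_stream', ignore=400)

class StreamApi(tweepy.StreamListener):
    status_wrapper = TextWrapper(width=60, initial_indent=' ',
                                 subsequent_indent=' ')

    def on_status(self, status):
        # Index the raw tweet JSON as received from the stream
        json_data = status._json
        es.index(index="twitter_stream",
                 doc_type="twitter",
                 body=json_data,
                 ignore=400
                 )

# auth (the tweepy OAuth handler) is set up earlier and not shown here
streamer = tweepy.Stream(auth=auth, listener=StreamApi(), timeout=30)
terms = ['#assange', 'assange']
streamer.filter(None, terms)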
Upvotes: 0