Reputation: 112
I have a few NDJSON files containing Twitter data. The problem is that for retweets, the "text" property of each dictionary only includes the first 140 characters. I would like to extract the full tweet.
By taking one of the tweets and running the following code:
data.get('includes')['tweets']
I get the following result:
[{'attachments': {'media_keys': ['1234']},
  'author_id': '1234',
  'conversation_id': '1234',
  'created_at': '2021-02-10T14:27:19.000Z',
  'entities': {'annotations': [{'end': 111,
                                'normalized_text': 'Scotland',
                                'probability': 0.9519,
                                'start': 104,
                                'type': 'Place'}],
               'hashtags': [{'end': 50, 'start': 35, 'tag': 'ChineseNewYear'}],
               'urls': [{'display_url': 'pic.twitter.com/1234',
                         'end': 221,
                         'expanded_url': 'urlwuhuu',
                         'start': 198,
                         'url': 'another one'}]},
  'id': '1234',
  'lang': 'en',
  'possibly_sensitive': False,
  'public_metrics': {'like_count': 7,
                     'quote_count': 0,
                     'reply_count': 6,
                     'retweet_count': 3},
  'reply_settings': 'everyone',
  'source': 'Twitter Web App',
  'text': 'FULL TWEET THAT I WANT TO GET'}]
The problem is that the result is a list, not a dictionary, so I cannot call .get on it or index it with a string key. The tweet text I want is the 'text' entry at the end of the dictionary inside that list.
What is the best way to go about this?
Upvotes: 1
Views: 257
Reputation: 547
How about using a list comprehension? For example:
tweets_list = data.get('includes')['tweets']
tweet_texts = [tweet['text'] for tweet in tweets_list]  # the texts of all included tweets, as a list
text = tweet_texts[0]  # 'FULL TWEET THAT I WANT TO GET' in your example
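If some records in the file are not retweets, they may have no 'includes' key at all, and then data.get('includes')['tweets'] would fail because .get returns None. Below is a minimal sketch of a more defensive version applied line by line to NDJSON input. The helper name full_tweet_texts and the sample records are made up for illustration; only the 'includes' -> 'tweets' -> 'text' path comes from your output.

```python
import json


def full_tweet_texts(record):
    """Return the 'text' of every tweet under record['includes']['tweets'].

    Hypothetical helper: returns an empty list when the keys are missing.
    """
    included = record.get("includes", {}).get("tweets", [])
    return [tweet["text"] for tweet in included if "text" in tweet]


# Made-up NDJSON content: one JSON object per line.
sample_lines = [
    '{"includes": {"tweets": [{"id": "1234", "text": "FULL TWEET"}]}}',
    '{"data": {"text": "an original tweet without includes"}}',
]

for line in sample_lines:
    record = json.loads(line)  # each NDJSON line is a standalone JSON document
    print(full_tweet_texts(record))
```

With a real file you would loop over the open file object instead of sample_lines, since iterating a file yields one line at a time.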
Upvotes: 1