Reputation: 21
I collected tweet data using the Twitter API academic track. One column holds a list of dictionaries with the unique IDs of the referenced tweets, like this:
No | Referenced_tweets |
---|---|
1 | [{'type': 'replied_to', 'id': '1212086431889313792'}] |
2 | [{'type': 'quoted', 'id': '1345063319540002817'}, {'type': 'replied_to', 'id': '1345066320761655296'}] |
3 | [{'type': 'retweeted', 'id': '1344718164974833667'}, {'type': 'replied_to', 'id': '1211798476062908422'}] |
I want to transform this data into the layout below, with one column per reference type:
No | replied_to | quoted | retweeted |
---|---|---|---|
1 | 1212086431889313792 | | |
2 | 1345066320761655296 | 1345063319540002817 | |
3 | 1211798476062908422 | | 1344718164974833667 |
When I use "json_normalize", I get an error (TypeError: string indices must be integers). How can I do this in Python?
Upvotes: 1
Views: 118
Reputation: 10624
Here is one way to do it (let me know if you need explanation of the code):
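import pandas as pd

def f(l):
    # Start with empty strings so every row ends up with all three columns
    a = {'replied_to': '', 'quoted': '', 'retweeted': ''}
    # One row per referenced tweet, with columns 'type' and 'id'
    x = pd.DataFrame(l)
    # Index by type and transpose so each type becomes a column holding its id
    x = x.set_index('type')
    x = x.T
    x = x.reset_index(drop=True)
    x = x.to_dict(orient='records')
    # Overwrite the defaults with the types present in this row
    a.update(x[0])
    return a

# df is the DataFrame from the question
df['Referenced_tweets_2'] = [f(k) for k in df['Referenced_tweets']]
result = pd.DataFrame.from_dict(df['Referenced_tweets_2'].to_list())
print(result)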
Output:
            replied_to               quoted            retweeted
0  1212086431889313792
1  1345066320761655296  1345063319540002817
2  1211798476062908422                       1344718164974833667
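A possible variation, in case it helps: the TypeError in the question would be consistent with the column holding the lists as strings (for example after a round trip through CSV) rather than as real Python lists, though that is only a guess about the data. The sketch below assumes that case and parses each cell with `ast.literal_eval` before building the columns with a plain dictionary. The sample `df`, the `TYPES` list and the `to_row` helper are made up for illustration and are not part of the original data or answer.

```python
import ast

import pandas as pd

# Hypothetical sample mirroring the question's data. The cells are strings
# here, on the assumption that the column was loaded from a CSV file.
df = pd.DataFrame({
    "No": [1, 2, 3],
    "Referenced_tweets": [
        "[{'type': 'replied_to', 'id': '1212086431889313792'}]",
        "[{'type': 'quoted', 'id': '1345063319540002817'}, "
        "{'type': 'replied_to', 'id': '1345066320761655296'}]",
        "[{'type': 'retweeted', 'id': '1344718164974833667'}, "
        "{'type': 'replied_to', 'id': '1211798476062908422'}]",
    ],
})

TYPES = ["replied_to", "quoted", "retweeted"]


def to_row(cell):
    """Map one cell to {type: id}, parsing it first if it is a string."""
    refs = ast.literal_eval(cell) if isinstance(cell, str) else cell
    if not isinstance(refs, list):  # e.g. NaN for tweets without references
        refs = []
    row = dict.fromkeys(TYPES, "")  # empty string for missing types
    row.update({ref["type"]: ref["id"] for ref in refs})
    return row


# Build one wide row per tweet and put the original "No" column in front
wide = pd.DataFrame([to_row(c) for c in df["Referenced_tweets"]], index=df.index)
result = pd.concat([df[["No"]], wide[TYPES]], axis=1)
print(result)
```

If the cells are already lists, `to_row` skips the parsing step and uses them directly. The IDs are kept as strings, which avoids any precision loss from converting such large numbers to floats.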
Upvotes: 1