Michael S Kim

Reputation: 21

Python: list of dictionaries to pd dataframe (Twitter API)

I collected tweet data using the academic track of the Twitter API. One column holds, for each tweet, a list of dictionaries with the unique IDs of the tweets it references, like this:

No Referenced_tweets
1 [{'type': 'replied_to', 'id': '1212086431889313792'}]
2 [{'type': 'quoted', 'id': '1345063319540002817'}, {'type': 'replied_to', 'id': '1345066320761655296'}]
3 [{'type': 'retweeted', 'id': '1344718164974833667'}, {'type': 'replied_to', 'id': '1211798476062908422'}]

I want to transform the data into the following shape:

No  replied_to           quoted               retweeted
1   1212086431889313792
2   1345066320761655296  1345063319540002817
3   1211798476062908422                       1344718164974833667

If I use "json_normalize", I get an error (TypeError: string indices must be integers). How can I do this in Python?

Upvotes: 1

Views: 118

Answers (1)

IoaTzimas

Reputation: 10624

Here is one way to do it (let me know if you need explanation of the code):

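import pandas as pd

def f(l):
    # Start with an empty value for every reference type.
    a = {'replied_to': '', 'quoted': '', 'retweeted': ''}
    # One row per referenced tweet, with columns 'type' and 'id'.
    x = pd.DataFrame(l)
    # Transpose so that each type becomes a column holding its id.
    x = x.set_index('type').T.reset_index(drop=True)
    # A single record of the form {type: id}.
    x = x.to_dict(orient='records')
    a.update(x[0])
    return a

# Assumes df is the DataFrame from the question.
df['Referenced_tweets_2'] = [f(k) for k in df['Referenced_tweets']]

# Expand the list of dicts into one column per reference type.
result = pd.DataFrame(df['Referenced_tweets_2'].to_list())

print(result)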

Output:

            replied_to               quoted            retweeted
0  1212086431889313792
1  1345066320761655296  1345063319540002817
2  1211798476062908422                       1344718164974833667
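
A possible variation, in case the column does not actually contain lists. The question does not say where the TypeError comes from, but one common cause is that the cells are strings (for example after saving to CSV and reading the file back). That is an assumption on my part. The sketch below rebuilds the sample data as strings, parses them with ast.literal_eval, and also tolerates missing values. The helper name to_type_id_map is made up for illustration.

```python
import ast

import pandas as pd

# Sample data mirroring the question. The cells are strings here on purpose,
# as they might be after a round trip through CSV (an assumption).
df = pd.DataFrame({
    "No": [1, 2, 3],
    "Referenced_tweets": [
        "[{'type': 'replied_to', 'id': '1212086431889313792'}]",
        "[{'type': 'quoted', 'id': '1345063319540002817'}, "
        "{'type': 'replied_to', 'id': '1345066320761655296'}]",
        "[{'type': 'retweeted', 'id': '1344718164974833667'}, "
        "{'type': 'replied_to', 'id': '1211798476062908422'}]",
    ],
})


def to_type_id_map(cell):
    """Turn one cell into a {type: id} dict (hypothetical helper)."""
    if isinstance(cell, str):
        # Parse the string back into a list of dicts.
        cell = ast.literal_eval(cell)
    if not isinstance(cell, list):
        # Anything else (e.g. NaN for a tweet with no references) maps to nothing.
        return {}
    return {d["type"]: d["id"] for d in cell}


# One dict per row becomes one row of the wide table.
wide = pd.DataFrame(
    [to_type_id_map(c) for c in df["Referenced_tweets"]],
    index=df.index,
)
# Fix the column order and blank out the types a tweet does not have.
wide = wide.reindex(columns=["replied_to", "quoted", "retweeted"]).fillna("")

result = pd.concat([df[["No"]], wide], axis=1)
print(result)
```

If a tweet ever carried two references of the same type, the dict comprehension would keep only the last one, so check for that if it matters for your data.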

Upvotes: 1
