Reputation: 820
I'm scraping user/follower information from the Twitter API using Tweepy. I currently store the data as a dictionary where every key is a unique Twitter user and each value is a list of IDs of that user's followers.
The data looks like this:
{'realDonaldTrump': [
123456,
123457,
123458,
...
],
'BarackObama' : [
999990,
999991,
999992,
...
]}
What I need is a dataframe that looks like this:
user follower
realDonaldTrump 123456
realDonaldTrump 123457
realDonaldTrump 123458
... ...
BarackObama 999990
BarackObama 999991
BarackObama 999992
... ...
I've already tried:
df = pd.DataFrame.from_dict(followers)
but it gives me a new column for each key, and doesn't handle uneven length of follower lists.
Is there a smart way to convert the dictionary structure I have into a dataframe? Or should I store the initial data differently?
Upvotes: 1
Views: 306
Reputation: 71
import pandas as pd
followers = {
    'realDonaldTrump': [123456, 123457, 123458],
    'BarackObama': [999990, 999991, 999992]
}
df = pd.DataFrame()
i = 0
# Append one row per (user, follower) pair
for user in followers:
    for r in followers[user]:
        df.loc[i, 'user'] = user
        df.loc[i, 'record'] = r
        i = i + 1
print(df)
Result:
user record
0 realDonaldTrump 123456
1 realDonaldTrump 123457
2 realDonaldTrump 123458
3 BarackObama 999990
4 BarackObama 999991
5 BarackObama 999992
Upvotes: 1
Reputation: 940
Create a compatible dict:
final_dict = {'users':[], 'followers':[]}
for key in followers:
    # Repeat the user name once for each of their followers
    for follower in followers[key]:
        final_dict['users'].append(key)
        final_dict['followers'].append(follower)
df = pd.DataFrame.from_dict(final_dict)
Output:
users followers
0 realDonaldTrump 123456
1 realDonaldTrump 123457
2 realDonaldTrump 123458
3 BarackObama 999990
4 BarackObama 999991
5 BarackObama 999992
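On the question of whether to store the initial data differently: one option is to collect (user, follower) pairs directly while scraping, so no reshaping is needed afterwards. This is only a minimal sketch; `fetch_follower_ids` is a hypothetical stand-in that returns canned IDs, not a real Tweepy function, and would need to be replaced by the actual API call.

```python
import pandas as pd

def fetch_follower_ids(user):
    # Hypothetical placeholder for the real Tweepy request; returns sample IDs.
    sample = {
        'realDonaldTrump': [123456, 123457, 123458],
        'BarackObama': [999990, 999991, 999992],
    }
    return sample[user]

rows = []  # one (user, follower) tuple per follower
for user in ['realDonaldTrump', 'BarackObama']:
    for follower_id in fetch_follower_ids(user):
        rows.append((user, follower_id))

# Build the DataFrame once, after all rows have been collected
df = pd.DataFrame(rows, columns=['user', 'follower'])
print(df)
```

Building the frame once from a list of rows avoids growing a DataFrame row by row, which tends to be slow for large follower lists.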
Upvotes: 1
Reputation: 862441
Use a list comprehension to build (user, follower) tuples and pass them to the DataFrame constructor. This works even when the follower lists have different lengths:
import pandas as pd

followers = {'realDonaldTrump': [123456, 123457],
             'BarackObama': [999990, 999991, 999992]}
# One (user, follower) tuple per follower, across all users
df = pd.DataFrame([(k, x) for k, v in followers.items() for x in v],
                  columns=['user', 'follower'])
print(df)
user follower
0 realDonaldTrump 123456
1 realDonaldTrump 123457
2 BarackObama 999990
3 BarackObama 999991
4 BarackObama 999992
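As an alternative to the list comprehension above, the same long format can probably be reached with `Series.explode`, which is a different technique and requires pandas 0.25 or later. The sketch below assumes every user has at least one follower; an empty list would produce a NaN row and make the integer cast fail.

```python
import pandas as pd

followers = {
    'realDonaldTrump': [123456, 123457],
    'BarackObama': [999990, 999991, 999992],
}

df = (
    pd.Series(followers, name='follower')  # one list of IDs per user
      .rename_axis('user')                 # name the index so it becomes a column
      .explode()                           # one row per list element
      .reset_index()
      # explode leaves object dtype; cast back to integers
      # (assumes no empty follower lists)
      .astype({'follower': 'int64'})
)
print(df)
```

Like the tuple approach, this copes with follower lists of different lengths because each list is expanded independently.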
Upvotes: 1