Reputation: 191
So i have a dataframe with two columns: artistID and genre:
artistID genre
0 52 rock
1 63 pop
2 73 salsa
3 94 reggaeton
4 6177 rock
5 64 salsa
6 862 metal
7 52 pop
8 63 hiphop
9 64 jazz
10 52 metal
11 63 electro
12 73 latino
13 94 trap
14 6177 pop
15 64 latino
15 456 hiphop
And what I want to do is to group by the column artistID (so the resulting datafdrame has as many rows as artistID there are in this dataframe), and the second column of the new dataframe I want it to be like a list or an array or whatever it is convenient of all the unique genres to which each artistID has been taged. So I want the resultind dataframe to look like this:
artistID genre
0 52 [rock, pop, metal]
1 63 [pop, electro, hiphop]
2 73 [salsa, latino]
3 94 [reggaeton, trap]
4 6177 [rock, pop]
5 64 [salsa, jazz, latino]
6 862 [metal]
7 456 [hiphop]
How can I do this ?
I also must say, this dataframe is just an wxample, my real dataframe has almost 200.000 rows and 20.000 different artistID
Upvotes: 0
Views: 1829
Reputation: 34046
Use Groupby.agg
:
In [2237]: df.groupby('artistID')['genre'].agg(set).reset_index()
OR:
In [2240]: df.groupby('artistID')['genre'].apply(lambda x: list(set(x)))
Out[2237]:
artistID genre
0 52 [rock, pop, metal]
1 63 [pop, hiphop, electro]
2 64 [salsa, jazz, latino]
3 73 [salsa, latino]
4 94 [reggaeton, trap]
5 456 [hiphop]
6 862 [metal]
7 6177 [rock, pop]
Upvotes: 1
Reputation: 13821
I think what you need is:
df.groupby('artistID').agg(list).reset_index()
artistID genre
0 52 [rock, pop, metal]
1 63 [pop, hiphop, electro]
2 64 [salsa, jazz, latino]
3 73 [salsa, latino]
4 94 [reggaeton, trap]
5 456 [hiphop]
6 862 [metal]
7 6177 [rock, pop]
Or even better, and to get new dataframe:
res = df.groupby('artistID',as_index=False).agg(list)
print(res)
Out[1632]:
artistID genre
0 52 [rock, pop, metal]
1 63 [pop, hiphop, electro]
2 64 [salsa, jazz, latino]
3 73 [salsa, latino]
4 94 [reggaeton, trap]
5 456 [hiphop]
6 862 [metal]
7 6177 [rock, pop]
Upvotes: 1