Reputation: 13
I have a Pandas DataFrame which looks like this:
Time Image_names
0 [a,b,c,d]
0 [a,c,d,e]
0 [c,d,e,f]
1 [e,f,g,h]
1 [f,g,h,i]
What I wish to obtain: All unique image names for a given Time
Time Image_names
0 [a,b,c,d,e,f]
1 [e,f,g,h,i]
I'm not sure if I have to use groupby or joins.
Upvotes: 1
Views: 108
Reputation: 164673
One way is to use itertools.chain:
from itertools import chain
import pandas as pd
df = pd.DataFrame({'Time': [0, 0, 0, 1, 1],
                   'Image_names': [['a', 'b', 'c', 'd'],
                                   ['a', 'c', 'd', 'e'],
                                   ['c', 'd', 'e', 'f'],
                                   ['e', 'f', 'g', 'h'],
                                   ['f', 'g', 'h', 'i']]})
df = df.groupby('Time')['Image_names'].apply(chain.from_iterable).map(set).reset_index()
# Time Image_names
# 0 0 {c, a, f, d, e, b}
# 1 1 {g, h, f, e, i}
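Sets have no defined order, so the printed order above can vary between runs. If a plain sorted list per group is preferred, as in the question's desired output, a variant of the same idea might look like the sketch below. The helper name unique_sorted and the result variable are made up for illustration, and it assumes the image names are sortable (plain strings here).

```python
from itertools import chain

import pandas as pd

df = pd.DataFrame({'Time': [0, 0, 0, 1, 1],
                   'Image_names': [['a', 'b', 'c', 'd'],
                                   ['a', 'c', 'd', 'e'],
                                   ['c', 'd', 'e', 'f'],
                                   ['e', 'f', 'g', 'h'],
                                   ['f', 'g', 'h', 'i']]})


def unique_sorted(lists):
    # Hypothetical helper: flatten one group's lists, drop duplicates,
    # and sort so the result has a stable order.
    return sorted(set(chain.from_iterable(lists)))


# One row per Time value, with a sorted list of unique names.
result = df.groupby('Time')['Image_names'].apply(unique_sorted).reset_index()
print(result)
```

Doing the flattening, de-duplication and sorting inside one function means each group yields an ordinary list rather than an intermediate chain object.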
Explanation
chain.from_iterable joins the lists from each group into one large list for each group.
set then creates a set for each group.
reset_index ensures the result is a dataframe with column headers as required.
Upvotes: 1
Reputation: 355
You can use the following:
import pandas as pd
import numpy as np
a=pd.DataFrame([[0,['a','b','c','d']],[0,['a','c','d','e']],
[0,['c','d','e','f']],[1,['e','f','g','h']],
[1,['f','g','h','i']]],
columns=['Time','Image_names'])
a.groupby('Time')['Image_names'].sum().apply(np.unique)
#Out[242]:
#Time
#0 [a, b, c, d, e, f]
#1 [e, f, g, h, i]
#Name: Image_names, dtype: object
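Note that sum() on a column of lists works by concatenating the lists one after another, which can get slow for large groups. A possible alternative, assuming pandas 0.25 or later (where DataFrame.explode is available), is to flatten the lists into rows first and then take the unique values per group. The names frame, exploded and uniques are illustrative only.

```python
import pandas as pd

frame = pd.DataFrame({'Time': [0, 0, 0, 1, 1],
                      'Image_names': [['a', 'b', 'c', 'd'],
                                      ['a', 'c', 'd', 'e'],
                                      ['c', 'd', 'e', 'f'],
                                      ['e', 'f', 'g', 'h'],
                                      ['f', 'g', 'h', 'i']]})

# Assumption: pandas >= 0.25. Each list element becomes its own row,
# with the Time value repeated alongside it.
exploded = frame.explode('Image_names')

# unique() keeps values in order of first appearance within each group.
uniques = exploded.groupby('Time')['Image_names'].unique()
print(uniques)
```

The result is a Series indexed by Time whose values are NumPy arrays; call reset_index() on it if a DataFrame with a Time column is needed.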
Upvotes: 0
Reputation: 323226
You can use set:
s=df.groupby('Time',as_index=False).Image_names.sum()
s.Image_names=list(map(set,s.Image_names))
s
Out[2034]:
Time Image_names
0 0 {b, c, d, a, f, e}
1 1 {g, h, f, i, e}
Upvotes: 1