Reputation: 1192
I have a DataFrame like this:
import pandas as pd
df = pd.DataFrame({'Info': ['A','B','C', 'D', 'E'], 'Section':['1','1', '2', '2', '3']})
I want to create a new column, 'Unique_Info', like this:
df = pd.DataFrame({'Info': ['A','B','C', 'D', 'E'], 'Section':['1','1', '2', '2', '3'],
'Unique_Info':[['A', 'B'], ['A', 'B'], ['C', 'D'], ['C', 'D'], ['E']]})
That is, each row gets a list of all unique values from the Info column that belong to its section. For example, rows with Section '1' get ['A', 'B'].
I assume groupby is the most convenient way to do this, and I've tried the following:
df['Unique_Info'] = df.groupby('Section').agg({'Info':'unique'})
Any ideas where I'm going wrong?
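A minimal check of why the assignment does not line up, using the df from above (the name `grouped` is only for illustration): the aggregated result is indexed by the Section labels, while df keeps its default 0 to 4 row index, and pandas aligns on the index when assigning a column.

```python
import pandas as pd

df = pd.DataFrame({'Info': ['A', 'B', 'C', 'D', 'E'],
                   'Section': ['1', '1', '2', '2', '3']})

# Dict-style agg returns a DataFrame whose index is the group keys (Section).
grouped = df.groupby('Section').agg({'Info': 'unique'})

print(grouped.index.tolist())  # ['1', '2', '3']
print(df.index.tolist())       # [0, 1, 2, 3, 4]

# No row label of df appears in grouped's index, so index-aligned
# assignment has nothing to match the rows against.
print(set(df.index).isdisjoint(grouped.index))  # True
```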
Upvotes: 1
Views: 40
Reputation: 34086
Use groupby with agg to build the per-Section lists, then attach them with df.merge:
grp = df.groupby('Section')['Info'].agg(list).reset_index()
df.merge(grp, on='Section').rename(columns={'Info_y': 'unique'})
Output:
  Info_x Section  unique
0      A       1  [A, B]
1      B       1  [A, B]
2      C       2  [C, D]
3      D       2  [C, D]
4      E       3     [E]
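Since the question asks for unique values, note that `agg(list)` keeps repeats if a section contains the same Info value more than once. A sketch of the same merge approach that deduplicates and names the new column before merging, so no `_x`/`_y` suffixes appear; `grp` and `out` are illustrative names, and the lambda is one possible way to get plain lists:

```python
import pandas as pd

df = pd.DataFrame({'Info': ['A', 'B', 'C', 'D', 'E'],
                   'Section': ['1', '1', '2', '2', '3']})

# One row per Section holding a plain Python list of its unique Info values
# (in order of first appearance). Naming the Series 'Unique_Info' up front
# avoids the column-name clash with df's own 'Info' column in the merge.
grp = (df.groupby('Section')['Info']
         .agg(lambda s: list(s.unique()))
         .rename('Unique_Info')
         .reset_index())

# Left merge keeps every row of df and attaches its section's list.
out = df.merge(grp, on='Section', how='left')
print(out)
```

With `how='left'`, the rows of df keep their original order in the result.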
Upvotes: 1
Reputation: 150815
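df.groupby().agg returns a result indexed by the Section values rather than by your DataFrame's original row index, so assigning it directly does not line up with your rows. Use map to look up each row's Section and assign the result back to your DataFrame: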
s = df.groupby('Section')['Info'].agg('unique')
df['Unique_Info'] = df['Section'].map(s)
Output:
  Info Section Unique_Info
0    A       1      [A, B]
1    B       1      [A, B]
2    C       2      [C, D]
3    D       2      [C, D]
4    E       3         [E]
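Note that `agg('unique')` stores a NumPy array per Section rather than a Python list, so the printed `[A, B]` cells are arrays. If real lists are needed, as in the expected output in the question, one option (a sketch, not the only way) is to convert each array before mapping:

```python
import pandas as pd

df = pd.DataFrame({'Info': ['A', 'B', 'C', 'D', 'E'],
                   'Section': ['1', '1', '2', '2', '3']})

# 'unique' yields one NumPy array per Section; apply(list) turns each into a list.
s = df.groupby('Section')['Info'].agg('unique').apply(list)

# map looks up each row's Section label in the index of s.
df['Unique_Info'] = df['Section'].map(s)
print(df)
```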
Upvotes: 2