pymat

Reputation: 1192

Create a new column in a dataframe, based on Groupby and values in a separate column

I have a df like so:

df = pd.DataFrame({'Info': ['A','B','C', 'D', 'E'], 'Section':['1','1', '2', '2', '3']})

I want to create a new column, 'Unique_Info', like so:

df = pd.DataFrame({'Info': ['A','B','C', 'D', 'E'], 'Section':['1','1', '2', '2', '3'],
'Unique_Info':[['A', 'B'], ['A', 'B'], ['C', 'D'], ['C', 'D'], ['E']]})

Each row should get a list of all unique values from the Info column that belong to that row's section. For example, the rows with Section=1 get ['A', 'B'].

I assume groupby is the most convenient way, and I've tried the following:

df['Unique_Info'] = df.groupby('Section').agg({'Info':'unique'})

Any ideas where I'm going wrong?

Upvotes: 1

Views: 40

Answers (2)

Mayank Porwal

Reputation: 34086

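Use groupby with agg to build one list per section, then df.merge to join it back onto the original rows. Note that agg(list) keeps duplicates; use 'unique' instead if Info can repeat within a section: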

In [1531]: grp = df.groupby('Section')['Info'].agg(list).reset_index()
In [1535]: df.merge(grp, on='Section').rename(columns={'Info_y': 'unique'})
Out[1535]: 
  Info_x Section  unique
0      A       1  [A, B]
1      B       1  [A, B]
2      C       2  [C, D]
3      D       2  [C, D]
4      E       3     [E]
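A variation on the merge approach, as a minimal sketch: naming the grouped series before merging avoids the Info_x / Info_y suffixes and the rename step. The name 'Unique_Info' is taken from the question, and converting each group's unique values to a plain list is a choice made here, not something the answer above requires.

```python
import pandas as pd

df = pd.DataFrame({'Info': ['A', 'B', 'C', 'D', 'E'],
                   'Section': ['1', '1', '2', '2', '3']})

# Unique Info values per section, converted from arrays to plain lists,
# then named so the merged column needs no suffix handling or rename.
grp = (df.groupby('Section')['Info']
         .unique()
         .apply(list)
         .rename('Unique_Info')
         .reset_index())

# Join the per-section lists back onto every row of the original frame.
out = df.merge(grp, on='Section')
```

Because the grouped column no longer shares the name Info, merge has nothing to disambiguate and the original Info column keeps its name.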

Upvotes: 1

Quang Hoang

Reputation: 150815

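df.groupby('Section').agg(...) returns a result indexed by the Section values, not by the row index of your original dataframe, so it does not line up when you assign it directly. Use map to look up each row's Section and assign the result back to your dataframe: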

s = df.groupby('Section')['Info'].agg('unique')
df['Unique_Info'] = df['Section'].map(s)

Output:

  Info Section Unique_Info
0    A       1      [A, B]
1    B       1      [A, B]
2    C       2      [C, D]
3    D       2      [C, D]
4    E       3         [E]
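To see the index mismatch concretely, the sketch below compares a direct assignment of the grouped series with the map lookup. The variable names misaligned and fixed are illustrative only, and df.assign is used here just to keep the two results side by side without modifying df.

```python
import pandas as pd

df = pd.DataFrame({'Info': ['A', 'B', 'C', 'D', 'E'],
                   'Section': ['1', '1', '2', '2', '3']})

# One entry per section, indexed by the Section values ('1', '2', '3').
s = df.groupby('Section')['Info'].agg('unique')

# Direct assignment aligns s on df's row index (0..4); no labels match,
# so every row ends up as NaN.
misaligned = df.assign(Unique_Info=s)

# map instead looks up each row's Section value in the index of s.
fixed = df.assign(Unique_Info=df['Section'].map(s))
```

The values produced by 'unique' are NumPy arrays rather than Python lists; wrap them with list(...) if plain lists are needed.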

Upvotes: 2
