How to groupby specific dataframe columns

Question

I have the following dataframe:

    name    style val1 val2 val3
23  sher      D     2    5    6
56  sher      C     3    2    4
34  David     A     1    1    1
47  iamgo     B     4    4    3
77  para      A     6    4    2
120 moli      A     7    2    5
86  para      A     5    4    1
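For anyone who wants to reproduce this, the frame above can be rebuilt with the snippet below. The values and the original index labels are copied from the table; nothing else is assumed.

```python
import pandas as pd

# Rebuild the example frame from the question, keeping its original index labels
df = pd.DataFrame(
    {
        "name": ["sher", "sher", "David", "iamgo", "para", "moli", "para"],
        "style": ["D", "C", "A", "B", "A", "A", "A"],
        "val1": [2, 3, 1, 4, 6, 7, 5],
        "val2": [5, 2, 1, 4, 4, 2, 4],
        "val3": [6, 4, 1, 3, 2, 5, 1],
    },
    index=[23, 56, 34, 47, 77, 120, 86],
)
print(df)
```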

I want to create a new dataframe, grouped by "name", that looks like this:

          style val1 val2 val3
  name
  sher      D,C   3    5    4
  David     A     1    1    1
  iamgo     B     4    4    3
  para      A     6    4    1
  moli      A     7    2    5

For "style" I want to join the values with a comma, but keep each distinct value only once (both "para" rows have "A", so the result should be just "A"). For "val1" and "val2" I want the maximum value, for "val3" the minimum value, and I want to reset the indexes. Here's my code:

df.groupby('name').agg({
    'style': sum,
    'val1': max,
    'val2': max,
    'val3': min
    })

output:

          style val1 val2 val3
  name
  David     A    1    1    1
  iamgo     B    4    4    3
  moli      A    7    2    5
  para      AA   6    4    1
  sher      DC   3    5    4

What am I missing here?

Thanks,

jezrael · Accepted Answer

Use str.join instead of sum:

df1 = df.groupby('name').agg({
    'style': ','.join,  # put a comma between the strings instead of concatenating them
    'val1': 'max',
    'val2': 'max',
    'val3': 'min'
})

print(df1)
      style  val1  val2  val3
name                         
David     A     1     1     1
iamgo     B     4     4     3
moli      A     7     2     5
para    A,A     6     4     1
sher    D,C     3     5     4
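The reason the original attempt produced "DC" and "AA" is that sum adds the values with "+", and "+" on Python strings concatenates them; str.join instead places the separator between the values. A minimal illustration on the two "para" values:

```python
import pandas as pd

# The two "para" rows both have style "A"
styles = pd.Series(["A", "A"])

# sum() adds the values with "+", which concatenates strings
print(styles.sum())       # AA

# str.join inserts the separator between the values instead
print(",".join(styles))   # A,A
```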

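If you need only unique values, deduplicate each group first. Series.unique keeps the order of first appearance, whereas a set has no guaranteed order (so "sher" could come out as either D,C or C,D from run to run):

df2 = df.groupby('name').agg({
    'style': lambda x: ','.join(x.unique()),  # distinct values, in order of first appearance
    'val1': 'max',
    'val2': 'max',
    'val3': 'min'
})

print(df2)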
      style  val1  val2  val3
name                         
David     A     1     1     1
iamgo     B     4     4     3
moli      A     7     2     5
para      A     6     4     1
sher    D,C     3     5     4
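A fully self-contained variant, in case it helps. It rebuilds the example frame, uses named aggregation (available in pandas 0.25 and later) with string names for max/min, passes sort=False so the groups keep their order of first appearance as in the desired output, and calls reset_index() so "name" becomes an ordinary column (one reading of "reset the indexes"; drop that call if you want "name" to stay as the index). The helper join_unique is a hypothetical name introduced here, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["sher", "sher", "David", "iamgo", "para", "moli", "para"],
        "style": ["D", "C", "A", "B", "A", "A", "A"],
        "val1": [2, 3, 1, 4, 6, 7, 5],
        "val2": [5, 2, 1, 4, 4, 2, 4],
        "val3": [6, 4, 1, 3, 2, 5, 1],
    },
    index=[23, 56, 34, 47, 77, 120, 86],
)


def join_unique(values: pd.Series) -> str:
    """Hypothetical helper: join distinct strings with commas, first-seen order."""
    return ",".join(values.unique())


out = (
    df.groupby("name", sort=False)  # keep groups in order of first appearance
    .agg(
        style=("style", join_unique),
        val1=("val1", "max"),
        val2=("val2", "max"),
        val3=("val3", "min"),
    )
    .reset_index()  # turn the "name" index back into a regular column
)
print(out)
```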
