Mi.
Mi.

Reputation: 510

Aggregating string columns using pandas GroupBy

I have a DF such as the following:

df =

vid   pos      value       sente
1     a         A           21
2     b         B           21
3     b         A           21
3     a         A           21
1     d         B           22
1     a         C           22
1     a         D           22
2     b         A           22
3     a         A           22

Now I want to combine all rows with the same value for sente and vid into one row with the values for value joined by an " "

df2 =

vid   pos      value       sente
1     a         A           21
2     b         B           21
3     b a       A A         21
1     d a a     B C D       22
2     b         A           22
3     a         A           22

I suppose a modification of this should do the trick:

df2 = df.groupby["sente"].agg(lambda x: " ".join(x))

But I can't seem to figure out how to add the second column to the statement.

Upvotes: 5

Views: 12666

Answers (2)

piRSquared
piRSquared

Reputation: 294516

As of this edit, @cᴏʟᴅsᴘᴇᴇᴅ's answer is way better.

Fun Way! Only works because single char values

df.set_index(['sente', 'vid']).sum(level=[0, 1]).applymap(' '.join).reset_index()


   sente  vid    pos  value
0     21    1      a      A
1     21    2      b      B
2     21    3    b a    A A
3     22    1  d a a  B C D
4     22    2      b      A
5     22    3      a      A

somewhat ok answer

df.set_index(['sente', 'vid']).groupby(level=[0, 1]).apply(
    lambda d: pd.Series(d.to_dict('l')).str.join(' ')
).reset_index()

   sente  vid    pos  value
0     21    1      a      A
1     21    2      b      B
2     21    3    b a    A A
3     22    1  d a a  B C D
4     22    2      b      A
5     22    3      a      A

not recommended

df.set_index(['sente', 'vid']).add(' ') \
  .sum(level=[0, 1]).applymap(str.strip).reset_index()

   sente  vid    pos  value
0     21    1      a      A
1     21    2      b      B
2     21    3    b a    A A
3     22    1  d a a  B C D
4     22    2      b      A
5     22    3      a      A

Upvotes: 1

cs95
cs95

Reputation: 403130

Groupers can be passed as lists. Furthermore, you can simplify your solution a bit by ridding your code of the lambda—it isn't needed.

df.groupby(['vid', 'sente'], as_index=False, sort=False).agg(' '.join)

   vid  sente    pos  value
0    1     21      a      A
1    2     21      b      B
2    3     21    b a    A A
3    1     22  d a a  B C D
4    2     22      b      A
5    3     22      a      A

Some other notes: specifying as_index=False means your groupers will be present as columns in the result (and not as the index, as is the default). Furthermore, sort=False will preserve the original order of the columns.

Upvotes: 9

Related Questions