Python df groupby with agg for string and sum

Question

With this df as base i want the following output:

So all should be aggregated by column 0 and all strings from column 1 should be added and the numbers from column 2 should be summed when the strings from column 1 have the same name.

With the following code i could aggregate the strings but without summing the numbers:

df2= df1.groupby([0]).agg(lambda x: ','.join(set(x))).reset_index()
df2

jpp · Accepted Answer

Avoid an arbitrary number of columns

Your desired output suggests you have an arbitrary number of columns dependent on the number of values in 1 for each group 0. This is anti-Pandas, which is strongly geared towards an arbitrary number of rows. Hence series-wise operations are preferred.

So you can just use groupby + sum to store all the information you require.

df = pd.DataFrame({0: ['2008-04_E.pdf']*3,
                   1: ['Mat1', 'Mat2', 'Mat2'],
                   2: [3, 1, 1]})

df_sum = df.groupby([0, 1]).sum().reset_index()

print(df_sum)

               0     1  2
0  2008-04_E.pdf  Mat1  3
1  2008-04_E.pdf  Mat2  2

But if you insist...

If you insist on your unusual requirement, you can achieve it as follows via df_sum calculated as above.

key = df_sum.groupby(0)[1].cumcount().add(1).map('Key{}'.format)
res = df_sum.set_index([0, key]).unstack().reset_index().drop('key', axis=1)

res.columns = res.columns.droplevel(0)

print(res)

                  Key1  Key2  Key1  Key2
0  2008-04_E.pdf  Mat1  Mat2     3     2

Python df groupby with agg for string and sum

Answers (2)

Avoid an arbitrary number of columns

But if you insist...

Related Questions