tktktk0711
tktktk0711

Reputation: 1694

pandas dataframe: sum multi-columns value based on common columns

There is a dataframe like as following:

 date     id  device    t1    t2       text             y1   y2 
 2010-1-1  1    pc      yes1  I am1     This is a test1    5    3
 2010-1-1  1    smart   yes1  I am1   This is a test1   6    4
 2010-1-1  1    table    yes1  I am1   This is a test1    7    5
 2010-1-1  2    pc       yes2  I am1   This is a test2      8    2
 2010-1-1  2    smart     yes2  I am1   This is a test2   8    3
 2010-1-1  2    table     yes2  I am1   This is a test2   9    4
 2010-1-1  3    pc        yes3  I am3      This is a test3   10   3
 2010-1-1  3    smart     yes3  I am3   This is a tes3    11   2
 ........................

Now I want to merge a new dataframe: (1). sum the y1, and y2, when the id and date, t1, t2, text are same. (2). join the device str when when the id and date, t1, t2, text are same. (3). merge the common rows(which have same id, date, text, t1, t2) as one row,
and the new dataframe like as following:

 date   id  device  t1    t2            text                    y1   y2 
2010-1-1 1  pc,smart,table     yes1  I am1      This is a test1  18  12
2010-1-1  2   pc,smart,table      yes2  I am2      This is a test2 25  9
2010-1-1  3    pc,smart     yes3  I am3      This is a test3    21   5

Upvotes: 1

Views: 170

Answers (2)

jezrael
jezrael

Reputation: 862731

Use groupby by all columns with same values per groups and aggregate by agg with dictionary, last add reindex for same ordering of final columns:

df = (df.groupby(['date','id', 't1', 't2', 'text'], as_index=False)
        .agg({'y1':'sum', 'y2':'sum', 'device': ', '.join})
        .reindex(columns=df.columns))
print (df)
       date  id            device    t1     t2             text  y1  y2
0  2010-1-1   1  pc, smart, table  yes1  I am1  This is a test1  18  12
1  2010-1-1   2  pc, smart, table  yes2  I am1  This is a test2  25   9
2  2010-1-1   3         pc, smart  yes3  I am3  This is a test3  21   5

Upvotes: 1

Zero
Zero

Reputation: 76927

Use

In [294]: (df.groupby(['date', 'id', 't1', 't2', 'text'], as_index=False)
             .agg({'device': ','.join, 'y1': sum, 'y2': sum}))
Out[294]:
       date  id    t1     t2             text          device  y1  y2
0  2010-1-1   1  yes1  I am1  This is a test1  pc,smart,table  18  12
1  2010-1-1   2  yes2  I am1  This is a test2  pc,smart,table  25   9
2  2010-1-1   3  yes3  I am3  This is a test3        pc,smart  21   5

Upvotes: 1

Related Questions