XaviorL
XaviorL

Reputation: 373

Pandas How to combine two rows in group with complex rules/conditions

I have a dataframe:

import pandas as pd

df = pd.DataFrame({
    "ID": ['company A', 'company A', 'company A', 'company B','company B', 'company B', 'company C', 'company C','company C','company C', 'company D', 'company D','company D'],
    'Sender': [28, 'remove1', 'flag_source', 56, 28, 312, 'remove2', 'flag_source', 78, 102, 26, 101, 96],
    'Receiver': [129, 28, 'remove1', 172, 56, 28, 61, 'remove2', 12, 78, 98, 26, 101],
    'Date': ['2020-04-12', '2020-03-20', '2020-03-20', '2019-02-11', '2019-01-31', '2018-04-02', '2020-06-29', '2020-06-29', '2019-11-29', '2019-10-01', '2020-04-03', '2020-01-30', '2019-10-18'],
    'Sender_type': ['house', 'temp', 'house', 'house', 'house', 'house', 'temp', 'house', 'house','house','house', 'temp', 'house'],
    'Receiver_type': ['house', 'house', 'temp', 'house','house','house','house', 'temp', 'house','house','house','house','temp'],
    'Price': [32, 50, 47, 21, 23, 19, 52, 39, 12, 22, 61, 53, 19]
})

The df is like this below:

           ID       Sender Receiver        Date Sender_type Receiver_type  Price  
0   company A           28      129  2020-04-12       house         house  32 
1   company A      remove1       28  2020-03-20        temp         house  50 # combine this row with below
2   company A  flag_source  remove1  2020-03-20       house          temp  47 # combine this row with above
3   company B           56      172  2019-02-11       house         house  21 
4   company B           28       56  2019-01-31       house         house  23 
5   company B          312       28  2018-04-02       house         house  19 
6   company C      remove2       61  2020-06-29        temp         house  52 # combine this row and below
7   company C  flag_source  remove2  2020-06-29       house          temp  39 # combine this row with above
8   company C           78       12  2019-11-29       house         house  12 
9   company C          102       78  2019-10-01       house         house  22 
10  company D           26       98  2020-04-03       house         house  61 
11  company D          101       26  2020-01-30        temp         house  53 
12  company D           96      101  2019-10-18       house          temp  19 

I wish to combine/merge two rows for each group 'ID' (company x) by the following rule: combine the row in 'Sender' that contains a'flag_source' and its above row into one new row. In this new row: the Sender is the flag_source, 'Revceiver' is its above value (remove the two 'remove' values), Date is the above date, Sender_type and Receiver_type are 'house', and 'Price' is the previous above value. Then remove the two rows. For example, for company A, it will combine line 1 and line 2 to generate the new row below:

ID        Sender        Receiver  Date        Sender_type  Receiver_type  Price
company A flag_source   28        2020-03-20  house        house          50

Then use this new row to replace the previous two lines. Same rules for the other groups(in this case only apply to company A and C). In the end, I wish to have a result like this:

           ID       Sender  Receiver        Date Sender_type Receiver_type  Price
0   company A           28       129  2020-04-12       house         house   32
1   company A  flag_source        28  2020-03-20       house         house   50 # new row
2   company B           56       172  2019-02-11       house         house   21
3   company B           28        56  2019-01-31       house         house   23
4   company B          312        28  2018-04-02       house         house   19
5   company C  flag_source        61  2020-06-29       house         house   52 # new row
6   company C           78        12  2019-11-29       house         house   12
7   company C          102        78  2019-10-01       house         house   22
8   company D           26        98  2020-04-03       house         house   61
9   company D          101        26  2020-01-30        temp         house   53
10  company D           96       101  2019-10-18       house          temp   19

Hopefully my explanation for the question is clear.

As this is a brief sample, the real case has many data like this, I wrote a loop but very slow and unproductive, so please help if you have any ideas and effective way. Many many thanks for help!

Upvotes: 2

Views: 72

Answers (1)

Tom
Tom

Reputation: 8800

I believe the following is working:

mask = df.Sender == 'flag_source'
df[mask] = df.shift()
df.loc[mask, 'Sender'] = 'flag_source'
df.loc[mask, ['Sender_type','Receiver_type']] = 'house'
df = df[~mask.shift(-1).fillna(False).astype(bool)].reset_index(drop=True)

So the steps are (by line):

  • make a mask of the rows you need to channge
  • set those rows equal to the previous row with 'shift'
  • rewrite Sender for those rows to flag_source
  • also rewrite Sender_type and Receiver_type
  • remove the previous rows, by using a shift again on the mask. This seems a little convoluted; you could also do something like a loc against rows that don't contain the string remove

Output:

          ID       Sender Receiver        Date Sender_type Receiver_type  Price
0   company A           28      129  2020-04-12       house         house   32.0
1   company A  flag_source       28  2020-03-20       house         house   50.0
2   company B           56      172  2019-02-11       house         house   21.0
3   company B           28       56  2019-01-31       house         house   23.0
4   company B          312       28  2018-04-02       house         house   19.0
5   company C  flag_source       61  2020-06-29       house         house   52.0
6   company C           78       12  2019-11-29       house         house   12.0
7   company C          102       78  2019-10-01       house         house   22.0
8   company D           26       98  2020-04-03       house         house   61.0
9   company D          101       26  2020-01-30        temp         house   53.0
10  company D           96      101  2019-10-18       house          temp   19.0

Upvotes: 1

Related Questions