Reputation: 373
I have a dataframe:
import pandas as pd
df = pd.DataFrame({
"ID": ['company A', 'company A', 'company A', 'company B','company B', 'company B', 'company C', 'company C','company C','company C', 'company D', 'company D','company D'],
'Sender': [28, 'remove1', 'flag_source', 56, 28, 312, 'remove2', 'flag_source', 78, 102, 26, 101, 96],
'Receiver': [129, 28, 'remove1', 172, 56, 28, 61, 'remove2', 12, 78, 98, 26, 101],
'Date': ['2020-04-12', '2020-03-20', '2020-03-20', '2019-02-11', '2019-01-31', '2018-04-02', '2020-06-29', '2020-06-29', '2019-11-29', '2019-10-01', '2020-04-03', '2020-01-30', '2019-10-18'],
'Sender_type': ['house', 'temp', 'house', 'house', 'house', 'house', 'temp', 'house', 'house','house','house', 'temp', 'house'],
'Receiver_type': ['house', 'house', 'temp', 'house','house','house','house', 'temp', 'house','house','house','house','temp'],
'Price': [32, 50, 47, 21, 23, 19, 52, 39, 12, 22, 61, 53, 19]
})
The df is like this below:
ID Sender Receiver Date Sender_type Receiver_type Price
0 company A 28 129 2020-04-12 house house 32
1 company A remove1 28 2020-03-20 temp house 50 # combine this row with below
2 company A flag_source remove1 2020-03-20 house temp 47 # combine this row with above
3 company B 56 172 2019-02-11 house house 21
4 company B 28 56 2019-01-31 house house 23
5 company B 312 28 2018-04-02 house house 19
6 company C remove2 61 2020-06-29 temp house 52 # combine this row and below
7 company C flag_source remove2 2020-06-29 house temp 39 # combine this row with above
8 company C 78 12 2019-11-29 house house 12
9 company C 102 78 2019-10-01 house house 22
10 company D 26 98 2020-04-03 house house 61
11 company D 101 26 2020-01-30 temp house 53
12 company D 96 101 2019-10-18 house temp 19
I wish to combine/merge two rows for each group 'ID' (company x) by the following rule: combine the row in 'Sender' that contains a'flag_source' and its above row into one new row. In this new row: the Sender is the flag_source, 'Revceiver' is its above value (remove the two 'remove' values), Date is the above date, Sender_type and Receiver_type are 'house', and 'Price' is the previous above value. Then remove the two rows. For example, for company A, it will combine line 1 and line 2 to generate the new row below:
ID Sender Receiver Date Sender_type Receiver_type Price
company A flag_source 28 2020-03-20 house house 50
Then use this new row to replace the previous two lines. Same rules for the other groups(in this case only apply to company A and C). In the end, I wish to have a result like this:
ID Sender Receiver Date Sender_type Receiver_type Price
0 company A 28 129 2020-04-12 house house 32
1 company A flag_source 28 2020-03-20 house house 50 # new row
2 company B 56 172 2019-02-11 house house 21
3 company B 28 56 2019-01-31 house house 23
4 company B 312 28 2018-04-02 house house 19
5 company C flag_source 61 2020-06-29 house house 52 # new row
6 company C 78 12 2019-11-29 house house 12
7 company C 102 78 2019-10-01 house house 22
8 company D 26 98 2020-04-03 house house 61
9 company D 101 26 2020-01-30 temp house 53
10 company D 96 101 2019-10-18 house temp 19
Hopefully my explanation for the question is clear.
As this is a brief sample, the real case has many data like this, I wrote a loop but very slow and unproductive, so please help if you have any ideas and effective way. Many many thanks for help!
Upvotes: 2
Views: 72
Reputation: 8800
I believe the following is working:
mask = df.Sender == 'flag_source'
df[mask] = df.shift()
df.loc[mask, 'Sender'] = 'flag_source'
df.loc[mask, ['Sender_type','Receiver_type']] = 'house'
df = df[~mask.shift(-1).fillna(False).astype(bool)].reset_index(drop=True)
So the steps are (by line):
Sender
for those rows to flag_source
Sender_type
and Receiver_type
shift
again on the mask. This seems a little convoluted; you could also do something like a loc
against rows that don't contain the string remove
Output:
ID Sender Receiver Date Sender_type Receiver_type Price
0 company A 28 129 2020-04-12 house house 32.0
1 company A flag_source 28 2020-03-20 house house 50.0
2 company B 56 172 2019-02-11 house house 21.0
3 company B 28 56 2019-01-31 house house 23.0
4 company B 312 28 2018-04-02 house house 19.0
5 company C flag_source 61 2020-06-29 house house 52.0
6 company C 78 12 2019-11-29 house house 12.0
7 company C 102 78 2019-10-01 house house 22.0
8 company D 26 98 2020-04-03 house house 61.0
9 company D 101 26 2020-01-30 temp house 53.0
10 company D 96 101 2019-10-18 house temp 19.0
Upvotes: 1