deeth128
deeth128

Reputation: 25

How to split rows in a dataframe based on column value?

I am trying to split my rows into multiple rows based on if a value exists in a column. I am able to filter based on the column value but would like to split the row into two based on the condition and return only specific parts of the row. The dataframe consists of different types of payments and the activity field represents the payment type that occurred. I would like for when there is a specific payment type to split the row into two. In the example below, when payment 2 exists I want it to be a separate transaction.

I have the following dataframe:

ID      Payment 1      Payment 2      Payment 3    Payment 4    Payment 5      Activity 
1          10             0              0            0              0         Payment 1  
2          0              10             15           0              0         Payment 2, Payment 3
3          10             20             30           0              0         Payment 1, Payment 2, Payment 3

But I would like for the dataframe to be:

ID      Payment 1      Payment 2      Payment 3    Payment 4    Payment 5       Activity 
 1          10             0              0            0              0         Payment 1  
 2          0              10             0            0              0         Payment 2
 2          0              0              15           0              0         Payment 3
 3          10             0              30           0              0         Payment 1, Payment 3
 3          0              20             0            0              0         Payment 2

Upvotes: 0

Views: 1248

Answers (1)

Gerd
Gerd

Reputation: 2803

The problem here is that you want to change both the Payment columns and the Activity column at the same time. So actually, I only see the naive approach of iterating through the DataFrame and writing new rows to a new DataFrame depending on the condition row['Payment 2'] == 0 like this:

df_new = pd.DataFrame(columns=['ID', 'Payment 1', 'Payment 2', 'Payment 3', 'Payment 4', 'Payment 5', 'Activity'])
for index, row in df.iterrows():
  if row['Payment 2'] == 0:
    df_new = df_new.append(row)
  else:
    row_new = pd.DataFrame({'ID':[row['ID']], 'Payment 1':[row['Payment 1']], 'Payment 2':[0], 'Payment 3':[row['Payment 3']], 'Payment 4':[row['Payment 4']], 'Payment 5':[row['Payment 5']], 'Activity':[', '.join([x for x in ['Payment 1', 'Payment 3', 'Payment 4', 'Payment 5'] if row[x] != 0])]})
    df_new = df_new.append(row_new)
    row_new = pd.DataFrame({'ID':[row['ID']], 'Payment 1':[0], 'Payment 2':[row['Payment 2']], 'Payment 3':[0], 'Payment 4':[0], 'Payment 5':[0], 'Activity':'Payment 2'})
    df_new = df_new.append(row_new)

This will append two new rows (one with Payment 2 only and one with the rest) if there is a Payment 2 and append the original row from the source DataFrame if there is no Payment 2.

Related questions on splitting rows (based on one column only and leaving the other ones untouched):

Upvotes: 1

Related Questions