merge rows based on index range pandas

Question

Using tabula I have a dataframe of transactions from a pdf, which has the following layout:

|   | Details     | Transaction |
 --- ------------- -------------
| 0 | Date: 11:20 | Debit       |
| 1 | 01/01/2020  | $100.50     |
| 2 | Date: 13:15 | Credit      |
| 3 | 01/02/2020  | $50.00      | 
 --- ------------- -------------

I would like merge rows that are from the same transaction. In this case rows [0, 1] and [2, 3].

My idea was to find each row number with the string 'Date' and merge those rows.

index = list(df.index[df['Details'].str.contains('Date')]
>>> [0, 2]

I would expect the output to be:

|   | Details     | Transaction |
 --- ------------- -------------
| 0 | Date: 11:20 | Debit       |
|   | 01/01/2020  | $100.50     |
| 1 | Date: 13:15 | Credit      |
|   | 01/02/2020  | $50.00      | 
 --- ------------- -------------

Shubham Sharma · Accepted Answer

Use Series.str.contains + Series.cumsum to create a grouper g, then use DataFrame.groupby to group the dataframe on g and agg using join:

g = df['Details'].str.contains('(?i)Date:').cumsum()
df = df.groupby(g, as_index=False).agg(' '.join)

Another idea (If the dataframe structure is fixed):

df = df.groupby(df.index // 2).agg(' '.join)

Result:

                  Details    Transaction
0  Date: 11:20 01/01/2020  Debit $100.50
1  Date: 13:15 01/02/2020  Credit $50.00

merge rows based on index range pandas

Answers (1)

Related Questions