gio888
gio888

Reputation: 93

How to combine consecutive rows in pandas with certain conditions

I have a dataset where due to OCR is a bit dirty. A sample looks like this:

    date    float_col1  float_col2  txt_col
0   12/17/2019  0.00    5012.41 some string1
1   12/26/2019  0.00    0.30    some string2
2   NaN         1.98    0.00    some string3
3   01/06/2020  0.00    673.01  some string4
4   NaN         0.00    0.00    some string5
5   NaN       786.60    0.00    some string6

I would like to combine rows 4 and 5 since they should belong together.

The condition would be if float_col1 AND float_col1 are both NA then they should be combined.

Dataset should look like this:

    date    float_col1  float_col2  txt_col
0   12/17/2019  0.00    5012.41 some string1
1   12/26/2019  0.00    0.30    some string2
2   NaN         1.98    0.00    some string3
3   01/06/2020  0.00    673.01  some string4
4   NaN       786.60    0.00    some string5 some string6

Thanks.

Upvotes: 0

Views: 240

Answers (1)

BENY
BENY

Reputation: 323396

We need fillna NaN first , since the np.nan not equal to itself, then we just do gorupby with agg, with the new key create by shift and cumsum

df.date.fillna('NaN',inplace=True)
fund={'date':'first','float_col1':'sum','float_col2':'sum','txt_col':','.join}
newdf=df.groupby(df.date.ne(df.date.shift()).cumsum()).agg(fund)
Out[28]: 
            date  float_col1  float_col2                  txt_col
date                                                             
1     12/17/2019        0.00     5012.41              somestring1
2     12/26/2019        0.00        0.30              somestring2
3            NaN        1.98        0.00              somestring3
4     01/06/2020        0.00      673.01              somestring4
5            NaN      786.60        0.00  somestring5,somestring6

Upvotes: 1

Related Questions