Reputation: 93
I have a dataset where due to OCR is a bit dirty. A sample looks like this:
date float_col1 float_col2 txt_col
0 12/17/2019 0.00 5012.41 some string1
1 12/26/2019 0.00 0.30 some string2
2 NaN 1.98 0.00 some string3
3 01/06/2020 0.00 673.01 some string4
4 NaN 0.00 0.00 some string5
5 NaN 786.60 0.00 some string6
I would like to combine rows 4 and 5 since they should belong together.
The condition would be if float_col1 AND float_col1 are both NA then they should be combined.
Dataset should look like this:
date float_col1 float_col2 txt_col
0 12/17/2019 0.00 5012.41 some string1
1 12/26/2019 0.00 0.30 some string2
2 NaN 1.98 0.00 some string3
3 01/06/2020 0.00 673.01 some string4
4 NaN 786.60 0.00 some string5 some string6
Thanks.
Upvotes: 0
Views: 240
Reputation: 323396
We need fillna
NaN first , since the np.nan
not equal to itself, then we just do gorupby
with agg
, with the new key create by shift
and cumsum
df.date.fillna('NaN',inplace=True)
fund={'date':'first','float_col1':'sum','float_col2':'sum','txt_col':','.join}
newdf=df.groupby(df.date.ne(df.date.shift()).cumsum()).agg(fund)
Out[28]:
date float_col1 float_col2 txt_col
date
1 12/17/2019 0.00 5012.41 somestring1
2 12/26/2019 0.00 0.30 somestring2
3 NaN 1.98 0.00 somestring3
4 01/06/2020 0.00 673.01 somestring4
5 NaN 786.60 0.00 somestring5,somestring6
Upvotes: 1