Reputation: 11
I would like to transform the raw data in df1 into the form of df2, so that each row with a blank A is merged into the row above it by concatenating B:
import pandas as pd
df1=pd.DataFrame([["20180105","abcdefg"],["","sdasdas"],["20180211","asdasfsd"],["","asdfg"],["","sdada"]],columns=["A","B"])
df2=pd.DataFrame([["20180105","abcdefgsdasdas"],["20180211","asdasfsdasdfgsdada"]],columns=["A","B"])
Upvotes: 0
Views: 51
Reputation: 51335
You can use groupby with sum, which concatenates strings:
import numpy as np
df1.replace({'A':{'':np.nan}}).ffill().groupby('A', as_index=False).sum()
          A                   B
0  20180105      abcdefgsdasdas
1  20180211  asdasfsdasdfgsdada
Note that I got rid of the blank strings in column A by replacing them with NaN and then forward-filling with ffill().
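If the one-liner is hard to follow, here is the same idea split into steps. This is only a sketch: the names keys and result are mine, and I use ''.join on column B instead of sum, which should give the same concatenation for strings.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [["20180105", "abcdefg"], ["", "sdasdas"], ["20180211", "asdasfsd"],
     ["", "asdfg"], ["", "sdada"]],
    columns=["A", "B"],
)

# Step 1: turn blank keys into NaN so pandas treats them as missing.
keys = df1["A"].replace("", np.nan)

# Step 2: carry each real key forward over the missing rows below it.
keys = keys.ffill()

# Step 3: group B by the filled keys and concatenate the pieces in row order.
result = df1["B"].groupby(keys).agg("".join).reset_index()

print(result)
```

Be aware that grouping on the filled key merges every row that shares the same value in A, even if those rows are not adjacent in df1.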
Upvotes: 2
Reputation: 59264
You can also use agg with ''.join, grouping on a running count of the non-blank values in A:
g = (df1.A != '').cumsum()
df1.groupby(g, as_index=False).agg(''.join)
          A                   B
0  20180105      abcdefgsdasdas
1  20180211  asdasfsdasdfgsdada
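To see why the cumsum works as a group key, it may help to look at the intermediate values. The sketch below is a variant of the above rather than the exact same call: it uses named aggregation, and the names grp and result are just illustrative.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["20180105", "abcdefg"], ["", "sdasdas"], ["20180211", "asdasfsd"],
     ["", "asdfg"], ["", "sdada"]],
    columns=["A", "B"],
)

# True on every row that starts a new record (non-blank A).
starts = df1["A"] != ""

# The running total of those flags gives each row the id of its record.
g = starts.cumsum().rename("grp")
print(g.tolist())  # [1, 1, 2, 2, 2]

# Keep the first A of each record and join its B fragments in row order.
result = (
    df1.groupby(g)
       .agg(A=("A", "first"), B=("B", "".join))
       .reset_index(drop=True)
)

print(result)
```

Because the key is a counter rather than the date itself, two separate records that happen to share the same A stay separate, which is not the case when you group on the forward-filled column.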
Upvotes: 2