user3556887

Reputation: 11

Modify a DataFrame in Python

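I would like to transform the raw data in df1 into the form of df2. Each row with a blank A should have its B value appended to the B of the closest preceding row whose A is not blank: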

import pandas as pd

df1 = pd.DataFrame(
    [["20180105", "abcdefg"], ["", "sdasdas"], ["20180211", "asdasfsd"], ["", "asdfg"], ["", "sdada"]],
    columns=["A", "B"],
)

df2 = pd.DataFrame(
    [["20180105", "abcdefgsdasdas"], ["20180211", "asdasfsdasdfgsdada"]],
    columns=["A", "B"],
)


Upvotes: 0

Views: 51

Answers (2)

sacuL

Reputation: 51335

You can use groupby with sum, which concatenates strings:

import numpy as np

df1.replace({'A': {'': np.nan}}).ffill().groupby('A', as_index=False).sum()

          A                   B
0  20180105      abcdefgsdasdas
1  20180211  asdasfsdasdfgsdada

Note that I got rid of the blank strings in column A by replacing them with NaN and then forward-filling with ffill(), so every row carries the A value of the group it belongs to.
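To make the chain easier to follow, here is a minimal sketch that splits it into its three steps. The intermediate names (`blanks_as_nan`, `filled`, `result`) are only illustrative, and the last step joins column B explicitly with `''.join` instead of relying on `sum`; that is a substitution for clarity, not part of the answer above.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame(
    [["20180105", "abcdefg"], ["", "sdasdas"], ["20180211", "asdasfsd"],
     ["", "asdfg"], ["", "sdada"]],
    columns=["A", "B"],
)

# Step 1: turn the blank strings in column A into missing values.
blanks_as_nan = df1.replace({"A": {"": np.nan}})

# Step 2: forward-fill so each row inherits the last non-missing A above it.
filled = blanks_as_nan.ffill()

# Step 3: group on A and concatenate the B strings within each group.
result = filled.groupby("A", as_index=False)["B"].agg("".join)

print(result)
```

After step 2, column A holds a date on every row, which is what lets the groupby collect the continuation rows together with the row that started them.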

Upvotes: 2

rafaelc

Reputation: 59264

You can also use agg with ''.join, grouping on a running count of the non-blank values in A:

g = (df1.A != '').cumsum()
df1.groupby(g, as_index=False).agg(''.join)

          A                   B
0  20180105      abcdefgsdasdas
1  20180211  asdasfsdasdfgsdada
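The grouping key is the part that does the work here, so the sketch below shows it on its own. The names `starts` and `group_id` are illustrative, and the result is assembled column by column (`first` for A, `''.join` for B) as an assumption-free variant that does not depend on how `as_index=False` treats an external grouping key.

```python
import pandas as pd

df1 = pd.DataFrame(
    [["20180105", "abcdefg"], ["", "sdasdas"], ["20180211", "asdasfsd"],
     ["", "asdfg"], ["", "sdada"]],
    columns=["A", "B"],
)

# True on every row that starts a new record (non-blank A).
starts = df1["A"] != ""

# The running total increases by one at each start, so continuation
# rows share the id of the record they belong to.
group_id = starts.cumsum()

# Group on the ids (passed as a plain array to avoid name clashes with column A).
grouped = df1.groupby(group_id.to_numpy())
result = pd.DataFrame({
    "A": grouped["A"].first(),       # the date that opened the group
    "B": grouped["B"].agg("".join),  # all B fragments, in row order
}).reset_index(drop=True)

print(group_id.tolist())
print(result)
```

Unlike the forward-fill approach, this one keeps two records separate even if they happen to share the same value in A, because the groups are defined by position rather than by the date itself.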

Upvotes: 2
