Reputation: 452
If I have the following dataframe:
Index  Col1  Col2  Col3
1      10    x     40
2            y     50
3            z     60
4      20    a     30
I would like to merge rows that have a blank Col1 with the previous row that is not blank in Col1.
Expected output:
Index  Col1  Col2   Col3
1      10    x,y,z  40,50,60
4      20    a      30
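(For anyone reproducing this, a minimal sketch of the frame above, assuming the blank Col1 cells are NaN rather than empty strings:
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': [10, np.nan, np.nan, 20],
                   'Col2': ['x', 'y', 'z', 'a'],
                   'Col3': [40, 50, 60, 30]},
                  index=[1, 2, 3, 4])
)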
Is this possible?
Thanks
Upvotes: 2
Views: 1591
Reputation: 452
The answers posted solved my 'dumbed down' example dataset, but I could not get them to work with my real-world dataset. Prior to this one I had posted another question about fixing the problem while pulling the data in, rather than manipulating it afterwards, and the answer formed there solves this question as well.
It is here
The answer is:
last_valid = None
check_cols = ['Col1']  # if only a subset of columns needs checking for validity, list them here
df = df.astype(str)    # convert all columns to strings, as I have to combine numbers in the same cell
df = df.replace('nan', '')  # turn the 'nan' strings created by the cast back into blanks

for i, s in df.iterrows():  # this is slow, but probably necessary in this case
    # If the row is valid, keep it as a reference in case the following rows are not
    if all(s[check_cols] != ''):
        # store both index and series so we can go back and replace the row later
        lvi, last_valid = i, s
        continue
    else:  # here is the critical part
        extra_vals = s[s != '']  # find cells in the row that have actual values
        for col in extra_vals.index:
            # separate by whatever you wish; a list was causing issues here
            last_valid[col] = last_valid[col] + "," + extra_vals[col]
        # write the merged values back over the last valid row
        df.loc[lvi, :] = last_valid

# drop the now-redundant blank rows:
df = df[df['Col1'] != ''].reset_index(drop=True)
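Run against the sample frame from the question, this should print something like the following (Col1 shows as 10.0/20.0 because the NaNs made the column float before the string cast):
print(df)
#    Col1   Col2      Col3
# 0  10.0  x,y,z  40,50,60
# 1  20.0      a        30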
Upvotes: 0
Reputation: 323396
We can do
out = df.drop(labels = 'Col1',axis = 1).astype(str).groupby(df['Col1'].mask(df['Col1']=='').ffill()).agg(','.join).reset_index()
Out[85]:
   Col1   Col2      Col3
0  10.0  x,y,z  40,50,60
1  20.0      a        30
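For readability, the same chain can be split into named steps (a sketch, identical logic):
key = df['Col1'].mask(df['Col1'] == '').ffill()  # carry the last non-blank Col1 forward
out = (df.drop(labels='Col1', axis=1)
         .astype(str)    # make every cell joinable as text
         .groupby(key)   # blank rows fall into the preceding valid group
         .agg(','.join)  # collapse each group's values with commas
         .reset_index())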
Upvotes: 2
Reputation: 23099
Quite possible. What you need to do is create a unique group value that increments at every value that is not null.
In one go:
df.drop('Col1', axis=1).groupby((df['Col1'].isna() == False).cumsum()).agg(list)

# p.s. if you really want strings, cast before grouping:
# df.drop('Col1', axis=1).astype(str).groupby((df['Col1'].isna() == False).cumsum()).agg(','.join)
           Col2          Col3
Col1
1     [x, y, z]  [40, 50, 60]
2           [a]          [30]
The key here is the condition:
df[['Col1']].assign(con=df['Col1'].isna() == False)

   Col1    con   # con, for condition
0  10.0   True   <-- first group
1   NaN  False
2   NaN  False
3  20.0   True   <-- second group
Now, taking the cumulative sum of that condition gives you your grouper object.
df[['Col1']].assign(con=(df['Col1'].isna() == False).cumsum())

   Col1  con
0  10.0    1
1   NaN    1
2   NaN    1
3  20.0    2
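Putting it together on the sample frame (a sketch; notna() is just the idiomatic spelling of isna() == False):
grouper = df['Col1'].notna().cumsum()  # 1, 1, 1, 2 -> the group labels shown above
out = df.drop('Col1', axis=1).groupby(grouper).agg(list)
# to label the result with the original Col1 values instead of 1, 2:
out.index = df.loc[df['Col1'].notna(), 'Col1'].values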
Upvotes: 2