Reputation: 69
I have a table which looks like this -
date name values items
01-03-2019 xyz 900 NaN
NaN NaN 900 brush
02-03-2019 abc 1200 NaN
NaN NaN 900 paste
NaN NaN 300 floss
and desired output is -
date name values items
01-03-2019 xyz 900 brush
02-03-2019 abc 1200 paste, floss
I know I can use np.where() to make the values of each item NaN, but I don't know how I should go about the concatenation and then move the result up alongside the name.
Upvotes: 2
Views: 752
Reputation: 42916
We can use fillna
with forwardfill (ffill)
. Then dropna
and at the same time groupby
with agg
to get the sum
of value and concat the strings in items:
df[['date', 'name']] = df[['date', 'name']].fillna(method='ffill')
df = df.dropna().groupby('date').agg({'name':'first',
'values':'sum',
'items':', '.join}).reset_index()
print(df)
date name values items
0 01-03-2019 xyz 900 brush
1 02-03-2019 abc 1200 paste, floss
Upvotes: 1
Reputation: 863216
Use GroupBy.agg
with first column and ffill
for forward filling missing values and dictionary, which is created dynamic - for each column without date
and items
is applied aggregate function GroupBy.first
and for last column items
is used join
with removed missing values:
d = dict.fromkeys(df.columns.difference(['date','items']), 'first')
d['items'] = lambda x: ', '.join(x.dropna())
df = df.groupby(df['date'].ffill()).agg(d).reset_index()
print (df)
date name values items
0 01-03-2019 xyz 900 brush
1 02-03-2019 abc 1200 paste, floss
If only few columns is possible pass 2 columns to groupby
with ffill
and create dictionary passed to agg
:
df = (df.groupby([df['date'].ffill(), df['name'].ffill()])
.agg({'values':'first', 'items':lambda x: ', '.join(x.dropna())})
.reset_index())
print (df)
date name values items
0 01-03-2019 xyz 900 brush
1 02-03-2019 abc 1200 paste, floss
Upvotes: 2