Reputation: 429
I am a beginner and would appreciate some help :) I want to reduce the amount of rows by combining rows based on a variable.
I have given data set:
d = {
'day': ['one', 'one', 'two', 'three', 'three'],
'switch': [0, 1, 0, 1, 0 ],
'value': ['green', 'red', 'blue', 'orange', 'purple']
}
df = pd.DataFrame(data=d)
print(df)
day switch value
0 one 0 green
1 one 1 red
2 two 0 blue
3 three 1 orange
4 three 0 purple
I try to restructure the data set to look like this:
day switch_0_value switch_1_value
0 one green red
1 two blue NAN
2 three purple orange
I have looked at 'combine_first' but don't know how to apply that within a loop.
Thank you so much in advance!
Upvotes: 1
Views: 1301
Reputation: 20669
You can use df.unstack
here.
df.set_index(['day','switch']).value.unstack(-1)
switch 0 1
day
one green red
three purple orange
two blue NaN
Or df.pivot
df.pivot(index='day',columns='switch',values='value')
switch 0 1
day
one green red
three purple orange
two blue NaN
or using df.groupby
with agg
and pd.NamedAgg
is just for adding custom named columns
df.groupby('day').agg(switch_0=pd.NamedAgg(column='value',aggfunc='first'),
switch_1=pd.NamedAgg(column='value',aggfunc=lambda x:x.iloc[-1]
if x.size>1 else np.nan))
switch_0 switch_1
day
one green red
three orange purple
two blue NaN
or using df.pivot_table
but pivot_table
's default aggfunc
is 'mean'
so we need to get little dirty here. ;)
df.pivot_table(index='day',columns='switch',values='value',aggfunc=lambda x:x)
switch 0 1
day
one green red
three purple orange
two blue NaN
Upvotes: 0
Reputation: 30991
To have the proper ordering of days, start from conversion of day column from plain string to Categorical:
days = ['one', 'two', 'three']
dayCat = pd.Categorical(days, categories=days, ordered=True)
df.day = df.day.astype(dayCat)
Then compute the result:
result = df.groupby('day').apply(lambda grp: grp.sort_values('switch')
.value.reset_index(drop=True)).unstack().add_prefix('switch_')\
.add_suffix('_value').reset_index()
The result is:
day switch_0_value switch_1_value
0 one green red
1 two blue NaN
2 three purple orange
Upvotes: 0
Reputation: 1058
Try something like this:
result = df[df["switch"] == 0][["day", "value"]].rename(columns = {"value":"switch_0_value"})
result = pd.merge(result, df[df["switch"] == 1][["day", "value"]], how='left').rename(columns = {"value":"switch_1_value"})
result
day switch_0_value switch_1_value
0 one green red
1 two blue NaN
2 three purple orange
Upvotes: 2