Dana_Miles
Dana_Miles

Reputation: 429

How to reduce/combine rows in a dataframe based on specific values?

I am a beginner and would appreciate some help :) I want to reduce the amount of rows by combining rows based on a variable.

I have given data set:

d = {
   'day': ['one', 'one', 'two', 'three', 'three'], 
   'switch': [0, 1, 0, 1, 0 ],
   'value': ['green', 'red', 'blue', 'orange', 'purple']
}
df = pd.DataFrame(data=d)
print(df)

    day     switch  value
0   one     0       green
1   one     1       red
2   two     0       blue
3   three   1       orange
4   three   0       purple

I try to restructure the data set to look like this:

    day      switch_0_value    switch_1_value
0   one      green             red
1   two      blue              NAN
2   three    purple            orange

I have looked at 'combine_first' but don't know how to apply that within a loop.

Thank you so much in advance!

Upvotes: 1

Views: 1301

Answers (3)

Ch3steR
Ch3steR

Reputation: 20669

You can use df.unstack here.

df.set_index(['day','switch']).value.unstack(-1)

switch       0       1
day
one      green     red
three   purple  orange
two       blue     NaN

Or df.pivot

df.pivot(index='day',columns='switch',values='value')

switch       0       1
day
one      green     red
three   purple  orange
two       blue     NaN

or using df.groupby with agg and pd.NamedAgg is just for adding custom named columns

df.groupby('day').agg(switch_0=pd.NamedAgg(column='value',aggfunc='first'),
                      switch_1=pd.NamedAgg(column='value',aggfunc=lambda x:x.iloc[-1] 
                      if x.size>1 else np.nan))

      switch_0 switch_1
day
one      green      red
three   orange   purple
two       blue      NaN

or using df.pivot_table but pivot_table's default aggfunc is 'mean' so we need to get little dirty here. ;)

df.pivot_table(index='day',columns='switch',values='value',aggfunc=lambda x:x)
switch       0       1
day
one      green     red
three   purple  orange
two       blue     NaN

Upvotes: 0

Valdi_Bo
Valdi_Bo

Reputation: 30991

To have the proper ordering of days, start from conversion of day column from plain string to Categorical:

days = ['one', 'two', 'three']
dayCat = pd.Categorical(days, categories=days, ordered=True)
df.day = df.day.astype(dayCat)

Then compute the result:

result = df.groupby('day').apply(lambda grp: grp.sort_values('switch')
    .value.reset_index(drop=True)).unstack().add_prefix('switch_')\
    .add_suffix('_value').reset_index()

The result is:

     day switch_0_value switch_1_value
0    one          green            red
1    two           blue            NaN
2  three         purple         orange

Upvotes: 0

Let's try
Let's try

Reputation: 1058

Try something like this:

result = df[df["switch"] == 0][["day", "value"]].rename(columns = {"value":"switch_0_value"})
result = pd.merge(result, df[df["switch"] == 1][["day", "value"]], how='left').rename(columns = {"value":"switch_1_value"})
result
    day    switch_0_value   switch_1_value
0   one    green            red
1   two    blue             NaN
2   three  purple           orange

Upvotes: 2

Related Questions