MTALY
MTALY

Reputation: 1772

Convert rows with values into multiple columns after grouping

There are many steps to do that for the data I have, I will show you the steps taken so far where I stuck:

I have this df:

df = pd.DataFrame(np.array([['Iza', '2020-12-01 10:34:00'],['Iza', '2020-12-02 10:34:00'],['Iza', '2020-12-01 17:34:00'],['Iza', '2020-12-01 17:34:00'],['Sara', '2020-12-04 17:34:00'], ['Sara', '2020-12-04 20:11:00'], ['Sara', '2020-12-06 17:34:00'],['Silvia', '2020-12-07 18:34:00'],['Silvia', '2020-12-09 11:22:00'],['Paul', '2020-12-09 11:22:00'],['Paul', '2020-12-08 11:22:00'],['Paul', '2020-12-07 11:22:00']]),
                   columns=['Name', 'Time'])

df:

    Name    Time
0   Iza     2020-12-01 10:34:00
1   Iza     2020-12-02 10:34:00
2   Iza     2020-12-01 17:34:00
3   Iza     2020-12-01 17:34:00
4   Sara    2020-12-04 17:34:00
5   Sara    2020-12-04 20:11:00
6   Sara    2020-12-06 17:34:00
7   Silvia  2020-12-07 18:34:00
8   Silvia  2020-12-09 11:22:00
9   Paul    2020-12-09 11:22:00
10  Paul    2020-12-08 11:22:00
11  Paul    2020-12-07 11:22:00

I converted the time column to datetime:

df['Time'] = pd.to_datetime(df['Time'])

Now I want to get days in names and find the percentage of each day per name in columns:

df['Day'] = df['Time'].dt.day_name()

Result:

    Name    Time                 Day
0   Iza     2020-12-01 10:34:00 Tuesday
1   Iza     2020-12-02 10:34:00 Wednesday
2   Iza     2020-12-01 17:34:00 Tuesday
3   Iza     2020-12-01 17:34:00 Tuesday
4   Sara    2020-12-04 17:34:00 Friday
5   Sara    2020-12-04 20:11:00 Friday
6   Sara    2020-12-06 17:34:00 Sunday
7   Silvia  2020-12-07 18:34:00 Monday
8   Silvia  2020-12-09 11:22:00 Wednesday
9   Paul    2020-12-09 11:22:00 Wednesday
10  Paul    2020-12-08 11:22:00 Tuesday
11  Paul    2020-12-07 11:22:00 Monday

df2 = round(df.groupby(['Name'])['Day'].apply(lambda x: x.value_counts(normalize=True)) * 100)

Result:

Name             
Iza     Tuesday      75.0
        Wednesday    25.0
Paul    Monday       33.0
        Tuesday      33.0
        Wednesday    33.0
Sara    Friday       67.0
        Sunday       33.0
Silvia  Wednesday    50.0
        Monday       50.0
Name: Day, dtype: float64

I stuck here, my desired output - days in columns with % for each per name:

Name Sunday Monday Tuesday Wednesday Friday
Iza    NaN    NaN     75     25        NaN
Paul   NaN     33     33     33        NaN
Sara   33     NaN    NaN    NaN        67
Silvia NaN    50     NaN    50         NaN    

Upvotes: 1

Views: 52

Answers (2)

jezrael
jezrael

Reputation: 862511

Use Categorical for correct order in last Series.unstack, solution was simplify without apply:

df['Time'] = pd.to_datetime(df['Time'])

week   = ['Sunday', 
          'Monday', 
          'Tuesday', 
          'Wednesday', 
           'Thursday',  
           'Friday', 
           'Saturday']

df['Day'] = pd.Categorical(df['Time'].dt.day_name(), ordered=True, categories=week)

df1 = df.groupby('Name')['Day'].value_counts(normalize=True).unstack().mul(100).round()
print (df1)
Day     Sunday  Monday  Tuesday  Wednesday  Friday
Name                                              
Iza        NaN     NaN     75.0       25.0     NaN
Paul       NaN    33.0     33.0       33.0     NaN
Sara      33.0     NaN      NaN        NaN    67.0
Silvia     NaN    50.0      NaN       50.0     NaN

For correct ordering is a bit changed solution:

df['Time'] = pd.to_datetime(df['Time'])
df['Day'] = df['Time'].dt.dayofweek

d =  {0: 'Sunday', 1: 'Monday', 2: 'Tuesday', 3: 'Wednesday',
      4: 'Thursday',  5: 'Friday', 6: 'Saturday'}

df1 = df.groupby('Name')['Day'].value_counts(normalize=True).unstack().mul(100).round().rename(columns=d)
print (df1)
Day     Sunday  Monday  Tuesday  Thursday  Saturday
Name                                               
Iza        NaN    75.0     25.0       NaN       NaN
Paul      33.0    33.0     33.0       NaN       NaN
Sara       NaN     NaN      NaN      67.0      33.0
Silvia    50.0     NaN     50.0       NaN       NaN

Upvotes: 1

wwnde
wwnde

Reputation: 26676

Just unstack. You are on the money

df2 = round(df.groupby(['Name'])['Day'].apply(lambda x: x.value_counts(normalize=True)) * 100).unstack(level=1)

df2=df2[['Sunday','Monday','Tuesday', 'Wednesday','Friday']]



        Sunday  Monday  Tuesday  Wednesday  Friday
Name                                              
Iza        NaN     NaN     75.0       25.0     NaN
Paul       NaN    33.0     33.0       33.0     NaN
Sara      33.0     NaN      NaN        NaN    67.0
Silvia     NaN    50.0      NaN       50.0     NaN

Upvotes: 1

Related Questions