Reputation: 1772
There are many steps to do that for the data I have, I will show you the steps taken so far where I stuck:
I have this df:
df = pd.DataFrame(np.array([['Iza', '2020-12-01 10:34:00'],['Iza', '2020-12-02 10:34:00'],['Iza', '2020-12-01 17:34:00'],['Iza', '2020-12-01 17:34:00'],['Sara', '2020-12-04 17:34:00'], ['Sara', '2020-12-04 20:11:00'], ['Sara', '2020-12-06 17:34:00'],['Silvia', '2020-12-07 18:34:00'],['Silvia', '2020-12-09 11:22:00'],['Paul', '2020-12-09 11:22:00'],['Paul', '2020-12-08 11:22:00'],['Paul', '2020-12-07 11:22:00']]),
columns=['Name', 'Time'])
df:
Name Time
0 Iza 2020-12-01 10:34:00
1 Iza 2020-12-02 10:34:00
2 Iza 2020-12-01 17:34:00
3 Iza 2020-12-01 17:34:00
4 Sara 2020-12-04 17:34:00
5 Sara 2020-12-04 20:11:00
6 Sara 2020-12-06 17:34:00
7 Silvia 2020-12-07 18:34:00
8 Silvia 2020-12-09 11:22:00
9 Paul 2020-12-09 11:22:00
10 Paul 2020-12-08 11:22:00
11 Paul 2020-12-07 11:22:00
I converted the time column to datetime:
df['Time'] = pd.to_datetime(df['Time'])
Now I want to get days in names and find the percentage of each day per name in columns:
df['Day'] = df['Time'].dt.day_name()
Result:
Name Time Day
0 Iza 2020-12-01 10:34:00 Tuesday
1 Iza 2020-12-02 10:34:00 Wednesday
2 Iza 2020-12-01 17:34:00 Tuesday
3 Iza 2020-12-01 17:34:00 Tuesday
4 Sara 2020-12-04 17:34:00 Friday
5 Sara 2020-12-04 20:11:00 Friday
6 Sara 2020-12-06 17:34:00 Sunday
7 Silvia 2020-12-07 18:34:00 Monday
8 Silvia 2020-12-09 11:22:00 Wednesday
9 Paul 2020-12-09 11:22:00 Wednesday
10 Paul 2020-12-08 11:22:00 Tuesday
11 Paul 2020-12-07 11:22:00 Monday
df2 = round(df.groupby(['Name'])['Day'].apply(lambda x: x.value_counts(normalize=True)) * 100)
Result:
Name
Iza Tuesday 75.0
Wednesday 25.0
Paul Monday 33.0
Tuesday 33.0
Wednesday 33.0
Sara Friday 67.0
Sunday 33.0
Silvia Wednesday 50.0
Monday 50.0
Name: Day, dtype: float64
I stuck here, my desired output - days in columns with % for each per name:
Name Sunday Monday Tuesday Wednesday Friday
Iza NaN NaN 75 25 NaN
Paul NaN 33 33 33 NaN
Sara 33 NaN NaN NaN 67
Silvia NaN 50 NaN 50 NaN
Upvotes: 1
Views: 52
Reputation: 862511
Use Categorical
for correct order in last Series.unstack
, solution was simplify without apply
:
df['Time'] = pd.to_datetime(df['Time'])
week = ['Sunday',
'Monday',
'Tuesday',
'Wednesday',
'Thursday',
'Friday',
'Saturday']
df['Day'] = pd.Categorical(df['Time'].dt.day_name(), ordered=True, categories=week)
df1 = df.groupby('Name')['Day'].value_counts(normalize=True).unstack().mul(100).round()
print (df1)
Day Sunday Monday Tuesday Wednesday Friday
Name
Iza NaN NaN 75.0 25.0 NaN
Paul NaN 33.0 33.0 33.0 NaN
Sara 33.0 NaN NaN NaN 67.0
Silvia NaN 50.0 NaN 50.0 NaN
For correct ordering is a bit changed solution:
df['Time'] = pd.to_datetime(df['Time'])
df['Day'] = df['Time'].dt.dayofweek
d = {0: 'Sunday', 1: 'Monday', 2: 'Tuesday', 3: 'Wednesday',
4: 'Thursday', 5: 'Friday', 6: 'Saturday'}
df1 = df.groupby('Name')['Day'].value_counts(normalize=True).unstack().mul(100).round().rename(columns=d)
print (df1)
Day Sunday Monday Tuesday Thursday Saturday
Name
Iza NaN 75.0 25.0 NaN NaN
Paul 33.0 33.0 33.0 NaN NaN
Sara NaN NaN NaN 67.0 33.0
Silvia 50.0 NaN 50.0 NaN NaN
Upvotes: 1
Reputation: 26676
Just unstack. You are on the money
df2 = round(df.groupby(['Name'])['Day'].apply(lambda x: x.value_counts(normalize=True)) * 100).unstack(level=1)
df2=df2[['Sunday','Monday','Tuesday', 'Wednesday','Friday']]
Sunday Monday Tuesday Wednesday Friday
Name
Iza NaN NaN 75.0 25.0 NaN
Paul NaN 33.0 33.0 33.0 NaN
Sara 33.0 NaN NaN NaN 67.0
Silvia NaN 50.0 NaN 50.0 NaN
Upvotes: 1