Reputation: 3138
I have the following DataFrame that represents whether a User was present in some week, some year:
User Year Week
0 John 2020 1
1 John 2020 2
2 Steve 2020 1
3 Fred 2020 3
4 George 2020 2
5 George 2020 3
... ... ...
200 John 2021 2
201 John 2021 4
202 Steve 2021 2
203 Fred 2021 2
204 George 2021 1
205 George 2021 4
I want to get a DataFrame that groups the dataset by User
and each column represents whether he was present in a certain week of a certain year, each column either being of type boolean or integer with possible values 0 or 1.
It would look something like this:
2020_1 2020_2 2020_3 ... 2021_1 2021_2 2021_3 2021_4
John 1 1 0 ... 0 1 0 1
Steve 1 0 0 ... 0 1 0 0
Fred 0 0 1 ... 0 1 0 0
George 0 1 1 ... 1 0 0 1
Is there anyway to do this without iterating through the DataFrme?
Thanks.
Upvotes: 2
Views: 68
Reputation: 149
Here's one way you can do this:
import pandas as pd
df = pd.DataFrame({
"User" : ["John","John","Steve","Fred","George","George"],
"Year" : [2020,2020,2020,2020,2020,2020],
"Week": [1,2,1,3,2,3]})
# add a helper column for year_week
df["year_week"] = df["Year"].map(str) + "_" + df["Week"].map(str)
# group by User and year_week, then unstack and fill NaN with 0
df.groupby(["User","year_week"]).size().unstack(fill_value = 0)
Results in:
| User | 2020_1 | 2020_2 | 2020_3 |
|:-------|---------:|---------:|---------:|
| Fred | 0 | 0 | 1 |
| George | 0 | 1 | 1 |
| John | 1 | 1 | 0 |
| Steve | 1 | 0 | 0 |
Upvotes: 3
Reputation: 150735
Create a new column and use pd.crosstab
:
pd.crosstab(df['User'],
df[['Year','Week']].astype(str).apply('_'.join, axis=1)
)
Output:
col_0 2020_1 2020_2 2020_3 2021_1 2021_2 2021_4
User
Fred 0 0 1 0 1 0
George 0 1 1 1 0 1
John 1 1 0 0 1 1
Steve 1 0 0 0 1 0
Upvotes: 3
Reputation: 26676
pd.crosstab(df.User, df['Year'].astype(str)+"_"+df['Week'].astype(str))
Upvotes: 2