Reputation: 131
I'd like to know how to transform a table to get my desired outcome.
My Sample Dataset:
import pandas as pd

df = pd.DataFrame({
    "ID": [111, 111, 111, 111, 222, 222, 222, 333, 333],
    "Section": ["CS01", "CS01", "IT01", "IT01", "CS02", "CS02", "CS02", "HS01", "HS01"],
    "Subject": ["Hist", "Pol", "Pol", "Arts", "Pol", "Hist", "Arts", "Pol", "Hist"],
    "Activity": ["Quiz 1", "Quiz 2", "Quiz 3", "Quiz 1", "Quiz 2", "Quiz 3", "Quiz 1", "Quiz 2", "Quiz 3"],
    "Pass": [1, 0, 0, 1, 1, 1, 0, 1, 0],
})
What it looks like:
    ID Section Subject Activity  Pass
0  111    CS01    Hist   Quiz 1     1
1  111    CS01     Pol   Quiz 2     0
2  111    IT01     Pol   Quiz 3     0
3  111    IT01    Arts   Quiz 1     1
4  222    CS02     Pol   Quiz 2     1
5  222    CS02    Hist   Quiz 3     1
6  222    CS02    Arts   Quiz 1     0
7  333    HS01     Pol   Quiz 2     1
8  333    HS01    Hist   Quiz 3     0
What I'm trying to do:
ID   Section  Subject  Quiz 1      Quiz 2      Quiz 3
                       0  1  NA    0  1  NA    0  1  NA
111  CS01     Hist     0  1  0     0  0  1     0  0  1
111  CS01     Pol      0  0  1     1  0  0     0  0  1
111  IT01     Arts     0  1  0     0  0  1     0  0  1
111  IT01     Pol      0  0  1     0  0  1     1  0  0
222  CS02     Arts     1  0  0     0  0  0     0  0  0
222  CS02     Hist     0  0  1     0  0  1     0  1  0
222  CS02     Pol      0  0  1     0  1  0     0  0  1
333  HS01     Hist     0  0  1     0  0  1     1  0  0
333  HS01     Pol      0  0  1     0  1  0     0  0  1
What I want is to make the "Activity" column the top column level, with the "Pass" values as the level below it, including an "NA" column.
So far, all I have is this:
df.groupby(["ID","Section", "Subject","Activity"])["Pass"].value_counts().unstack().fillna(0)
But this has neither the "NA" column nor "Activity" as a column level.
Upvotes: 1
Views: 184
Reputation: 863176
The idea is to first create all possible combinations with Series.reindex and MultiIndex.from_product, then apply your solution to the resulting MultiIndex, passing dropna=False to value_counts:
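s = df.set_index(["ID", "Section", "Subject", "Activity"])["Pass"]
# Reindex over every combination of the levels; missing ones become NaN.
df = (s.reindex(pd.MultiIndex.from_product(s.index.levels))
       .groupby(level=[0, 1, 2, 3])
       .value_counts(dropna=False)      # count NaN as its own value
       .unstack([3, 4], fill_value=0)   # move Activity and Pass to the columns
       .sort_index(axis=1))
print(df)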
Activity            Quiz 1         Quiz 2         Quiz 3
Pass                0.0 1.0 NaN    0.0 1.0 NaN    0.0 1.0 NaN
ID  Section Subject
111 CS01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   1   0      0   0   1      0   0   1
            Pol       0   0   1      1   0   0      0   0   1
    CS02    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
    HS01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
    IT01    Arts      0   1   0      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      1   0   0
222 CS01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
    CS02    Arts      1   0   0      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   1   0
            Pol       0   0   1      0   1   0      0   0   1
    HS01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
    IT01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
333 CS01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
    CS02    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
    HS01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      1   0   0
            Pol       0   0   1      0   1   0      0   0   1
    IT01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
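To see the reindexing step in isolation, here is a minimal sketch on toy data. The level name "key" and the values "a" and "b" are made up for illustration. It only shows how MultiIndex.from_product fills in the missing combinations as NaN and how dropna=False makes value_counts include them:

```python
import pandas as pd

# Toy data (hypothetical): two observed (key, Activity) pairs out of four possible.
s = pd.Series(
    [1, 0],
    index=pd.MultiIndex.from_tuples(
        [("a", "Quiz 1"), ("b", "Quiz 2")], names=["key", "Activity"]
    ),
    name="Pass",
)

# Cartesian product of the level values: every key paired with every activity.
full_index = pd.MultiIndex.from_product(s.index.levels, names=s.index.names)

# Combinations that are absent from the data come back as NaN.
s_full = s.reindex(full_index)
print(s_full)

# By default value_counts skips NaN; dropna=False counts it as its own value.
print(s_full.value_counts())
print(s_full.value_counts(dropna=False))
```

Because the reindexed Series now contains NaN, its values are stored as floats, which is why the Pass level in the output above shows 0.0 and 1.0.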
EDIT: Solution working with duplicates:
df=pd.DataFrame({
"ID":[111,111,111,111,222,222,222,333,333],
"Section":["CS01","CS01","IT01","IT01","CS02","CS02","CS02","HS01","HS01"],
"Subject":["Hist","Pol","Pol","Arts","Pol","Hist","Arts","Pol","Hist"],
"Activity":["Quiz 1","Quiz 2","Quiz 3","Quiz 1","Quiz 2","Quiz 3","Quiz 1","Quiz 2","Quiz 3"],
"Pass":[1,0,0,1,1,1,0,1,0],
})
df = pd.concat([df, df.head()])
print (df)
    ID Section Subject Activity  Pass
0  111    CS01    Hist   Quiz 1     1
1  111    CS01     Pol   Quiz 2     0
2  111    IT01     Pol   Quiz 3     0
3  111    IT01    Arts   Quiz 1     1
4  222    CS02     Pol   Quiz 2     1
5  222    CS02    Hist   Quiz 3     1
6  222    CS02    Arts   Quiz 1     0
7  333    HS01     Pol   Quiz 2     1
8  333    HS01    Hist   Quiz 3     0
0  111    CS01    Hist   Quiz 1     1  <- duplicates
1  111    CS01     Pol   Quiz 2     0  <- duplicates
2  111    IT01     Pol   Quiz 3     0  <- duplicates
3  111    IT01    Arts   Quiz 1     1  <- duplicates
4  222    CS02     Pol   Quiz 2     1  <- duplicates
First use SeriesGroupBy.value_counts and reshape only the last level with Series.unstack. Then add all possible combinations of the index levels with DataFrame.reindex, and add a NaN column set to 1 where both count columns are 0, tested with DataFrame.eq and DataFrame.all. Finally, unstack to get a MultiIndex in the columns, swap the order of its levels, and sort it:
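import numpy as np

# Count each Pass value per group; the Pass values become columns.
df1 = (df.groupby(["ID", "Section", "Subject", "Activity"])["Pass"]
         .value_counts()
         .unstack(fill_value=0))
# Add every combination of the index levels; missing ones get all-zero counts.
df1 = df1.reindex(pd.MultiIndex.from_product(df1.index.levels), fill_value=0)
# NaN column: 1 where neither 0 nor 1 was counted.
# astype is used instead of Series.view, which is deprecated in newer pandas.
df1[np.nan] = df1.eq(0).all(axis=1).astype('i1')
df1 = df1.unstack().swaplevel(1, 0, axis=1).sort_index(axis=1)
print(df1)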
Activity            Quiz 1         Quiz 2         Quiz 3
Pass                0.0 1.0 NaN    0.0 1.0 NaN    0.0 1.0 NaN
ID  Section Subject
111 CS01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   2   0      0   0   1      0   0   1
            Pol       0   0   1      2   0   0      0   0   1
    CS02    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
    HS01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
    IT01    Arts      0   2   0      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      2   0   0
222 CS01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
    CS02    Arts      1   0   0      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   1   0
            Pol       0   0   1      0   2   0      0   0   1
    HS01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
    IT01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
333 CS01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
    CS02    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
    HS01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      1   0   0
            Pol       0   0   1      0   1   0      0   0   1
    IT01    Arts      0   0   1      0   0   1      0   0   1
            Hist      0   0   1      0   0   1      0   0   1
            Pol       0   0   1      0   0   1      0   0   1
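The NaN-column step can be checked on its own with a small count table. This is only a sketch: the row labels "x", "y" and "z" are hypothetical, and the columns 0 and 1 stand in for the counted Pass values:

```python
import numpy as np
import pandas as pd

# Toy count table (hypothetical): rows are groups, columns are Pass values 0 and 1.
counts = pd.DataFrame({0: [0, 2, 0], 1: [1, 0, 0]}, index=["x", "y", "z"])

# eq(0) marks zero counts; all(axis=1) is True only when every count in the row is 0.
no_result = counts.eq(0).all(axis=1)

# Store the flag as 0/1 in a column labelled NaN.
counts[np.nan] = no_result.astype(int)
print(counts)
```

Only row "z" has no counts at all, so it is the only row flagged with 1 in the NaN column. With duplicated rows in the data, the other columns hold counts such as 2, while the NaN column stays a 0/1 flag.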
Upvotes: 1