Reputation: 39
I have a dataframe that looks like this:
Supervisor Score
Bill Pass
Bill Pass
Susan Fail
Susan Fail
Susan Fail
I would like to compute some aggregates (such as the % of passes per supervisor), so I want to split the Score column into two columns, one for Pass and one for Fail, like this:
Supervisor Pass Fail
Bill 0 1
Bill 0 1
Susan 1 0
Susan 1 0
Susan 1 0
Any ideas? Would a simple groupby work, grouping by both the Supervisor and Score columns and counting Score?
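On the groupby idea, here is a minimal sketch that assumes the sample data above. It groups on both columns, counts rows with size(), and moves the Score level into columns with unstack(fill_value=0). The names counts and pct_pass are illustrative only. Note that this gives one row per supervisor rather than the one-row-per-record layout in the table.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Supervisor": ["Bill", "Bill", "Susan", "Susan", "Susan"],
    "Score": ["Pass", "Pass", "Fail", "Fail", "Fail"],
})

# Count rows per (Supervisor, Score) pair, then pivot the Score values into columns.
# fill_value=0 covers combinations that never occur (e.g. Bill/Fail).
counts = df.groupby(["Supervisor", "Score"]).size().unstack(fill_value=0)

# Share of passes per supervisor, as a percentage
pct_pass = counts["Pass"] / counts.sum(axis=1) * 100

print(counts)
print(pct_pass)
```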
Upvotes: 1
Views: 107
Reputation: 1445
**Let's try this one**
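import pandas as pd

df = pd.DataFrame({'Supervisor': ['Bill', 'Bill', 'Susan', 'Susan', 'Susan'],
                   'Score': ['Pass', 'Pass', 'Fail', 'Fail', 'Fail']}).set_index('Supervisor')
# dtype=int gives 0/1 columns; newer pandas versions return True/False by default
pd.get_dummies(df['Score'], dtype=int)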
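Building on this, here is a sketch (again assuming the question's sample data) that turns the dummies into the per-supervisor percentages the question asks about. Within each group, the mean of a 0/1 column is the share of rows in that category. Passing dtype=int is an added assumption so the columns are 0/1 rather than True/False on newer pandas.

```python
import pandas as pd

# Same sample data, with Supervisor as the index as in the answer above
df = pd.DataFrame({
    "Supervisor": ["Bill", "Bill", "Susan", "Susan", "Susan"],
    "Score": ["Pass", "Pass", "Fail", "Fail", "Fail"],
}).set_index("Supervisor")

# One 0/1 column per Score value ("Fail", "Pass")
dummies = pd.get_dummies(df["Score"], dtype=int)

# Group on the Supervisor index level; the mean of each 0/1 column
# is the fraction of that supervisor's rows in the category
share = dummies.groupby(level=0).mean()
print(share)
```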
For more pandas tricks, see "Pandas 100 tricks": https://www.kaggle.com/python10pm/pandas-100-tricks
Upvotes: 1
Reputation: 59519
pd.get_dummies removes any columns you specify from your DataFrame and replaces them with N dummy columns, using the default naming convention 'OrigName_UniqueVal'. Specifying empty strings for the prefix and separator gives you column headers made of only the unique values.
pd.get_dummies(df, columns=['Score'], prefix_sep='', prefix='', dtype=int)  # dtype=int keeps 0/1 output on newer pandas
Supervisor Fail Pass
0 Bill 0 1
1 Bill 0 1
2 Susan 1 0
3 Susan 1 0
4 Susan 1 0
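To see the naming convention described above, here is a sketch on the question's sample data that compares the default dummy column names with the empty prefix/separator variant:

```python
import pandas as pd

df = pd.DataFrame({
    "Supervisor": ["Bill", "Bill", "Susan", "Susan", "Susan"],
    "Score": ["Pass", "Pass", "Fail", "Fail", "Fail"],
})

# Default: Score is replaced by columns named "<column>_<value>"
default = pd.get_dummies(df, columns=["Score"], dtype=int)

# Empty prefix and separator: columns are named after the values only
bare = pd.get_dummies(df, columns=["Score"], prefix="", prefix_sep="", dtype=int)

print(list(default.columns))  # ['Supervisor', 'Score_Fail', 'Score_Pass']
print(list(bare.columns))     # ['Supervisor', 'Fail', 'Pass']
```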
If in the end you just want the % of each category by supervisor, then you don't really need the dummies. You can use groupby with value_counts(normalize=True). I use a reindex to ensure the resulting Series has each category represented for each Supervisor.
(df.groupby(['Supervisor']).Score.value_counts(normalize=True)
.reindex(pd.MultiIndex.from_product([df.Supervisor.unique(), df.Score.unique()]))
.fillna(0))
#Bill Pass 1.0
# Fail 0.0
#Susan Pass 0.0
# Fail 1.0
#Name: Score, dtype: float64
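An alternative for the percentages swaps in pd.crosstab in place of value_counts + reindex. This is a sketch on the sample data. With normalize="index", each supervisor's row sums to 1, and combinations that never occur come out as 0. Only categories that appear somewhere in Score get a column, which is the same limitation as reindexing with df.Score.unique().

```python
import pandas as pd

df = pd.DataFrame({
    "Supervisor": ["Bill", "Bill", "Susan", "Susan", "Susan"],
    "Score": ["Pass", "Pass", "Fail", "Fail", "Fail"],
})

# Rows = supervisors, columns = Score categories, values = share of that supervisor's rows
pct = pd.crosstab(df["Supervisor"], df["Score"], normalize="index")
print(pct)
```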
Upvotes: 5
Reputation: 30920
IIUC, you want DataFrame.pivot_table + DataFrame.join:
new_df = df[['Supervisor']].join(df.pivot_table(columns = 'Score',
index = df.index,
values ='Supervisor',
aggfunc='count',
fill_value=0))
print(new_df)
Supervisor Fail Pass
0 Bill 0 1
1 Bill 0 1
2 Susan 1 0
3 Susan 1 0
4 Susan 1 0
To get exactly the output shown in the question, invert the 0/1 values with .eq(0):
new_df = df[['Supervisor']].join(df.pivot_table(columns = 'Score',
index = df.index,
values ='Supervisor',
aggfunc='count',
fill_value=0)
.eq(0)
.astype(int))
print(new_df)
Supervisor Fail Pass
0 Bill 1 0
1 Bill 1 0
2 Susan 0 1
3 Susan 0 1
4 Susan 0 1
Upvotes: 3
Reputation: 2526
To get the df you want you can do it like this:
df["Pass"] = df["Score"].apply(lambda x: 0 if x == "Pass" else 1)
df["Fail"] = df["Score"].apply(lambda x: 0 if x == "Fail" else 1)
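The same columns can be built without a per-row Python lambda by comparing the whole column at once. This is a sketch assuming the sample data. It keeps the encoding used above, where a row gets 0 in the column named after its own score, which matches the table in the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Supervisor": ["Bill", "Bill", "Susan", "Susan", "Susan"],
    "Score": ["Pass", "Pass", "Fail", "Fail", "Fail"],
})

# Vectorized equivalent of the apply/lambda version:
# the comparison yields a boolean Series, astype(int) turns it into 0/1
df["Pass"] = (df["Score"] != "Pass").astype(int)
df["Fail"] = (df["Score"] != "Fail").astype(int)
print(df)
```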
Upvotes: 0