Reputation: 145
2.1 after apply option 1 from @mozway, got a new error saying ValueError: Index contains duplicate entries, cannot reshape.
2.1.1 update the data frame to reflect the error.(which is changing the score to good for both B and C under the same zip code that produce two ranks 1)
2.1.2 using "first" instead of "dense" solved the error
---updated above
why my index "zip" couldn't be found after Groupby?
I checked the type of my dataframe op (pandas.core.frame.DataFrame) which is as same as the example pandas documentation provided.Pandas example
df = pd.DataFrame({'zip': ["46062","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236"],
'goodbadscore': ["good","good","good","good","good","bad","bad","good","good","good","bad","good","good","good","good","good","bad","good"],
'postlcode' : ["46062","11236","11236","11236","11236","46062","11236","46062","11236","11236","11236","11236","11236","11236","11236","11236","11236","11236"],
'companyname': ["A","B","C","C","C","A","A","A","B","B","A","A","B","A","A","B","A","A"]}
)
print(df)
print(type(df))
### count vendor under zip and count good
op = df.groupby(['zip','companyname'], as_index=True).aggregate(
countinzipcode=('companyname', 'count'),
good=('goodbadscore', lambda s: s.eq('good').sum())
)
### calculate % on time = on time count / total count
op['ontime%'] = op['good'] / op['countinzipcode'] * 100
### total zip rank and rank within zip
op['rankingbyallzip'] = op['ontime%'].rank(ascending=False, method='dense').astype(int)
op['rankwithinzip'] = op.groupby("zip")["ontime%"].rank("dense", ascending=False)
print(op)
print(type(op))
### keyerror
op.pivot(index = "zip", columns="rankwithinzip", values="company")
Upvotes: 0
Views: 81
Reputation: 260530
If you do not want to affect your op
dataframe, you can do:
op.reset_index().pivot(index = "zip", columns="rankwithinzip", values="companyname")
output:
rankwithinzip 1.0 2.0 3.0
zip
11236 C B A
46062 A NaN NaN
If you don't care about op
, then use @Henry Ecker solution
I suggest to add .astype(int)
to make your rankwithinzip
column int
type:
op['rankwithinzip'] = op.groupby("zip")["ontime%"].rank("dense", ascending=False).astype(int)
Thus your final table will be:
zip 1 2 3
11236 C B A
46062 A
Upvotes: 2