Reputation: 195
I have a dataframe of over 33,000 rows which I'd like to simplify:
Crime type
GeographyCode
E01006687 Burglary
E01007229 Anti-social behaviour
E01007229 Anti-social behaviour
E01007229 Anti-social behaviour
E01007229 Burglary
E01007229 Other theft
E01007229 Other theft
E01007229 Shoplifting
E01007229 Theft from the person
E01007230 Anti-social behaviour
E01007230 Anti-social behaviour
E01007230 Anti-social behaviour
E01007230 Anti-social behaviour
E01007230 Anti-social behaviour
E01007230 Anti-social behaviour
...
There are 207 unique values of 'GeographyCode' and 12 unique values of 'Crime type'.
I'd like to make a new dataframe which has 207 rows and 12 columns plus the 'GeographyCode' index column, with each column representing a crime type, and containing a count of all occurances of that crime type within the GeographyCode.
Something like this:
Burglary Anti-social Theft Shoplifting etc...
GeographyCode
E01006687 1 3 9 5 ...
E01007229 1 3 2 1 ...
E01007230 0 6 12 5 ...
...
I've tried a few things, but because there are no numeric values I'm finding it really difficult to get what I need.
Upvotes: 2
Views: 518
Reputation: 177018
You could use crosstab
to compute this:
>>> pd.crosstab(df.index, df['Crime type'])
Crime type Anti-social behaviour Burglary Other theft Shoplifting ...
E01006687 0 1 0 0
E01007229 3 1 2 1
E01007230 6 0 0 0
Upvotes: 6