Reputation: 29
Hi I have a number of similar datasets with different category types. I have decided which categories from certain datasets will be grouped together to put into 1 unified category. I am looking for a way to group these different rows into one unified row.
For example, in dataset 1, it has a crime type called 'Criminal damage'.
In my dataset 2, I would like the rows that has have 'Malicious damage to property' and 'Arson' in the column titled Category to be grouped together in a new row with information the same(Province, Station) but the Category column value is 'Criminal Damage' and the values under the years X2008 X2009 etc are added together, in order to be aligned with my other dataset. The same with: Burglary at resident premises and burglary at non-residential premises should be grouped as 'Burglary'. Also with: Driving under the influence of alcohol or drugs and illegal possession of firearms and ammunition should be grouped as 'Other notable offences'.
Here is an example of the data needed to be transformed:
Province Station Category X2008 X2009 X2010 X2011 X2012 X2013
Kwazulu/Natal Phoenix Malicious damage to property 778 883 767 754 804 785
Kwazulu/Natal Phoenix Arson 16 24 13 26 12 9
North West Phokeng Malicious damage to property 243 235 154 187 153 156
North West Phokeng Arson 12 13 12 9 6 6
Eastern Cape Adelaide Burglary at residential premises 103 93 89 73 75 53
Eastern Cape Adelaide Burglary at non-residential premises 20 13 22 20 19 26
North West Mogwase Driving under the influence of alcohol or drugs 37 36 31 38 24 18
North West Mogwase Illegal possession of firearms and ammunition 18 8 12 12 12 13
To be turned into something like this:
Province Station Category X2008 X2009 X2010 X2011 X2012 X2013
Kwazulu/Natal Phoenix Criminal damage 794 897 780 780 816 794
North West Phokeng Criminal damage 255 248 166 196 159 162
Eastern Cape Adelaide Burglary 123 106 111 93 94 79
North West Mogwase Other notable offences 55 44 43 50 36 31
Any help would be so appreciated! I am so stuck on how to go about doing this. Thanks!
Upvotes: 2
Views: 55
Reputation: 311
One way (although probably not the most efficient) you could do this is by renaming each category to match your new combined categories (see this answer), then group using group_by()
as akrun said.
df1$Category[df1$Category == "Burglary at residential premises"] <- "Burglary"
df1$Category[df1$Category == "Burglary at non-residential premises"] <- "Burglary"
(and so on)
Then use dplyr
's summarise()
to find the sum of all crimes for each group of province, station and category.
Upvotes: 0
Reputation: 886948
We can use group_by
with Category
as "Criminal damage"`
library(dplyr)
df1 %>%
group_by(Province, Station, Category = "Criminal damage") %>%
summarise_all(sum)
#A tibble: 2 x 9
# Groups: Province, Station [?]
# Province Station Category X2008 X2009 X2010 X2011 X2012 X2013
# <chr> <chr> <chr> <int> <int> <int> <int> <int> <int>
#1 Kwazulu/Natal Phoenix Criminal damage 794 907 780 780 816 794
#2 North West Phokeng Criminal damage 255 248 166 196 159 162
Upvotes: 2