Reputation: 2059
there is a data frame that contains nested info. Let' say number of pupils, number of pupils in class A and number of pupils in class B in each school. So that pupils = n.pupilsA + n.pupilsB + other_pupils
a <- data.frame(
city = c(rep('New York',3), rep('Washington',3)),
n = c(5, 2, 1, 5, 2, 1),
name = c(
'pupils',
'classA',
'classB',
'pupils',
'classA',
'classB'
)
)
output:
city n name
1 New York 5 pupils
2 New York 2 classA
3 New York 1 classB
4 Washington 5 pupils
5 Washington 2 classA
6 Washington 1 classB
Is there a smart way (using dplyr, presumably) to make a group operation that will add into each group 'other' that will be difference between 'pupils' and 'pupils - class A' + 'pupils - class B'. So the result would be something like:
city type npupils
1 New York classA 2
2 New York classB 1
3 New York pupils 5
4 New York other 2
5 Washington classA 2
6 Washington classB 1
7 Washington pupils 5
8 Washington other 2
The only way I thought might work was to spread it, calculate difference between columns, and gather it back using tidyr
:
a %>%
spread(name, n) %>%
mutate(other = pupils - classA - classB) %>%
gather(type, npupils, c('classA', 'classB', 'pupils', 'other')) %>%
arrange(city)
which works, but I wonder if there is nicer way?
Upvotes: 1
Views: 113
Reputation: 388962
We can create a summarized dataframe and bind it to original one. For each city
we calculate n
by subtracting the value of n
where name == 'pupils'
by remaining values in the group and create a name
column as "Other" and add these rows to the original dataframe using bind_rows
.
library(dplyr)
bind_rows(a, a %>%
group_by(city)%>%
summarise(n = n[name == 'pupils'] - sum(n[name != 'pupils']),
name = "Other")) %>%
arrange(city)
# city n name
#1 New York 5 pupils
#2 New York 2 classA
#3 New York 1 classB
#4 New York 2 Other
#5 Washington 5 pupils
#6 Washington 2 classA
#7 Washington 1 classB
#8 Washington 2 Other
Note - Here I am assuming you have only one entry of "pupils" for each city
or else we can use which.max
to get the first entry.
Upvotes: 5