Reputation: 113
I've already asked a similar question, but I stated it incorrectly, so the answers didn't help me. Again, I have two data sets. The first one is like this:
df1 <- data.frame(id = c(111,111,111,222,222,333,333,333,333),
                  type = c("a","b","a","d","b","c","c","b","b"),
                  var = c(1,0,1,0,1,1,1,1,1))
df1
id type var
1 111 a 1
2 111 b 0
3 111 a 1
4 222 d 0
5 222 b 1
6 333 c 1
7 333 c 1
8 333 b 1
9 333 b 1
The second is like this:
df2
id A B
1 111
2 222
3 333
I need to fill the empty cells so that, for each id, A is the sum of var with type a or b, and B is the sum of var with type c or d. The result should be like this:
df2
id A B
1 111 2 0
2 222 1 0
3 333 2 2
It's important to fill this very data frame (df2), not to create a new one.
Upvotes: 1
Views: 140
Reputation: 887098
We can do this in a single line in base R (without using any external packages):
transform(as.data.frame.matrix(xtabs(var~id+type, df1)), A= a+b, B = c+d)[-(1:4)]
# A B
#111 2 0
#222 1 0
#333 2 2
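If the existing df2 has to be filled rather than replaced, one option is to compute the grouped sums first and then assign them back by matching on id. This is only a sketch: it assumes df2 already holds the id column with A and B still empty (NA here), and the helper names grp, sums and idx are made up for illustration.

```r
df1 <- data.frame(id = c(111,111,111,222,222,333,333,333,333),
                  type = c("a","b","a","d","b","c","c","b","b"),
                  var = c(1,0,1,0,1,1,1,1,1))
# Assumed shape of the existing df2: ids present, A and B not yet filled
df2 <- data.frame(id = c(111, 222, 333), A = NA, B = NA)

# Recode type into the two target columns: a/b -> "A", everything else -> "B"
grp <- ifelse(df1$type %in% c("a", "b"), "A", "B")

# id x group matrix of summed var; id/group pairs with no rows come back as NA
sums <- tapply(df1$var, list(df1$id, grp), sum)
sums[is.na(sums)] <- 0

# Look up each id of df2 in the row names of sums and fill df2 in place
idx <- match(df2$id, rownames(sums))
df2$A <- unname(sums[idx, "A"])
df2$B <- unname(sums[idx, "B"])
df2
```

Because the assignment goes through match(), the row order of df2 is kept, and an id that is in df2 but missing from df1 would get NA rather than shifting the other rows.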
Upvotes: 1
Reputation: 43334
It's really just aggregation plus reshaping to wide form:
library(tidyverse)
# group by id and by type recoded to A/B form
df1 %>%
  group_by(id, type = ifelse(type %in% c('a', 'b'), 'A', 'B')) %>%
  summarise(var = sum(var)) %>%
  spread(type, var, fill = 0)    # reshape to wide
## Source: local data frame [3 x 3]
## Groups: id [3]
##
## id A B
## * <dbl> <dbl> <dbl>
## 1 111 2 0
## 2 222 1 0
## 3 333 2 2
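In newer versions of tidyr, spread() is superseded by pivot_wider(), so the same pipeline can also be written as below. A sketch, assuming tidyr 1.0 or later and dplyr are installed; the result name res is made up, and the named-list form of values_fill is used because it should be accepted across those versions.

```r
library(dplyr)
library(tidyr)

df1 <- data.frame(id = c(111,111,111,222,222,333,333,333,333),
                  type = c("a","b","a","d","b","c","c","b","b"),
                  var = c(1,0,1,0,1,1,1,1,1))

res <- df1 %>%
  group_by(id, type = ifelse(type %in% c("a", "b"), "A", "B")) %>%
  summarise(var = sum(var)) %>%   # one row per id/type pair that occurs
  ungroup() %>%
  pivot_wider(names_from = type, values_from = var,
              values_fill = list(var = 0))  # missing id/type pairs become 0
res
```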
You could create A and B in summarise if you subset var, but the code is more repetitive.
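For comparison, the subsetting variant might look like this. It is a sketch of that idea rather than a recommendation; res is a made-up name for the result.

```r
library(dplyr)

df1 <- data.frame(id = c(111,111,111,222,222,333,333,333,333),
                  type = c("a","b","a","d","b","c","c","b","b"),
                  var = c(1,0,1,0,1,1,1,1,1))

res <- df1 %>%
  group_by(id) %>%
  summarise(A = sum(var[type %in% c("a", "b")]),   # sum of var over types a/b
            B = sum(var[type %in% c("c", "d")]))   # sum of var over types c/d
res
```

No fill step is needed here, since sum() over an empty subset returns 0, but each output column repeats the same subsetting pattern.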
In base R,
df2 <- df1
df2$type <- ifelse(df2$type %in% c('a', 'b'), 'A', 'B')
df2 <- aggregate(var ~ id + type, df2, sum)
df2 <- reshape(df2, timevar = 'type', direction = 'wide')
df2[is.na(df2)] <- 0L
names(df2) <- sub('var\\.', '', names(df2))
df2
## id A B
## 1 111 2 0
## 2 222 1 0
## 3 333 2 2
Upvotes: 1