Reputation: 55
I'm new to R and was wondering whether anyone could help me with a problem:
I have a dataframe like this:
panelist_id | type | refer_sm | refer_se |
---|---|---|---|
1 | HP | 1 | 0 |
1 | HP | 0 | 1 |
1 | PB | 0 | 1 |
2 | PB | 0 | 1 |
3 | TN | 1 | 0 |
3 | HP | 1 | 0 |
I want to produce a dataframe that aggregates the number of visits to type per panelist, and the number of 'refers' to a type, at the panelist_id level, such as:
panelist_id | No. visits to type: HP | No. visits to type PB | No. visits to type TN | No. refers from sm to HP | No. refers from sm to PB | No. refers from sm to TN | No. refers from se to HP | No. refers from se to PB | No. refers from se to TN |
---|---|---|---|---|---|---|---|---|---|
1 | 2 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |
2 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
3 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | 0 | 0 |
I've tried using aggregate and summarise functions without much success. Any solutions would be much appreciated. Thanks
Upvotes: 0
Views: 51
Reputation: 101024
A base R option using reshape
+ aggregate
reshape(
aggregate(. ~ panelist_id + type, cbind(df, n = 1), sum),
idvar = "panelist_id",
timevar = "type",
direction = "wide"
)
gives
panelist_id refer_sm.HP refer_se.HP n.HP refer_sm.PB refer_se.PB n.PB
1 1 1 1 2 0 1 1
2 3 1 0 1 NA NA NA
4 2 NA NA NA 0 1 1
refer_sm.TN refer_se.TN n.TN
1 NA NA NA
2 1 0 1
4 NA NA NA
Or, we can use xtabs
cbind(
xtabs(~ panelist_id + type, df),
xtabs(refer_sm ~ ., aggregate(refer_sm ~ panelist_id + type, df, sum)),
xtabs(refer_se ~ ., aggregate(refer_se ~ panelist_id + type, df, sum))
)
gives
HP PB TN HP PB TN HP PB TN
1 2 1 0 1 0 0 1 1 0
2 0 1 0 0 0 0 0 1 0
3 1 0 1 1 0 1 0 0 0
Upvotes: 0
Reputation: 886938
Here is an option with tidyverse
. Create a column of 1s (n
), then use pivot_wider
specifying the names_from
as 'type' and values_from
with 'n' and the 'refer' columns, specifying the values_fn
as sum
and the values_fill
as 0 (to return 0 where there are no combinations - by default it returns NA
)
library(dplyr)
library(tidyr)
df1 %>%
mutate(n = 1) %>%
pivot_wider(names_from = type, values_from = c(n, refer_sm, refer_se),
values_fn = sum, values_fill = 0)
-output
# A tibble: 3 x 10
# panelist_id n_HP n_PB n_TN refer_sm_HP refer_sm_PB refer_sm_TN refer_se_HP refer_se_PB refer_se_TN
# <int> <dbl> <dbl> <dbl> <int> <int> <int> <int> <int> <int>
#1 1 2 1 0 1 0 0 1 1 0
#2 2 0 1 0 0 0 0 0 1 0
#3 3 1 0 1 1 0 1 0 0 0
df1 <- structure(list(panelist_id = c(1L, 1L, 1L, 2L, 3L, 3L), type = c("HP",
"HP", "PB", "PB", "TN", "HP"), refer_sm = c(1L, 0L, 0L, 0L, 1L,
1L), refer_se = c(0L, 1L, 1L, 1L, 0L, 0L)),
class = "data.frame", row.names = c(NA,
-6L))
Upvotes: 1