metassi
metassi

Reputation: 55

R dataframe: aggregate by group and convert to frequency count

I'm new to R and was wondering whether anyone could help me with a problem:

I have a dataframe like this:

panelist_id type refer_sm refer_se
1 HP 1 0
1 HP 0 1
1 PB 0 1
2 PB 0 1
3 TN 1 0
3 HP 1 0

I want to produce a dataframe that aggregates the number of visits to type per panelist, and the number of 'refers' to a type, at the panelist_id level, such as:

panelist_id No. visits to type: HP No. visits to type PB No. visits to type TN No. refers from sm to HP No. refers from sm to PB No. refers from sm to TN No. refers from se to HP No. refers from se to PB No. refers from se to TN
1 2 1 0 1 0 0 1 1 0
2 0 1 0 0 0 0 0 1 0
3 1 0 1 1 0 1 0 0 0

I've tried using aggregate and summarise functions without much success. Any solutions would be much appreciated. Thanks

Upvotes: 0

Views: 51

Answers (2)

ThomasIsCoding
ThomasIsCoding

Reputation: 101024

A base R option using reshape + aggregate

reshape(
  aggregate(. ~ panelist_id + type, cbind(df, n = 1), sum),
  idvar = "panelist_id",
  timevar = "type",
  direction = "wide"
)

gives

  panelist_id refer_sm.HP refer_se.HP n.HP refer_sm.PB refer_se.PB n.PB
1           1           1           1    2           0           1    1
2           3           1           0    1          NA          NA   NA
4           2          NA          NA   NA           0           1    1
  refer_sm.TN refer_se.TN n.TN
1          NA          NA   NA
2           1           0    1
4          NA          NA   NA

Or, we can use xtabs

cbind(
  xtabs(~ panelist_id + type, df),
  xtabs(refer_sm ~ ., aggregate(refer_sm ~ panelist_id + type, df, sum)),
  xtabs(refer_se ~ ., aggregate(refer_se ~ panelist_id + type, df, sum))
)

gives

  HP PB TN HP PB TN HP PB TN
1  2  1  0  1  0  0  1  1  0
2  0  1  0  0  0  0  0  1  0
3  1  0  1  1  0  1  0  0  0

Upvotes: 0

akrun
akrun

Reputation: 886938

Here is an option with tidyverse. Create a column of 1s (n), then use pivot_wider specifying the names_from as 'type' and values_from with 'n' and the 'refer' columns, specifying the values_fn as sum and the values_fill as 0 (to return 0 where there are no combinations - by default it returns NA)

library(dplyr)
library(tidyr)
df1 %>%
     mutate(n = 1) %>% 
     pivot_wider(names_from = type, values_from = c(n, refer_sm, refer_se),
           values_fn = sum, values_fill = 0)

-output

# A tibble: 3 x 10
#  panelist_id  n_HP  n_PB  n_TN refer_sm_HP refer_sm_PB refer_sm_TN refer_se_HP refer_se_PB refer_se_TN
#        <int> <dbl> <dbl> <dbl>       <int>       <int>       <int>       <int>       <int>       <int>
#1           1     2     1     0           1           0           0           1           1           0
#2           2     0     1     0           0           0           0           0           1           0
#3           3     1     0     1           1           0           1           0           0           0

data

df1 <- structure(list(panelist_id = c(1L, 1L, 1L, 2L, 3L, 3L), type = c("HP", 
"HP", "PB", "PB", "TN", "HP"), refer_sm = c(1L, 0L, 0L, 0L, 1L, 
1L), refer_se = c(0L, 1L, 1L, 1L, 0L, 0L)), 
class = "data.frame", row.names = c(NA, 
-6L))

Upvotes: 1

Related Questions