CMai
CMai

Reputation: 53

Data transformation: spread categorical data frame to counts R

I'm trying to transform a data frame with several categorical variables to frequency counts.

Data looks like this:

Site  Date          ID  X1      X2      X3
A     June - 01/16  1   aware   resting resting
B     June - 03/16  2   aware   feeding feeding
C     June - 01/16  1   resting aware   aware

And I want to transform it into something like this:

site   date           ID  aware   resting   feeding
A      June - 01/16   1   3       2         1 
B      June - 01/16   2   1       0         2 

I've tried to use dplyr but didn't manage to select all the variables I want (X1, X2 and X3)

data_frame %>% 
  dplyr::count((data_frame[c(1:3)]),cbind(data_frame[c(4:6)])) %>% 
  tidyr::spread(key = (data_frame[c(4:6)]),value = n)

This is an example of my data:

data_frame <- structure(data.frame(site = c("A", "B", "C", "A", "B", "C", "D"), 
date = c("June - 01/16","June - 03/16", "June - 01/16", "June - 01/16", "June - 03/16", "June - 03/16", "June - 03/16"), 
ID = c("1", "2", "1", "3", "1", "2", "3"), 
X1= c("aware", "aware","resting","feeding","aware", "resting","feeding"),
X2 = c("resting","feeding","aware","na","na","aware","resting"),
X3 = c("resting","feeding","aware", "aware","resting","feeding","aware")))

Upvotes: 2

Views: 191

Answers (2)

akrun
akrun

Reputation: 887058

We could reshape into 'long' format, get the count based on the column and reshape back to 'wide' with pivot_wider

library(dplyr)
library(tidyr)
data_frame %>% 
   pivot_longer(cols = X1:X3) %>% 
   select(-name) %>%
   count(site, date, ID, value) %>% 
   pivot_wider(names_from = value, values_from = n, values_fill = 0)

Or we can use values_fn

data_frame %>% 
   pivot_longer(cols = X1:X3) %>% 
   select(-name) %>%       
   pivot_wider(names_from = value, values_from = n, values_fill = 0,
        values_fn = length)

Upvotes: 2

TarJae
TarJae

Reputation: 78927

Update: With akrun's help here is the code with add_count

data_frame %>%    
  pivot_longer(cols = c(X1, X2, X3)) %>%    
  add_count(value) %>% select(-name) %>% 
  distinct %>%  
  pivot_wider(names_from = "value", values_from = n, values_fill = 0)  

Output:

  site  date         ID    aware resting feeding    na
  <chr> <chr>        <chr> <int>   <int>   <int> <int>
1 A     June - 01/16 1         8       6       0     0
2 B     June - 03/16 2         8       0       5     0
3 C     June - 01/16 1         8       6       0     0
4 A     June - 01/16 3         8       0       5     2
5 B     June - 03/16 1         8       6       0     2
6 C     June - 03/16 2         8       6       5     0
7 D     June - 03/16 3         8       6       5     0

Upvotes: 2

Related Questions