Ifeanyi

Reputation: 87

How do I summarize levels from different columns in R?

I have a table (df) whose columns are categorical variables stored as factors, each with several levels:

A_ID            B_ID            C_ID
valid number    valid number    invalid number
valid number    valid number    invalid number
invalid number  invalid number  too short
too short       too short       too short
valid number    too long        too short
too long        too long        valid number
invalid number  valid number    too long
too long        invalid number  too long
too short       too short       valid number
too short       valid number    too long
too long        invalid number  too long
valid number    invalid number  valid number

I want to summarize each column by its levels, that is, count the number of times each level occurs in each column. The result should look like the table below:

Variable  Count_valid  Count_Invalid  Count_Short  Count_Long
A_ID      4            2              3            3
B_ID      4            4              2            2
C_ID      3            2              3            4

I have tried using sapply():

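# For each column, sum() the logical comparison to count matches per level;
# df[[x]] extracts the column as a vector, and the labels match the data.
t(sapply(names(df), function(x)
  c(Count_valid   = sum(df[[x]] == "valid number"),
    Count_Invalid = sum(df[[x]] == "invalid number"),
    Count_Short   = sum(df[[x]] == "too short"),
    Count_Long    = sum(df[[x]] == "too long"))))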

Upvotes: 0

Views: 189

Answers (1)

Karthik S

Reputation: 11584

Does this work:

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = everything()) %>%   # one row per (column, value) pair
  count(name, value) %>%                  # tally each level within each column
  pivot_wider(id_cols = name, names_from = value, values_from = n) %>%
  select(Variable = name, Count_valid = `valid number`,
         Count_Invalid = `invalid number`, Count_Short = `too short`,
         Count_long = `too long`)
# A tibble: 3 x 5
  Variable Count_valid Count_Invalid Count_Short Count_long
  <chr>          <int>         <int>       <int>      <int>
1 A_ID               4             2           3          3
2 B_ID               4             4           2          2
3 C_ID               3             2           3          4
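
One caveat worth hedging: if some level never occurs in one of the columns, pivot_wider() leaves NA in that cell rather than 0. A minimal sketch of the values_fill argument, using a hypothetical two-column data frame (toy, x and y are made up for illustration and are not from the question):

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: column y never contains "too long".
toy <- data.frame(x = c("too long", "too short"),
                  y = c("too short", "too short"))

counts <- toy %>%
  pivot_longer(cols = everything()) %>%
  count(name, value) %>%
  pivot_wider(id_cols = name, names_from = value, values_from = n,
              values_fill = 0)   # absent (column, level) pairs become 0, not NA

counts
```

With the data in this question every level appears in every column, so the original call gives the same counts with or without values_fill.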

Data used:

df
# A tibble: 12 x 3
   A_ID           B_ID           C_ID          
   <chr>          <chr>          <chr>         
 1 valid number   valid number   invalid number
 2 valid number   valid number   invalid number
 3 invalid number invalid number too short     
 4 too short      too short      too short     
 5 valid number   too long       too short     
 6 too long       too long       valid number  
 7 invalid number valid number   too long      
 8 too long       invalid number too long      
 9 too short      too short      valid number  
10 too short      valid number   too long      
11 too long       invalid number too long      
12 valid number   invalid number valid number  
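
If you would rather not depend on the tidyverse, here is a rough base R sketch that rebuilds the data above and counts levels with table(). The data frame is retyped from the printout, and the names lv and counts are my own choices for illustration, not part of the original answer:

```r
# Example data retyped from the printout above, kept as character columns.
df <- data.frame(
  A_ID = c("valid number", "valid number", "invalid number", "too short",
           "valid number", "too long", "invalid number", "too long",
           "too short", "too short", "too long", "valid number"),
  B_ID = c("valid number", "valid number", "invalid number", "too short",
           "too long", "too long", "valid number", "invalid number",
           "too short", "valid number", "invalid number", "invalid number"),
  C_ID = c("invalid number", "invalid number", "too short", "too short",
           "too short", "valid number", "too long", "too long",
           "valid number", "too long", "too long", "valid number"),
  stringsAsFactors = FALSE
)

# Fix the level order so every column reports the same four counts,
# including a 0 for any level that never appears in a column.
lv <- c("valid number", "invalid number", "too short", "too long")

# table() on each column gives one count per level; t() puts columns in rows.
counts <- t(sapply(df, function(col) table(factor(col, levels = lv))))
colnames(counts) <- c("Count_valid", "Count_Invalid", "Count_Short", "Count_Long")
counts
```

The result is a plain integer matrix with A_ID, B_ID and C_ID as row names; wrap it in as.data.frame() if you need a data frame.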

Upvotes: 1
