lecb
lecb

Reputation: 409

Table manipulation in R - dplyr/tidyverse solution

I'm trying to convert one df to another format.

Starting df:

df <- data.frame(id = c("1", "2", "3","4", "5", "6", "7", "8", "9", "10"),
    criteria_A = c("present", "present", "absent", "absent", "absent", "present", "absent", "present", "absent", "present"),
    criteria_B =c("absent", "absent", "present", "absent", "absent", "present", "absent", "absent", "present", "present"))

I want to count each criterion by present/absent and retabulate it as such:

df2 <- data.frame(criteria = c("criteria_A", "criteria_A", "criteria_B", "criteria_B"),
    count = c("5", "5", "4", "6"),
    status = c("present", "absent", "present", "absent"))

I thought about getting the count this way per criterion:

library(dplyr)
tmp1 <- df %>% group_by(criteria_A) %>% count() %>% mutate(criteria="criteria_A")
tmp1 <- tmp1 %>% rename(criteria_A=status)
tmp2 <- df %>% group_by(criteria_B) %>% count() %>% mutate(criteria="criteria_B")
tmp2 <- tmp2 %>% rename(criteria_B=status)

I figured I could then merge the outputs vertically. This is not an efficient or clever way to do it when in reality I have hundreds of criteria...

I'm sure there's an elegant solution I'm not clever enough to figure out!

Any help as always would be gratefully received.

Upvotes: 2

Views: 71

Answers (3)

Ronak Shah
Ronak Shah

Reputation: 389235

Using sapply and table :

sapply(df[-1], table)

#        criteria_A criteria_B
#absent           5          6
#present          5          4

Upvotes: 0

akrun
akrun

Reputation: 887851

We can use base R with table

table(names(df)[-1][col(df[-1])], unlist(df[-1]))
 
 #             absent present
 # criteria_A      5       5
 # criteria_B      6       4

If we wrap with data.frame, it gives the long format

as.data.frame(table(names(df)[-1][col(df[-1])], unlist(df[-1])))
#        Var1    Var2 Freq
#1 criteria_A  absent    5
#2 criteria_B  absent    6
#3 criteria_A present    5
#4 criteria_B present    4

Upvotes: 1

nniloc
nniloc

Reputation: 4243

You can try using dplyr::tally after pivoting the data to long format using pivot_longer.

library(dplyr)

df %>%
  pivot_longer(-id, 
               names_to = 'criteria',
               values_to = 'status') %>%
  group_by(criteria, status) %>%
  tally

#----
  criteria   status      n
  <chr>      <chr>   <int>
1 criteria_A absent      5
2 criteria_A present     5
3 criteria_B absent      6
4 criteria_B present     4

Upvotes: 4

Related Questions