Petra
Petra

Reputation: 43

R grouped frequency table

I'm an R noob and I feel this should be simple but I cannot work it out. I have a survey dataset, with columns for ID, employer, practice_area and then a number of columns where the survey takers had to indicate which tools they use with a 'check all that apply' instruction. The data set now has a column for each tool option with either 1 or 0.

Sample df:

np1 <- data.frame(ID = c(1:10),
                 practice_area = c("A", "B", "C", "A", "A", "C", "B", "D", "C", "A"),
                 tool_1 = sample(0:1,10, replace = TRUE),
                 tool_2 = sample(0:1,10, replace = TRUE),
                 tool_3 = sample(0:1,10, replace = TRUE),
                 tool_4 = sample(0:1,10, replace = TRUE),
                 tool_5 = sample(0:1,10, replace = TRUE))

I'd like a frequency table that is grouped by practice_area. So basically I can see the results that it would say practice_area A, x people use tool_1, x people use tool_2, etc.

Upvotes: 0

Views: 191

Answers (2)

mrbonewithgale
mrbonewithgale

Reputation: 100

# data
df <- data.frame(ID = c(1, 2, 3, 4 ,5),
                 employer = c("A", "B", "C", "D", "E"),
                 practice_area = c("X", "Y", "X", "X", "X"),
                 tool_1 = c(1, 0, 0, 1, 1),
                 tool_2 = c(1, 0, 0, 1, 0),
                 tool_3 = c(1, 1, 1, 1, 1),
                 tool_4 = c(0, 1, 1, 0, 1))

Output:

enter image description here

# code
df %>%
  group_by(practice_area) %>%
  summarise(tool_1 = sum(tool_1), tool_2 = sum(tool_2),
            tool_3 = sum(tool_3), tool_4 = sum(tool_4))

Upvotes: 1

van Nijnatten
van Nijnatten

Reputation: 404

Ok, so let's start off with creating a dataset, to reproduce this problem.

library(tidyverse)

df <- data.frame(
    ID = 1:50,
    employer = rep(
        c("employer.1","employer.2"),
        25
    ),
    practice_area = rep(
        1:5,
        10
    ),
    tool.1 = sample(0:1, 50, replace=T),
    tool.2 = sample(0:1, 50, replace=T)
)

So, If I want a table like this:

# A tibble: 10 x 3
# Groups:   practice_area [5]
   practice_area tool       n
           <int> <chr>  <int>
 1             1 tool.1     7
 2             1 tool.2     2
 3             2 tool.1     4
 4             2 tool.2     2
 5             3 tool.1     2
 6             3 tool.2     4
 7             4 tool.1     4
 8             4 tool.2     6
 9             5 tool.1     6
10             5 tool.2     5

I would do

df %>%
    pivot_longer(
        starts_with("tool"),
        names_to = "tool",
        values_to = "uses_tool"
    ) %>%
    filter(uses_tool != 0) %>%
    group_by(practice_area) %>%
    count(tool)

In this piece of code, I make a long table (instead of wide) in which I have a column for the tools (selected with start_with, see https://dplyr.tidyverse.org/reference/select.html). After that, I remove the ones that don't use the tool (uses_tool != 0) and I group them by the practice area. The only thing to do then is to count the occurrences by group.

Upvotes: 1

Related Questions