Reputation: 43
I'm an R noob and I feel this should be simple but I cannot work it out. I have a survey dataset, with columns for ID, employer, practice_area and then a number of columns where the survey takers had to indicate which tools they use with a 'check all that apply' instruction. The data set now has a column for each tool option with either 1 or 0.
Sample df:
np1 <- data.frame(ID = c(1:10),
practice_area = c("A", "B", "C", "A", "A", "C", "B", "D", "C", "A"),
tool_1 = sample(0:1,10, replace = TRUE),
tool_2 = sample(0:1,10, replace = TRUE),
tool_3 = sample(0:1,10, replace = TRUE),
tool_4 = sample(0:1,10, replace = TRUE),
tool_5 = sample(0:1,10, replace = TRUE))
I'd like a frequency table that is grouped by practice_area. So basically I can see the results that it would say practice_area A, x people use tool_1, x people use tool_2, etc.
Upvotes: 0
Views: 191
Reputation: 100
# data
df <- data.frame(ID = c(1, 2, 3, 4 ,5),
employer = c("A", "B", "C", "D", "E"),
practice_area = c("X", "Y", "X", "X", "X"),
tool_1 = c(1, 0, 0, 1, 1),
tool_2 = c(1, 0, 0, 1, 0),
tool_3 = c(1, 1, 1, 1, 1),
tool_4 = c(0, 1, 1, 0, 1))
Output:
# code
df %>%
group_by(practice_area) %>%
summarise(tool_1 = sum(tool_1), tool_2 = sum(tool_2),
tool_3 = sum(tool_3), tool_4 = sum(tool_4))
Upvotes: 1
Reputation: 404
Ok, so let's start off with creating a dataset, to reproduce this problem.
library(tidyverse)
df <- data.frame(
ID = 1:50,
employer = rep(
c("employer.1","employer.2"),
25
),
practice_area = rep(
1:5,
10
),
tool.1 = sample(0:1, 50, replace=T),
tool.2 = sample(0:1, 50, replace=T)
)
So, If I want a table like this:
# A tibble: 10 x 3
# Groups: practice_area [5]
practice_area tool n
<int> <chr> <int>
1 1 tool.1 7
2 1 tool.2 2
3 2 tool.1 4
4 2 tool.2 2
5 3 tool.1 2
6 3 tool.2 4
7 4 tool.1 4
8 4 tool.2 6
9 5 tool.1 6
10 5 tool.2 5
I would do
df %>%
pivot_longer(
starts_with("tool"),
names_to = "tool",
values_to = "uses_tool"
) %>%
filter(uses_tool != 0) %>%
group_by(practice_area) %>%
count(tool)
In this piece of code, I make a long table (instead of wide) in which I have a column for the tools (selected with start_with
, see https://dplyr.tidyverse.org/reference/select.html). After that, I remove the ones that don't use the tool (uses_tool != 0) and I group them by the practice area. The only thing to do then is to count the occurrences by group.
Upvotes: 1