I just started using R, and I have a Stata dataset that I opened in R. The questionnaire includes the question “Please look carefully at the following list of political groups and say which, if any, do you belong to?”. Variables v1 to v10 represent the different groups, and each takes the value 1 (‘yes’) or 0 (‘no’). My question is: how do I find the percentage of people who are members of at least 2 groups?
I think I’m supposed to use dplyr, but I am not sure.
One idea I had was to use filter and mutate.
Upvotes: 0
Views: 230
Reputation: 1
Start by creating some fake data:
library(dplyr)  # also provides tibble() and the %>% pipe

df <- tibble(id = 1:5,
             v1 = c(1, 1, 0, 0, 0),
             v2 = c(1, 1, 0, 0, 0),
             v3 = rep(0, 5),
             v4 = rep(0, 5),
             v5 = rep(0, 5),
             v6 = rep(0, 5),
             v7 = rep(0, 5),
             v8 = rep(0, 5),
             v9 = rep(0, 5),
             v10 = rep(0, 5))
This is our table. Note that out of 5 observations, 2 people (40%) are members of at least 2 groups:
> df
# A tibble: 5 x 11
     id    v1    v2    v3    v4    v5    v6    v7    v8    v9   v10
  <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     0     0     0     0     0     0     0     0
2     2     1     1     0     0     0     0     0     0     0     0
3     3     0     0     0     0     0     0     0     0     0     0
4     4     0     0     0     0     0     0     0     0     0     0
5     5     0     0     0     0     0     0     0     0     0     0
First, I sum variables v1 to v10 for each row and create a variable that is TRUE if the sum is greater than or equal to 2 and FALSE otherwise. Then I group by this new variable and calculate the percentage of rows in each group:
result <- df %>%
  rowwise() %>%
  # TRUE when the respondent belongs to two or more groups
  mutate(two_or_more = sum(c_across(v1:v10)) >= 2) %>%
  group_by(two_or_more) %>%
  # n() is the number of rows in each group
  summarize(percentage = n() / nrow(df) * 100)
The result should look like this:
> result
# A tibble: 2 x 2
  two_or_more percentage
  <lgl>            <dbl>
1 FALSE               60
2 TRUE                40
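If only the single percentage is needed, a vectorized base R version may be simpler than the rowwise pipeline. The following is a minimal sketch that rebuilds the same toy data; `group_cols`, `n_groups` and `pct_two_or_more` are names introduced here for illustration only.

```r
# Same toy data as above, built with base R only
df <- data.frame(id = 1:5,
                 v1 = c(1, 1, 0, 0, 0),
                 v2 = c(1, 1, 0, 0, 0))
# v3 to v10 are all zero in the example
for (k in 3:10) df[[paste0("v", k)]] <- 0

group_cols <- paste0("v", 1:10)        # names of the membership columns
n_groups <- rowSums(df[group_cols])    # number of groups per respondent

# mean() of a logical vector is the share of TRUE values
pct_two_or_more <- mean(n_groups >= 2) * 100
pct_two_or_more
```

On this toy data the value is 40, matching the TRUE row of the grouped result.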
Upvotes: 0
Reputation: 496
You can create a new column that adds up the 1's and 0's in each row, then compute the percentage of rows whose sum is below 2 and the percentage whose sum is 2 or more.
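library(dplyr)

# Random 10 x 10 matrix of 0/1 values (1 with probability 0.1)
dat <- matrix(ifelse(runif(100) >= 0.1, 0, 1), 10, 10) %>%
  as_tibble(.name_repair = "unique")
dat %>%
  mutate(rsum = rowSums(.)) %>%
  summarise(fewer_than_two = 100 * sum(rsum < 2) / n(),
            two_or_more = 100 * sum(rsum >= 2) / n())
# The data are random, so the exact percentages will vary between runs
# A tibble: 1 x 2
  fewer_than_two two_or_more
           <dbl>       <dbl>
1             80          20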
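Note that `rowSums(.)` adds up every column, so it only works when the data frame holds nothing but the membership variables. A real survey file will usually carry other columns too. A minimal sketch of restricting the sum to v1 to v10 with `across()` (dplyr 1.0.0 or later) follows; the `survey` data frame, its `id` column and the seed are made up for illustration.

```r
library(dplyr)

# Hypothetical survey extract: an id column plus ten 0/1 membership columns
set.seed(1)  # arbitrary seed, only so the sketch is repeatable
survey <- as.data.frame(matrix(rbinom(100, 1, 0.1), nrow = 10))
names(survey) <- paste0("v", 1:10)
survey <- cbind(id = 101:110, survey)

out <- survey %>%
  mutate(rsum = rowSums(across(v1:v10))) %>%   # id is left out of the sum
  summarise(fewer_than_two = 100 * mean(rsum < 2),
            two_or_more    = 100 * mean(rsum >= 2))
out
```

Using `mean()` on the logical comparison gives the same share as `sum(...) / n()`.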
Upvotes: 0
Reputation: 11548
Does this work?
> library(dplyr)
> stat <- data.frame(v1 = sample(c(0,1), 10, replace = TRUE),
+ v2 = sample(c(0,1), 10, replace = TRUE),
+ v3 = sample(c(0,1), 10, replace = TRUE),
+ v4 = sample(c(0,1), 10, replace = TRUE),
+ v5 = sample(c(0,1), 10, replace = TRUE),
+ v6 = sample(c(0,1), 10, replace = TRUE),
+ v7 = sample(c(0,1), 10, replace = TRUE),
+ v8 = sample(c(0,1), 10, replace = TRUE),
+ v9 = sample(c(0,1), 10, replace = TRUE),
+ v10 = sample(c(0,1), 10, replace = TRUE), stringsAsFactors = FALSE)
> stat
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
1 0 1 1 1 1 1 1 0 0 1
2 0 1 1 0 0 1 1 1 0 1
3 0 1 1 0 1 0 0 1 1 0
4 0 0 1 1 0 1 0 1 0 0
5 0 0 1 1 0 1 0 1 1 0
6 0 1 0 1 1 1 1 1 1 0
7 0 0 1 0 0 0 0 1 0 1
8 0 0 1 1 1 1 0 0 0 1
9 0 1 0 0 0 1 0 0 0 1
10 0 1 1 0 0 0 0 0 1 1
> stat %>%
+   mutate(groups_member = rowSums(.)) %>%
+   mutate(atleast_two_groups = case_when(groups_member >= 2 ~ 'Yes', TRUE ~ 'No')) %>%
+   select(-groups_member)
v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 atleast_two_groups
1 0 1 1 1 1 1 1 0 0 1 Yes
2 0 1 1 0 0 1 1 1 0 1 Yes
3 0 1 1 0 1 0 0 1 1 0 Yes
4 0 0 1 1 0 1 0 1 0 0 Yes
5 0 0 1 1 0 1 0 1 1 0 Yes
6 0 1 0 1 1 1 1 1 1 0 Yes
7 0 0 1 0 0 0 0 1 0 1 Yes
8 0 0 1 1 1 1 0 0 0 1 Yes
9 0 1 0 0 0 1 0 0 0 1 Yes
10 0 1 1 0 0 0 0 0 1 1 Yes
The data frame is effectively a matrix of 10 variables, each holding either 0 or 1. I create a new column with the sum of each row; if that sum is 2 or more (that is, the respondent belongs to at least 2 of the 10 groups), the atleast_two_groups column says 'Yes', otherwise 'No'.
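The flag column answers the question per respondent. To get the overall percentage that was asked for, one option is to take the mean of the comparison with 'Yes'. This is a minimal sketch on a small made-up `flags` data frame, not on the random data above; `flags` and `pct_yes` are illustrative names.

```r
# Hypothetical flag column of the kind produced by the case_when() step
flags <- data.frame(atleast_two_groups = c("Yes", "Yes", "No", "Yes", "No"))

# mean() of a logical vector is the share of TRUE values
pct_yes <- mean(flags$atleast_two_groups == "Yes") * 100
pct_yes
```

For this toy vector, three of the five rows are 'Yes', so the result is 60.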
Upvotes: 1