user14478763

How to find the percentage of people belonging to at least 2 groups in R

I just started using R, and I have a Stata dataset which I opened in R. The questionnaire contains the question “Please look carefully at the following list of political groups and say which, if any, do you belong to?”. Variables v1 to v10 represent the different groups, and each takes the value 1 or 0, meaning ‘yes’ or ‘no’. My question is: how do I find the percentage of people who are members of at least 2 groups?

I think I’m supposed to use dplyr but I am not sure.

One idea I had was to use filter and mutate.

Upvotes: 0

Views: 230

Answers (3)

ssiani

Reputation: 1

Start by creating some fake data:

library(dplyr)

df <- tibble(id = 1:5,
             v1 = c(1, 1, 0, 0, 0),
             v2 = c(1, 1, 0, 0, 0),
             v3 = rep(0, 5),
             v4 = rep(0, 5),
             v5 = rep(0, 5),
             v6 = rep(0, 5),
             v7 = rep(0, 5),
             v8 = rep(0, 5),
             v9 = rep(0, 5),
             v10 = rep(0, 5))

This is our table. Note that out of 5 observations, 2 people (40%) are members of at least 2 groups:

> df
# A tibble: 5 x 11
     id    v1    v2    v3    v4    v5    v6    v7    v8    v9   v10
  <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1     1     1     0     0     0     0     0     0     0     0
2     2     1     1     0     0     0     0     0     0     0     0
3     3     0     0     0     0     0     0     0     0     0     0
4     4     0     0     0     0     0     0     0     0     0     0
5     5     0     0     0     0     0     0     0     0     0     0

First, I sum variables v1 to v10 for each row and create a variable that is TRUE if the sum is greater than or equal to 2 and FALSE otherwise. Then I group by this new variable and calculate the percentages:

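result <- df %>%
   rowwise() %>%
   # TRUE when a respondent belongs to two or more groups
   mutate(two_or_more = sum(c_across(v1:v10)) >= 2) %>%
   group_by(two_or_more) %>%
   # n() is the group size; divide by the total number of rows
   summarize(percentage = n() / nrow(df) * 100)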

The result should look like this

> result
# A tibble: 2 x 2
  two_or_more percentage
  <lgl>            <dbl>
1 FALSE               60
2 TRUE                40
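
For what it's worth, the same 40% can probably be reached without rowwise(). The following is a minimal sketch in base R, not part of the answer above: it rebuilds the same toy data as a plain data frame, and the names n_groups and pct_two_or_more are made up for illustration. It relies on the fact that the mean of a logical vector is the share of TRUE values.

```r
# Same toy data as above: respondents 1 and 2 belong to groups v1 and v2.
df <- data.frame(id = 1:5,
                 v1 = c(1, 1, 0, 0, 0),
                 v2 = c(1, 1, 0, 0, 0),
                 v3 = 0, v4 = 0, v5 = 0, v6 = 0, v7 = 0,
                 v8 = 0, v9 = 0, v10 = 0)

# Select only the membership columns so that id is not summed.
group_cols <- paste0("v", 1:10)

# Number of groups each respondent belongs to.
n_groups <- rowSums(df[group_cols])

# Mean of a logical vector = proportion of TRUE values.
pct_two_or_more <- mean(n_groups >= 2) * 100
pct_two_or_more
```

On a large survey file this vectorised form may also run faster than a row-by-row sum, though that has not been measured here.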

Upvotes: 0

SebSta

Reputation: 496

You can create a new column that adds up all the 1s and 0s in each row, then count the rows whose sum is below 2 and the rows whose sum is 2 or more.

library(dplyr)

set.seed(1234)
# 10 x 10 matrix of 0/1 values; each cell is 1 with probability 0.1
dat <- matrix(ifelse(runif(100) >= 0.1, 0, 1), 10, 10) %>%
  as_tibble(.name_repair = "unique")

dat %>%
  mutate(rsum = rowSums(.)) %>%
  summarise(fewer_than_two = 100 * sum(rsum < 2) / n(),
            two_or_more = 100 * sum(rsum >= 2) / n())

# A tibble: 1 x 2
  fewer_than_two two_or_more
           <dbl>       <dbl>
1             80          20
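
One caveat: rowSums(.) adds up every column, so a real survey file with an id or other non-membership column would inflate the sums. A possible way around this, sketched below under the assumption of dplyr 1.0 or later, is to restrict the sum with across(). The data here is hypothetical (an id column plus three membership columns), not the poster's dataset.

```r
library(dplyr)

# Hypothetical data: an id column plus three 0/1 membership columns.
dat <- tibble(id = 101:104,
              v1 = c(1, 0, 1, 0),
              v2 = c(1, 0, 0, 0),
              v3 = c(0, 0, 1, 1))

res <- dat %>%
  # across(v1:v3) returns only the membership columns, so id is not summed.
  mutate(rsum = rowSums(across(v1:v3))) %>%
  summarise(fewer_than_two = 100 * mean(rsum < 2),
            two_or_more    = 100 * mean(rsum >= 2))
res
```

For the question's data the selection would presumably be v1:v10 instead of v1:v3.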

Upvotes: 0

Karthik S

Reputation: 11548

Does this work?

> library(dplyr)
> stat <- data.frame(v1 = sample(c(0,1), 10, T),
+                    v2 = sample(c(0,1), 10, T),
+                    v3 = sample(c(0,1), 10, T),
+                    v4 = sample(c(0,1), 10, T),
+                    v5 = sample(c(0,1), 10, T),
+                    v6 = sample(c(0,1), 10, T),
+                    v7 = sample(c(0,1), 10, T),
+                    v8 = sample(c(0,1), 10, T),
+                    v9 = sample(c(0,1), 10, T),
+                    v10 = sample(c(0,1), 10, T), stringsAsFactors = F)
> stat
   v1 v2 v3 v4 v5 v6 v7 v8 v9 v10
1   0  1  1  1  1  1  1  0  0   1
2   0  1  1  0  0  1  1  1  0   1
3   0  1  1  0  1  0  0  1  1   0
4   0  0  1  1  0  1  0  1  0   0
5   0  0  1  1  0  1  0  1  1   0
6   0  1  0  1  1  1  1  1  1   0
7   0  0  1  0  0  0  0  1  0   1
8   0  0  1  1  1  1  0  0  0   1
9   0  1  0  0  0  1  0  0  0   1
10  0  1  1  0  0  0  0  0  1   1
> stat %>% mutate(groups_member = rowSums(.)) %>% mutate(atleast_two_groups = case_when(groups_member >= 2 ~ 'Yes', TRUE ~ 'No')) %>% select(-groups_member)
   v1 v2 v3 v4 v5 v6 v7 v8 v9 v10 atleast_two_groups
1   0  1  1  1  1  1  1  0  0   1                Yes
2   0  1  1  0  0  1  1  1  0   1                Yes
3   0  1  1  0  1  0  0  1  1   0                Yes
4   0  0  1  1  0  1  0  1  0   0                Yes
5   0  0  1  1  0  1  0  1  1   0                Yes
6   0  1  0  1  1  1  1  1  1   0                Yes
7   0  0  1  0  0  0  0  1  0   1                Yes
8   0  0  1  1  1  1  0  0  0   1                Yes
9   0  1  0  0  0  1  0  0  0   1                Yes
10  0  1  1  0  0  0  0  0  1   1                Yes
> 

The data frame is effectively a matrix of 10 variables, each holding either 0 or 1. I create a new column with the sum of each row; if that count is at least 2 (that is, at least 2 of the 10 groups, or 20%), the row is flagged 'Yes', otherwise 'No', which tells you whether the respondent satisfies your condition.
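
The flag column on its own does not yet give the percentage the question asks for. One way to get there, sketched here in base R with a small hypothetical data frame (the values are invented, and pct is just an illustrative name), is to tabulate the flag and convert the counts to shares:

```r
# Hypothetical 0/1 membership data for four respondents.
stat <- data.frame(v1 = c(1, 0, 1, 0),
                   v2 = c(1, 0, 0, 0),
                   v3 = c(1, 1, 0, 0))

# Flag respondents with at least two memberships, as in the code above.
# rowSums(stat) is evaluated before the new column is added.
stat$atleast_two_groups <- ifelse(rowSums(stat) >= 2, "Yes", "No")

# Counts per answer, converted to percentages.
pct <- prop.table(table(stat$atleast_two_groups)) * 100
pct
```

Here only the first respondent has two or more memberships, so 'Yes' accounts for one row in four.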

Upvotes: 1
