Heiko

Reputation: 3

How can I exclude users in a survey with the same ID for calculations in R?

My data.frame includes the results from a survey and looks like this:

date id age gender ...
01-02 99 20 1 ...
01-20 52 34 2 ...
01-23 47 20 1 ...
01-02 100 56 1 ...
02-05 99 20 1 ...
02-17 78 18 2 ...
02-28 47 20 1 ...

Users are allowed to take part in the survey once each month, up to 10 times in total, so some users' personal data appears more than once in the table.

Now to my problem: How can I get the mean (e.g. of age) over all users who took part in the survey? If I just use mean(df$age), those who took part more than once will obviously be overrepresented.

Also, how can I get a table counting the users who took part once, twice, ... ten times? e.g.:

number of participations    number of users
1                           2,047
2                           23,127
3                           50,000

I haven't found a solution for this, so I'm grateful for any help. Thanks in advance!

Upvotes: 0

Views: 48

Answers (1)

Ronak Shah

Reputation: 388907

To get the average age of the participants, keep only one row per unique id and then calculate the mean of age over those rows.

In dplyr you can do this with distinct and summarise.

library(dplyr)

df %>%
  distinct(id, .keep_all = TRUE) %>%
  summarise(avg_age = mean(age))

#  avg_age
#1    29.6
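
For readers who prefer not to load dplyr, the same idea can be sketched in base R with duplicated(). This is only an illustration on the sample data from this answer; the names first_rows and avg_age are made up for the example. Like distinct(), it keeps the first row it sees for each id, so it assumes a user's age is the same (or close enough) on every row.

```r
# Sample data: the id and age columns from the "data" section of this answer
df <- data.frame(
  id  = c(99L, 52L, 47L, 100L, 99L, 78L, 47L),
  age = c(20L, 34L, 20L, 56L, 20L, 18L, 20L)
)

# duplicated() flags the 2nd, 3rd, ... occurrence of each id,
# so negating it keeps only the first row per user
first_rows <- df[!duplicated(df$id), ]

# Average age with each user counted exactly once
avg_age <- mean(first_rows$age)
avg_age
# [1] 29.6
```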

To count how many times each individual responded to the survey, you can use count.

df %>% count(id, name = 'count')

#   id count
#1  47     2
#2  52     1
#3  78     1
#4  99     2
#5 100     1
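
The question also asks how many users took part once, twice, and so on. One possible way to get that, sketched here on the sample data, is to count the per-id counts a second time. The column names participations and users and the object participation_summary are illustrative choices, not something from the question.

```r
library(dplyr)

# Sample data: the id column from the "data" section of this answer
df <- data.frame(id = c(99L, 52L, 47L, 100L, 99L, 78L, 47L))

participation_summary <- df %>%
  count(id, name = "participations") %>%   # rows per user
  count(participations, name = "users")    # users per participation count

participation_summary
#   participations users
# 1              1     3
# 2              2     2

# A base R cross-check of the same tabulation
table(table(df$id))
```

On the sample data, three users (52, 78, 100) took part once and two users (47, 99) took part twice.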

data

It is easier to help if you provide the data in a reproducible format, for example the output of dput(df):

df <- structure(list(date = c("01-02", "01-20", "01-23", "01-02", "02-05", 
"02-17", "02-28"), id = c(99L, 52L, 47L, 100L, 99L, 78L, 47L), 
    age = c(20L, 34L, 20L, 56L, 20L, 18L, 20L), gender = c(1L, 
    2L, 1L, 1L, 1L, 2L, 1L)), row.names = c(NA, -7L), class = "data.frame")

Upvotes: 1
