Reputation: 3
My data.frame includes the results from a survey and looks like this:
date | id | age | gender | ... |
---|---|---|---|---|
01-02 | 99 | 20 | 1 | ... |
01-20 | 52 | 34 | 2 | ... |
01-23 | 47 | 20 | 1 | ... |
01-02 | 100 | 56 | 1 | ... |
02-05 | 99 | 20 | 1 | ... |
02-17 | 78 | 18 | 2 | ... |
02-28 | 47 | 20 | 1 | ... |
Users may take part in the survey each month, up to 10 times in total, so some users' personal data appears more than once in the table.
Now to my problem: How can I get the mean (e.g. of age) over all users who took part in the survey? If I just use mean(df$age), users who took part more than once will obviously be overrepresented.
Also, how can I get a table counting how many users took part once, twice, ... ten times? For example:
number of participations | number of users |
---|---|
1 | 2,047 |
2 | 23,127 |
3 | 50,000 |
I haven't found a solution for this, so I'm grateful for any help. Thanks in advance!
Upvotes: 0
Views: 48
Reputation: 388907
To get the average age of the participants, you can keep only the unique id values in the data and calculate the average.
In dplyr you can do this with distinct and summarise.
library(dplyr)
df %>%
  distinct(id, .keep_all = TRUE) %>%
  summarise(avg_age = mean(age))
# avg_age
#1 29.6
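If you would rather not load dplyr, the same idea can be sketched in base R with duplicated. This is only a minimal sketch on the sample data from the question; it assumes a user's age is the same on every row (it simply keeps the first row per id), and the names first_rows and avg_age are made up for illustration.

```r
# Sample data from the question (7 rows, 5 distinct users)
df <- data.frame(
  date   = c("01-02", "01-20", "01-23", "01-02", "02-05", "02-17", "02-28"),
  id     = c(99L, 52L, 47L, 100L, 99L, 78L, 47L),
  age    = c(20L, 34L, 20L, 56L, 20L, 18L, 20L),
  gender = c(1L, 2L, 1L, 1L, 1L, 2L, 1L)
)

# duplicated() flags the second and later occurrences of each id,
# so !duplicated() keeps only the first row per user
first_rows <- df[!duplicated(df$id), ]

# Mean age with every user counted exactly once
avg_age <- mean(first_rows$age)
avg_age
# [1] 29.6
```

If age can change between a user's participations, you would need to decide which value to use per user before averaging.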
To count how many times each individual responded to the survey, you can use count:
df %>% count(id, name = 'count')
# id count
#1 47 2
#2 52 1
#3 78 1
#4 99 2
#5 100 1
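The per-user counts above are not yet the table asked for in the question (number of participations versus number of users). One way to get there, as a sketch, is to count a second time on the count column. The column names participations and users and the object participation_table are my own choices, not anything from the question.

```r
library(dplyr)

# Sample data from the question (7 rows, 5 distinct users)
df <- data.frame(
  date   = c("01-02", "01-20", "01-23", "01-02", "02-05", "02-17", "02-28"),
  id     = c(99L, 52L, 47L, 100L, 99L, 78L, 47L),
  age    = c(20L, 34L, 20L, 56L, 20L, 18L, 20L),
  gender = c(1L, 2L, 1L, 1L, 1L, 2L, 1L)
)

participation_table <- df %>%
  count(id, name = "participations") %>%   # rows per user
  count(participations, name = "users")    # users per participation count

participation_table
#   participations users
# 1              1     3
# 2              2     2
```

In base R, table(table(df$id)) should give the same tallies in a more compact form.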
data
It is easier to help if you provide the data in a reproducible format, for example:
df <- structure(list(date = c("01-02", "01-20", "01-23", "01-02", "02-05",
"02-17", "02-28"), id = c(99L, 52L, 47L, 100L, 99L, 78L, 47L),
age = c(20L, 34L, 20L, 56L, 20L, 18L, 20L), gender = c(1L,
2L, 1L, 1L, 1L, 2L, 1L)), row.names = c(NA, -7L), class = "data.frame")
Upvotes: 1