Reputation: 183
I am using R to analyse some data that is in long format. One column is a grouping variable containing participant IDs, and another column contains each participant's sex.
e.g.
ID SEX
1 M
1 M
2 F
2 F
2 M
I would like to check whether any IDs do not have sex coded consistently, e.g. ID=2 above. Is there a way to do this? I have been playing around with dplyr and the group_by function, but I am at a loss. Any help would be greatly appreciated.
In terms of output, I would probably like a vector of all unique ID values that have non-identical values in the SEX column.
Upvotes: 0
Views: 60
Reputation: 11150
Here's a base R solution using ave():
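# Convert SEX to integer codes so ave() returns numeric counts for character or factor input
df[ave(as.integer(factor(df$SEX)), df$ID, FUN = function(x) length(unique(x))) > 1, ]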
  ID SEX
3  2   F
4  2   F
5  2   M
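Since the question asks for a vector of IDs rather than the offending rows, here is a minimal base R sketch of one way to get that, assuming the example data from the question. The names n_sex and bad_ids are just illustrative, and the as.numeric() step assumes the IDs are numeric.

```r
# Example data from the question
df <- data.frame(ID = c(1, 1, 2, 2, 2),
                 SEX = c("M", "M", "F", "F", "M"))

# Number of distinct SEX values for each ID (result is named by ID)
n_sex <- tapply(df$SEX, df$ID, function(x) length(unique(x)))

# IDs with more than one distinct SEX value, converted back to numeric
bad_ids <- as.numeric(names(n_sex)[n_sex > 1])
bad_ids
```

If the IDs are not numeric, dropping as.numeric() should leave them as a character vector.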
Upvotes: 1
Reputation: 13403
You can try this.
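library(plyr)
# Example data from the question
df <- data.frame(ID = c(1,1,2,2,2), SEX = c('M','M','F','F','M'))
# Add the number of distinct SEX values within each ID
df2 <- ddply(df, .(ID), mutate, count = length(unique(SEX)))
# Keep the IDs that have more than one distinct SEX value
unique(df2[df2$count > 1, ][1])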
Result:
ID
2
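The question mentions dplyr and group_by, so the same idea could probably be sketched there too. This is only one possible approach, assuming the example data from the question; inconsistent_ids and n_sex are made-up names.

```r
library(dplyr)

# Example data from the question
df <- data.frame(ID = c(1, 1, 2, 2, 2),
                 SEX = c("M", "M", "F", "F", "M"))

inconsistent_ids <- df %>%
  group_by(ID) %>%                        # one group per participant
  summarise(n_sex = n_distinct(SEX)) %>%  # distinct SEX values per ID
  filter(n_sex > 1) %>%                   # keep IDs coded inconsistently
  pull(ID)                                # return a plain vector of IDs

inconsistent_ids
```

pull() returns the ID column as a vector, which matches the output format asked for in the question.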
Upvotes: 0