S. Kelly
S. Kelly

Reputation: 155

Finding the common dates in a data set based on a categorical variable

I have a data set that looks something like this:

id    eye    date       notes
1      L   01-01-2000
1      L   01-06-2000
1      R   01-01-2000
1      R   01-03-2000
1      R   01-06-2000
2      L   01-01-2000
2      L   01-04-2000
2      R   01-04-2000
2      R   01-07-2000
2      R   04-09-2001
3      L   01-01-2000
4      L   01-01-2000
4      L   03-03-2001
4      R   03-03-2001

For each id I want to be able to find the common dates with eyes marked as L and R and then output it as a data frame. So it should look like:

id    eye    date       notes
1      L   01-01-2000
1      L   01-06-2000
1      R   01-01-2000
1      R   01-06-2000
2      L   01-04-2000
2      R   01-04-2000
4      L   03-03-2001
4      R   03-03-2001

I've been trying different variations in dplyr and plyr (examples below) but I can't seem to get it to do what i want.

data %>% group_by(id) %>% do(Reduce(intersect, list(.$date[.$eye == "L"], .$date[.$eye == "R"])))

ddply(data, .(id), summarize, Reduce(intersect, list(.$date[.$eye == "L"], .$date[.$eye == "R"])))

Semantically both of these seem to "say" what I want them to do but I don't get any legible results.

Any ideas?

Upvotes: 1

Views: 187

Answers (4)

r.user.05apr
r.user.05apr

Reputation: 5456

Another one:

df <- read.table(header=TRUE, text="id    eye    date
1      L   01-01-2000
             1      L   01-06-2000
             1      R   01-01-2000
             1      R   01-03-2000
             1      R   01-06-2000
             2      L   01-01-2000
             2      L   01-04-2000
             2      R   01-04-2000
             2      R   01-07-2000
             2      R   04-09-2001
             3      L   01-01-2000
             4      L   01-01-2000
             4      L   03-03-2001
             4      R   03-03-2001")

library(dplyr)

left_join(df %>%
             group_by(id,date) %>%
             summarize(n=n()) %>%
             filter(n==2) %>%
             select(-n),
           df,
           by=c("id","date")) %>%
  arrange(id,eye,date)

Upvotes: 1

lmo
lmo

Reputation: 38510

If there is always an L and an R when there are two observations for an id-date, as is the case for your example data, then you can use duplicated like this:

dat[duplicated(dat[-2]) | duplicated(dat[-2], fromLast = TRUE),]
   id eye       date
1   1   L 01-01-2000
2   1   L 01-06-2000
3   1   R 01-01-2000
5   1   R 01-06-2000
7   2   L 01-04-2000
8   2   R 01-04-2000
13  4   L 03-03-2001
14  4   R 03-03-2001

If you need to check that both L and R are present in id-date pairs, you could save the above result and use ave to perform a check like this

temp <- dat[duplicated(dat[-2]) | duplicated(dat[-2], fromLast = TRUE),]
temp[as.logical(ave(as.character(temp$eye), temp$id, temp$date,
                    FUN=function(x) all(x %in% c("L", "R")))),]
   id eye       date
1   1   L 01-01-2000
2   1   L 01-06-2000
3   1   R 01-01-2000
5   1   R 01-06-2000
7   2   L 01-04-2000
8   2   R 01-04-2000
13  4   L 03-03-2001
14  4   R 03-03-2001

Upvotes: 3

akrun
akrun

Reputation: 887501

We can also use data.table

library(data.table)
setDT(df1)[,  .SD[(uniqueN(eye)>1) & !duplicated(eye)], date]
#         date id eye
#1: 01-01-2000  1   L
#2: 01-01-2000  1   R
#3: 01-06-2000  1   L
#4: 01-06-2000  1   R
#5: 01-04-2000  2   L
#6: 01-04-2000  2   R

Upvotes: 3

erc
erc

Reputation: 10131

With dplyr you can use filter:

dat %>%
  group_by(date) %>%
  filter(sum(eye == "L") > 0 & sum(eye == "R") > 0 & !(duplicated(date) & duplicated(eye)))

Source: local data frame [6 x 3]
Groups: date [3]

     id    eye       date
  <int> <fctr>     <fctr>
1     1      L 01-01-2000
2     1      L 01-06-2000
3     1      R 01-01-2000
4     1      R 01-06-2000
5     2      L 01-04-2000
6     2      R 01-04-2000

Upvotes: 2

Related Questions