Reputation: 155
I have a data set that looks something like this:
id eye date notes
1 L 01-01-2000
1 L 01-06-2000
1 R 01-01-2000
1 R 01-03-2000
1 R 01-06-2000
2 L 01-01-2000
2 L 01-04-2000
2 R 01-04-2000
2 R 01-07-2000
2 R 04-09-2001
3 L 01-01-2000
4 L 01-01-2000
4 L 03-03-2001
4 R 03-03-2001
For each id
I want to be able to find the common dates with eyes marked as L and R and then output it as a data frame. So it should look like:
id eye date notes
1 L 01-01-2000
1 L 01-06-2000
1 R 01-01-2000
1 R 01-06-2000
2 L 01-04-2000
2 R 01-04-2000
4 L 03-03-2001
4 R 03-03-2001
I've been trying different variations in dplyr
and plyr
(examples below) but I can't seem to get it to do what i want.
data %>% group_by(id) %>% do(Reduce(intersect, list(.$date[.$eye == "L"], .$date[.$eye == "R"])))
ddply(data, .(id), summarize, Reduce(intersect, list(.$date[.$eye == "L"], .$date[.$eye == "R"])))
Semantically both of these seem to "say" what I want them to do but I don't get any legible results.
Any ideas?
Upvotes: 1
Views: 187
Reputation: 5456
Another one:
df <- read.table(header=TRUE, text="id eye date
1 L 01-01-2000
1 L 01-06-2000
1 R 01-01-2000
1 R 01-03-2000
1 R 01-06-2000
2 L 01-01-2000
2 L 01-04-2000
2 R 01-04-2000
2 R 01-07-2000
2 R 04-09-2001
3 L 01-01-2000
4 L 01-01-2000
4 L 03-03-2001
4 R 03-03-2001")
library(dplyr)
left_join(df %>%
group_by(id,date) %>%
summarize(n=n()) %>%
filter(n==2) %>%
select(-n),
df,
by=c("id","date")) %>%
arrange(id,eye,date)
Upvotes: 1
Reputation: 38510
If there is always an L and an R when there are two observations for an id-date, as is the case for your example data, then you can use duplicated
like this:
dat[duplicated(dat[-2]) | duplicated(dat[-2], fromLast = TRUE),]
id eye date
1 1 L 01-01-2000
2 1 L 01-06-2000
3 1 R 01-01-2000
5 1 R 01-06-2000
7 2 L 01-04-2000
8 2 R 01-04-2000
13 4 L 03-03-2001
14 4 R 03-03-2001
If you need to check that both L and R are present in id-date pairs, you could save the above result and use ave
to perform a check like this
temp <- dat[duplicated(dat[-2]) | duplicated(dat[-2], fromLast = TRUE),]
temp[as.logical(ave(as.character(temp$eye), temp$id, temp$date,
FUN=function(x) all(x %in% c("L", "R")))),]
id eye date
1 1 L 01-01-2000
2 1 L 01-06-2000
3 1 R 01-01-2000
5 1 R 01-06-2000
7 2 L 01-04-2000
8 2 R 01-04-2000
13 4 L 03-03-2001
14 4 R 03-03-2001
Upvotes: 3
Reputation: 887501
We can also use data.table
library(data.table)
setDT(df1)[, .SD[(uniqueN(eye)>1) & !duplicated(eye)], date]
# date id eye
#1: 01-01-2000 1 L
#2: 01-01-2000 1 R
#3: 01-06-2000 1 L
#4: 01-06-2000 1 R
#5: 01-04-2000 2 L
#6: 01-04-2000 2 R
Upvotes: 3
Reputation: 10131
With dplyr you can use filter
:
dat %>%
group_by(date) %>%
filter(sum(eye == "L") > 0 & sum(eye == "R") > 0 & !(duplicated(date) & duplicated(eye)))
Source: local data frame [6 x 3]
Groups: date [3]
id eye date
<int> <fctr> <fctr>
1 1 L 01-01-2000
2 1 L 01-06-2000
3 1 R 01-01-2000
4 1 R 01-06-2000
5 2 L 01-04-2000
6 2 R 01-04-2000
Upvotes: 2