Reputation: 781
I have data set, I want to create a variable check
to check there is any row of variable day
in group ID
different.
df <- data.frame(ID=c("id1", "id1","id2", "id2","id3","id3","id3"),
day=c("01/02/2008","01/02/2008","10/02/2009","08/03/2009","11/08/2007","11/08/2007","11/08/2008"),
it =c("ul","tr","cb","ul","ul","tc","tr"))
df$day <- as.Date(as.character(df$day), format = "%m/%d/%Y")
ID day it
1 id1 2008-01-02 ul
2 id1 2008-01-02 tr
3 id2 2009-10-02 cb
4 id2 2009-08-03 ul
5 id3 2007-11-08 ul
6 id3 2007-11-08 tc
7 id3 2008-11-08 tr
The problem when I use this code, id3
listed in duplicated, because it have 2 rows the same, but I want all of row need to be the same, if not I don't count.
c <- df[duplicated(df$ID) & duplicated(df$day),]
df1 <- df[df$ID %in% c$ID,]
ID day it
1 id1 2008-01-02 ul
2 id1 2008-01-02 tr
5 id3 2007-11-08 ul
6 id3 2007-11-08 tc
7 id3 2008-11-08 tr
I want the output like this
ID day it check
1 id1 2008-01-02 ul Yes
2 id1 2008-01-02 tr Yes
3 id2 2009-10-02 cb No
4 id2 2009-08-03 ul No
5 id3 2007-11-08 ul No
6 id3 2007-11-08 tc No
7 id3 2008-11-08 tr No
Upvotes: 3
Views: 2296
Reputation: 18681
Here's a solution using dplyr
:
library(dplyr)
dupsID = df %>%
group_by(ID, day) %>%
distinct(.keep_all = TRUE) %>%
group_by(ID) %>%
filter(n() == 1)
First group by ID
then by day
, extract the distinct elements. Note that "id3 2007-11-08" and "id3 2008-11-08" would be counted as distinct since I grouped by both ID
and day
. .keep_all = TRUE
means that I want to keep all the columns. The second group_by(ID)
groups the result from distinct
only by ID
. I then extracted samples with only one row for each group. This results in:
# > dupsID
# Source: local data frame [1 x 3]
# Groups: ID [1]
#
# ID day it
# <fctr> <date> <fctr>
# 1 id1 2008-01-02 ul
Finally, I check which ID
in df
are in ID
column of dupsID
df$check = df$ID %in% dupsID$ID
# > df
# ID day it check
# 1 id1 2008-01-02 ul TRUE
# 2 id1 2008-01-02 tr TRUE
# 3 id2 2009-10-02 cb FALSE
# 4 id2 2009-08-03 ul FALSE
# 5 id3 2007-11-08 ul FALSE
# 6 id3 2007-11-08 tc FALSE
# 7 id3 2008-11-08 tr FALSE
Upvotes: 3
Reputation: 32548
First, split df
by ID
. Then, for each subgroup, check if ID
and day
are same (performed by pasting them together in this example). This will give a logical for each ID
. Then you can store that in a variable (temp
in this example) and lookup from that.
temp = sapply(split(df, df$ID), function(a)
length(unique(paste(a$ID, a$day))) == 1 & nrow(a) > 1)
temp[match(df$ID, names(temp))]
#id1 id1 id2 id2 id3 id3 id3
# TRUE TRUE FALSE FALSE FALSE FALSE FALSE
Upvotes: 2
Reputation: 12713
within(df, # attach the columns of df in a separate environment using within()
check <- unlist( by(df, # group df by ID using by()
INDICES = ID, # check for unique of days with length = 1, if so return true else false
FUN = function(x) rep( length( unique( x$day ) ) == 1, length(x$day) ) )
))
# ID day it check
# 1 id1 01/02/2008 ul TRUE
# 2 id1 01/02/2008 tr TRUE
# 3 id2 10/02/2009 cb FALSE
# 4 id2 08/03/2009 ul FALSE
# 5 id3 11/08/2007 ul FALSE
# 6 id3 11/08/2007 tc FALSE
# 7 id3 11/08/2008 tr FALSE
Data:
df <- data.frame(ID=c("id1", "id1","id2", "id2","id3","id3","id3"),
day=c("01/02/2008","01/02/2008","10/02/2009","08/03/2009","11/08/2007","11/08/2007","11/08/2008"),
it =c("ul","tr","cb","ul","ul","tc","tr"))
Upvotes: 2