Reputation: 77
I'm new to R. I have a very long data set in which some values (dates) are presumably repeated across different variables. For each individual, I want to check whether two or more variables are equal.
My data are something like this:
Id date1 date2 date3 date25
1 17/10/2002 17/10/2002 25/01/2008 25/01/2008
2 13/04/2009 13/04/2009
3 07/02/2008
4 24/11/2006 09/06/2010 09/06/2010
I would like to identify, for each individual, which variables are equal and which are not. I've tried identical(), all() and others, but since my dataset has more than 20k observations, they are difficult to use.
At the moment duplicated() seems to work, but it is not quite what I'm looking for. Perhaps I'm doing something wrong. This is what I've tried:
mutate(mydata, newv=duplicated(mydata))
mydata is a subset of the data frame that contains only the ID and all the date variables. This adds one column at the end whose values are all FALSE, but I know some values are equal (though not across all the variables). I assume it may be related to the missing values in the variables.
My desired output would be something like this:
Id date1 date2 date3 date25
1 17/10/2002 25/01/2008
2 13/04/2009
3 07/02/2008
4 24/11/2006 09/06/2010
Does anyone have any suggestions at all?
Thanks!!
Upvotes: 1
Views: 253
Reputation: 92302
It seems like a job for apply. Here's a possible solution:
mydata2 <- as.data.frame(t(apply(mydata, 1, function(x) {
  temp <- unique(x)                           # distinct values of this row, in order
  c(temp, rep("", length(x) - length(temp)))  # pad with "" back to the original width
})))
names(mydata2) <- names(mydata)
mydata2
# Id date1 date2 date3 date25
# 1 1 17/10/2002 25/01/2008
# 2 2 13/04/2009
# 3 3 07/02/2008
# 4 4 24/11/2006 09/06/2010
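For anyone who wants to try this on a small example, here is a minimal, self-contained sketch. The sample data frame is a hypothetical reconstruction: the positions of the blank cells and the choice to store them as NA are assumptions, since the question does not show them. It also illustrates why duplicated(mydata) was all FALSE (on a data frame it compares whole rows with each other), drops missing values before calling unique(), and adds an optional repeats matrix (a name made up for this example) that flags which columns repeat an earlier date in the same row.

```r
# Hypothetical reconstruction of the sample data; blank-cell positions and the
# use of NA for missing dates are assumptions, not taken from the question.
mydata <- data.frame(
  Id     = 1:4,
  date1  = c("17/10/2002", "13/04/2009", "07/02/2008", "24/11/2006"),
  date2  = c("17/10/2002", "13/04/2009", NA, "09/06/2010"),
  date3  = c("25/01/2008", NA, NA, "09/06/2010"),
  date25 = c("25/01/2008", NA, NA, NA),
  stringsAsFactors = FALSE
)

# duplicated() on a data frame compares entire rows with each other,
# so it is FALSE everywhere when no two rows are identical.
duplicated(mydata)

# Work on the date columns only, as a character matrix.
dates <- as.matrix(mydata[, -1])

# For each row, keep the distinct non-missing dates and pad with "".
packed <- t(apply(dates, 1, function(x) {
  temp <- unique(x[!is.na(x) & x != ""])
  c(temp, rep("", length(x) - length(temp)))
}))
colnames(packed) <- colnames(dates)
mydata2 <- data.frame(Id = mydata$Id, packed, stringsAsFactors = FALSE)
mydata2

# Optional: TRUE marks a date that repeats an earlier column in the same row.
repeats <- t(apply(dates, 1, function(x) duplicated(x) & !is.na(x)))
repeats
```

Keeping Id out of the apply() call avoids comparing it with the dates and leaves it as an integer column. If the real data stores missing dates as empty strings rather than NA, the x != "" condition covers that case too.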
Upvotes: 2