bmora

Reputation: 77

How to identify repeated variables within observations?

I'm new to R. I have a very long data set in which some values (dates) are presumably repeated across different variables. For each individual, I want to check whether two or more variables are equal.

My data are something like this:

    Id         date1        date2        date3      date25

     1       17/10/2002   17/10/2002  25/01/2008  25/01/2008
     2       13/04/2009   13/04/2009                        
     3       07/02/2008   
     4       24/11/2006   09/06/2010  09/06/2010

I would like to identify, for each individual, which variables are equal and which are not. I've tried identical(), all() and others, but since my dataset has more than 20k observations, they are difficult to use.
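For reference, a minimal sketch of a vectorised check, which handles all rows in one call rather than one identical() per row. The data frame below is a hypothetical reconstruction of the example, assuming the dates are stored as character strings and blank cells are empty strings; `same_1_2` and `pair_equal` are names made up for illustration.

```r
# Hypothetical data mirroring the example; blank cells are empty strings.
mydata <- data.frame(
  Id     = 1:4,
  date1  = c("17/10/2002", "13/04/2009", "07/02/2008", "24/11/2006"),
  date2  = c("17/10/2002", "13/04/2009", "", "09/06/2010"),
  date3  = c("25/01/2008", "", "", "09/06/2010"),
  date25 = c("25/01/2008", "", "", ""),
  stringsAsFactors = FALSE
)

# One comparison covers every row at once. Blanks are excluded so that
# two empty cells do not count as a match.
same_1_2 <- mydata$date1 == mydata$date2 & mydata$date1 != ""

# The same test for every pair of date columns: one logical column per pair.
date_cols <- setdiff(names(mydata), "Id")
pairs <- combn(date_cols, 2)
pair_equal <- apply(pairs, 2, function(p) {
  mydata[[p[1]]] == mydata[[p[2]]] & mydata[[p[1]]] != ""
})
colnames(pair_equal) <- paste(pairs[1, ], pairs[2, ], sep = "_")
pair_equal
```

Each row of `pair_equal` corresponds to one individual and each column to one pair of variables, so it should scale to 20k observations without an explicit loop over rows.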

At the moment duplicated() seems to work, but it is not quite what I'm looking for. Perhaps I'm doing something wrong. This is what I've tried:

    mutate(mydata, newv=duplicated(mydata))

mydata is a subset of the data frame that contains only the Id and all the date variables. This adds one column at the end whose values are all FALSE, but I know some values are equal (though not in all the variables). I assume it may be related to the missing values in the variables.
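A likely explanation for the all-FALSE column, sketched below: when duplicated() is given a data frame it compares whole rows with earlier rows, and because every Id is different no row can repeat. Applying it within each row gives the per-individual flags instead. The data frame is a hypothetical reconstruction of the example (character dates, blanks as empty strings), and `dup_within` is a name made up for illustration.

```r
# Hypothetical data mirroring the example; blank cells are empty strings.
mydata <- data.frame(
  Id     = 1:4,
  date1  = c("17/10/2002", "13/04/2009", "07/02/2008", "24/11/2006"),
  date2  = c("17/10/2002", "13/04/2009", "", "09/06/2010"),
  date3  = c("25/01/2008", "", "", "09/06/2010"),
  date25 = c("25/01/2008", "", "", ""),
  stringsAsFactors = FALSE
)

# On a data frame, duplicated() asks "is this whole row a repeat of an
# earlier row?". Every Id is unique, so the answer is FALSE for all rows.
duplicated(mydata)

# Within each row: flag a date that already appeared earlier in that row.
# Blanks are ignored so that repeated empty cells are not flagged.
date_cols <- setdiff(names(mydata), "Id")
dup_within <- t(apply(mydata[date_cols], 1, function(x) duplicated(x) & x != ""))
dup_within
```

In `dup_within`, TRUE marks a cell that repeats an earlier date for the same individual.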

My desired output would be something like this:

    Id         date1        date2        date3      date25

     1       17/10/2002   25/01/2008  
     2       13/04/2009                         
     3       07/02/2008   
     4       24/11/2006   09/06/2010  

Does anyone have any suggestions at all?

Thanks!!

Upvotes: 1

Views: 253

Answers (1)

David Arenburg

Reputation: 92302

This seems like a job for apply(). Here's a possible solution:

    # For each row, keep the unique values and pad with "" back to the original width
    mydata2 <- as.data.frame(t(apply(mydata, 1, function(x) {
      temp <- unique(x)
      c(temp, rep("", length(x) - length(temp)))
    })))
    names(mydata2) <- names(mydata)
    mydata2
    #   Id      date1      date2 date3 date25
    # 1  1 17/10/2002 25/01/2008             
    # 2  2 13/04/2009                        
    # 3  3 07/02/2008                        
    # 4  4 24/11/2006 09/06/2010    
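A possible variant of the same idea, in case the blanks are stored as NA or the Id column should keep its type: apply() converts the whole data frame to a character matrix, so the sketch below runs it on the date columns only and drops NA and "" before taking unique values. The data and the helper name `collapse_row` are assumptions made for illustration, not part of the original post.

```r
# Hypothetical data mirroring the example; one blank is stored as NA
# to show that both kinds of missing value are handled.
mydata <- data.frame(
  Id     = 1:4,
  date1  = c("17/10/2002", "13/04/2009", "07/02/2008", "24/11/2006"),
  date2  = c("17/10/2002", "13/04/2009", "", "09/06/2010"),
  date3  = c("25/01/2008", "", "", "09/06/2010"),
  date25 = c("25/01/2008", NA, "", ""),
  stringsAsFactors = FALSE
)

date_cols <- setdiff(names(mydata), "Id")

# Keep the distinct non-missing dates of one row, padded with "" to full width.
collapse_row <- function(x) {
  vals <- unique(x[!is.na(x) & x != ""])
  c(vals, rep("", length(x) - length(vals)))
}

# Only the date columns go through apply(), so Id stays an integer column.
mydata2 <- mydata
mydata2[date_cols] <- as.data.frame(
  t(apply(mydata[date_cols], 1, collapse_row)),
  stringsAsFactors = FALSE
)
mydata2
```

Leaving Id out of the apply() call also avoids comparing Id values with the dates inside unique().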

Upvotes: 2
