Randy

Reputation: 63

Compare columns of a grouped data frame for equality

My goal here is to compare a string or numeric column within groups defined by ID. For example, if both values of var1 in a group were "NORMAL", a new column would say TRUE, otherwise FALSE. I know I can use summarise_all(), but I need the result as a new column for another project. I would also like this comparison to work for numeric values. All values in the chosen column have to be exactly the same. Some of the groups have more than 2 members.

df <- structure(list(ID = c("A1.1234567", "A1.12345"), 
                 var1 = c("NORMAL", "NORMAL"), 
                 var2 = c("NORMAL", "NORMAL"), 
                 var3 = c("NORMAL", "NORMAL"), 
                 var4 = c("NORMAL", "NORMAL"), 
                 var5 = c("NORMAL", "NORMAL"), 
                 var6 = c("NORMAL", "NORMAL"), 
                 var7 = c("25", "25"), 
                 var8 = c("6", "9")),
            .Names = c("ID", "var1", "var2", "var3", "var4", "var5", "var6", "var7", "var8"), 
            class = "data.frame", row.names = c(NA, -2L))

I want it to look like

        ID   var1   var2   var3   var4   var5   var6 var7 var8 var7.true var8.true
A1.1234567 NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL   25    6      TRUE     FALSE
A1.1234567 NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL   25    9      TRUE     FALSE

My only idea was to use mutate(), but I can't seem to compare the values correctly.

Upvotes: 0

Views: 53

Answers (1)

Sotos

Reputation: 51582

You can use mutate_at() (as opposed to mutate_all()) to exclude ID, since we are not grouping by it, and name the new variables so that they do not overwrite the existing ones, i.e.

library(dplyr)

df %>% 
 mutate_at(vars(-ID), funs(new = ifelse(all(. == 'NORMAL'), TRUE, FALSE)))

which gives

             ID   var1   var2   var3   var4   var5   var6     var7   var8 var1_new var2_new var3_new var4_new var5_new var6_new var7_new var8_new
1 A1.1234567_10 NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL   NORMAL NORMAL     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE    FALSE     TRUE
2 A1.1234567_20 NORMAL NORMAL NORMAL NORMAL NORMAL NORMAL ABNORMAL NORMAL     TRUE     TRUE     TRUE     TRUE     TRUE     TRUE    FALSE     TRUE
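
funs() has been deprecated since dplyr 0.8.0, and the scoped verbs such as mutate_at() are superseded by across(). As a rough sketch, assuming dplyr 1.0.0 or later, the same check could look like the following. The toy data frame is made up for illustration and is not the asker's real data.

```r
library(dplyr)

# Hypothetical toy data, chosen to mirror the output shown above
toy <- data.frame(
  ID   = c("A1.1234567_10", "A1.1234567_20"),
  var7 = c("NORMAL", "ABNORMAL"),
  var8 = c("NORMAL", "NORMAL"),
  stringsAsFactors = FALSE
)

# For every column except ID, add <column>_new holding TRUE when
# all values in that column equal "NORMAL" (the single result is
# recycled to every row)
res <- toy %>%
  mutate(across(-ID, ~ all(.x == "NORMAL"), .names = "{.col}_new"))

res
```

The .names argument plays the role of the `new =` name in funs(), so the original columns are kept.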

EDIT: As per your comment, there are a few ways to test whether all elements are equal. I went with checking that the number of unique values is 1 (which holds only if all values are the same), i.e.

mutate_at(df, vars(-ID), funs(new = length(unique(.)) == 1))

BONUS: You no longer need ifelse(), since the comparison already returns TRUE or FALSE.
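
The question mentions groups by ID with more than 2 members, so here is a possible per-group variant. This is only a sketch, assuming dplyr 1.0.0 or later. The data frame df2 and its values are invented for illustration, and the ".true" suffix just follows the naming in the question.

```r
library(dplyr)

# Hypothetical data: two ID groups, one of them with three members
df2 <- data.frame(
  ID   = c("A", "A", "A", "B", "B"),
  var7 = c("25", "25", "25", "25", "30"),
  var8 = c(6, 9, 6, 4, 4),
  stringsAsFactors = FALSE
)

# Within each ID group, flag whether a column holds exactly one
# distinct value; works for character and numeric columns alike.
# across() skips the grouping column, so ID is left alone.
out <- df2 %>%
  group_by(ID) %>%
  mutate(across(everything(), ~ n_distinct(.x) == 1,
                .names = "{.col}.true")) %>%
  ungroup()

out
```

Note that n_distinct() counts NA as a value of its own by default, so a group containing both NA and a real value would be flagged FALSE. Whether that is the desired behaviour depends on your data.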

Upvotes: 3
