Subha Yoganandan
Subha Yoganandan

Reputation: 355

Which field is different using dplyr?

I have the following data

df <- data.frame(val1=c(1.2,0.5,3.8,2.5,7.4),
                 val2=c(1.2,2.5,3.8,2.5,2.4),
                 val3=c(1.2,2.5,3.6,2.5,7.4),
                 val4=c(1.2,2.5,3.8,2.5,4.4),
                 val5=c(1.2,2.5,3.8,2.9,7.4))

I'd like to find which field is different to the rest. Expecting a result like below to be added to the data frame

cbind(df,results = c("all_equal", "val1","val3","val5","morethan1"))

Is there any way to do this in an easy way? I have an extensive loop to get this result which I won't post here. I'm looking for a quick solution that I've missed to see (maybe using dplyr)

Upvotes: 1

Views: 66

Answers (2)

akrun
akrun

Reputation: 887098

Here is one option with tidyverse by reshaping the data into 'long' format, apply the conditions to create the 'result' column and then bind the column with the original dataset

library(tidyverse)
rownames_to_column(df, 'rn') %>% 
   gather(key, val, matches('^val')) %>% 
   group_by(rn) %>%
   mutate(Mode = Modes(val)) %>% 
   summarise(result = case_when(all(val == Mode) ~ "all_equal",
                           sum(val != Mode) > 1 ~ "morethan1",
                       TRUE ~ paste0("val", which(val != Mode), collapse=","))) %>% 
  select(result) %>% 
  bind_cols(df, .)
#  val1 val2 val3 val4 val5    result
#1  1.2  1.2  1.2  1.2  1.2 all_equal
#2  0.5  2.5  2.5  2.5  2.5      val1
#3  3.8  3.8  3.6  3.8  3.8      val3
#4  2.5  2.5  2.5  2.5  2.9      val5
#5  7.4  2.4  7.4  4.4  7.4 morethan1

The Modes function

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
 }

Upvotes: 1

www
www

Reputation: 39154

First, define a function to calculate the mode. I used the function found here: https://stackoverflow.com/a/8189441/7669809

Modes <- function(x) {
  ux <- unique(x)
  tab <- tabulate(match(x, ux))
  ux[tab == max(tab)]
}

After that, we can use the following code to get the desired output.

apply(df, 1, function(x){
  x_mode <- Modes(x)
  if (all(x == x_mode)){
    return("all_equal")
  } else if (sum(x != x_mode) > 1){
    return("morethan1")
  } else {
    ind <- which(x != x_mode)
    return(paste0("val", ind))
  }
})
# [1] "all_equal" "val1"      "val3"      "val5"      "morethan1"

Upvotes: 1

Related Questions