Reputation: 351

How to find if we have two or more equal minimum values in specific columns

I have this problem I was hoping someone could help.

I have a very large data frame (close to 20000000 observations)in R with about 43 columns, in four of those columns I need to find whether there is more than one equal minimum value below 200, then if we have rows where more than one column has the same value that meets this criteria I need to flag that row to TRUE (in a a new flag column). Please note that those columns include NA values, and NAs should not be used (when NA is present in the columns being compared, returns NA)

the goal is to look up the values in each row for columns a1 to a4 and find whether the minimum value that does not exceed 200, occurs in more than one column per row

for simplicity let's say that this is how my data data look like

head(mydata)
t1  a1  a2  a3  a4 
34  NA  NA  NA  NA
26  10  15  250 150
34  20  20  100 30 
35  5   5   10  5  
25  45  100 3   45
31 400 310 500 310 
")

the goal is to look up the values in each row for columns a1 to a4 and find whether the minimum value that does not exceed 200, occurs in more than one column per row, if it does return true if not, false

the expected result will look like this

head(mydata)
t1  a1  a2  a3  a4  flag
34  NA  NA  NA  NA  NA
26  10  15  250 150 FALSE
34  20  20  100 30  TRUE
35  5   5   10  5   TRUE
25  45  100 3   45  FALSE
31 400 310 500 310  FALSE
")

Thank you in advance.

Upvotes: 2

Answers (5)

Rui Barradas

Reputation: 76673

If you have a large dataset, the following might be fast. It uses package matrixStats, function rowMins. See this answer.

icol <- grepl("^a", names(mydata))
min_row <- matrixStats::rowMins(as.matrix(mydata[icol]))

mydata$flag <- rowSums(mydata[icol] == min_row) > 1 & min_row < 200

Upvotes: 1

tmfmnk

Reputation: 40171

One possibility involving dplyr and purrr:

df %>%
 mutate(flag = exec(pmin, !!!.[-1]),
        flag = rowSums(.[-1] == flag) > 1 & flag < 200)

  t1  a1  a2  a3  a4  flag
1 34  NA  NA  NA  NA    NA
2 26  10  15 250 150 FALSE
3 34  20  20 100  30  TRUE
4 35   5   5  10   5  TRUE
5 25  45 100   3  45 FALSE
6 31 400 310 500 310 FALSE

Here it checks whether the occurrence of row-wise minimum is greater than 1 and whether the row-wise minimum is below 200.

Upvotes: 0

Dan

Reputation: 12084

Here's a purrr solution. I create the data frame.

# Define data frame
df <- read.table(text = " t1  a1  a2  a3  a4 
                  34  NA  NA  NA  NA
                  26  10  15  250 150
                  34  20  20  100 30 
                  35  5   5   10  5  
                  25  45  100 3   45
                  31 400 310 500 310 ", header = TRUE)

Next, I load the library.

# Load library
library(purrr)

Then, I create the flag, running through each row using pmap_lgl, which returns a logical. This line checks if there is more than one minimum value and that the minimum is below 200. The first column is omitted from the calculation. If there are NA values in each row, an NA will be created.

# Create flag
df$flag <- pmap_lgl(df, function(...)(sum(c(...)[-1] == min(c(...)[-1])) > 1) & min(c(...)[-1]) < 200)

This gives the following:

# Examine result
df
#>   t1  a1  a2  a3  a4  flag
#> 1 34  NA  NA  NA  NA    NA
#> 2 26  10  15 250 150 FALSE
#> 3 34  20  20 100  30  TRUE
#> 4 35   5   5  10   5  TRUE
#> 5 25  45 100   3  45 FALSE
#> 6 31 400 310 500 310 FALSE

^{Created on 2019-05-31 by the reprex package (v0.3.0)}

Upvotes: 0

which_command

Reputation: 511

Does this help you?:

mydata$flag=apply(mydata,1,function(x){  # iterate through rows
    x=na.omit(x);        # omit NAs in a row (optional)
    tab=table(x[x<200]); # count numbers of all row values below 200
    if(any(tab>1)){      # check if any values are not unique
          return(TRUE)
          }else{
          return(FALSE)
         }})

You can choose to include or omit NA values or not.

Upvotes: 2

Ronak Shah

Reputation: 389325

Here is a base R way to do it

#Get the column indices where a1, a2, a3 and a4 are there
inds <- match(paste0("a", 1:4), names(df))

#Get row-wise minimum
min_val <- do.call(pmin, df[inds])

#Check if there are more than one occurrence of minimum value 
# and if minimum value is less than 200.
df$flag <- rowSums(df[inds] == min_val) > 1 & min_val < 200

df
#  t1  a1  a2  a3  a4  flag
#1 34  NA  NA  NA  NA    NA
#2 26  10  15 250 150 FALSE
#3 34  20  20 100  30  TRUE
#4 35   5   5  10   5  TRUE
#5 25  45 100   3  45 FALSE
#6 31 400 310 500 310 FALSE

Upvotes: 3

How to find if we have two or more equal minimum values in specific columns

Answers (5)

Related Questions