ygtmnv

Reputation: 37

Keep the row if the specific column is the minimum value of that row

I cannot share the dataset, but I will explain it as best I can. It has 50 columns, 48 of which are date-times in Y/m/d h:m:s format. The data also contains many NAs, which must not be removed.

Say there is a column B. I want to remove every row in which B is not the earliest value in that row.

How can I do this in R? For example, the original would be like this:

df <- data.frame(
  A = c(11,19,17,6,13),
  B = c(18,9,5,16,12),
  C = c(14,15,8,87,16))

   A  B  C
1 11 18 14
2 19  9 15
3 17  5  8
4  6 16 87
5 13 12 16

but I want this:

   A  B  C
1 19  9 15
2 17  5  8
3 13 12 16

Upvotes: 1

Views: 99

Answers (2)

diomedesdata

Reputation: 1075

If you are willing to use data.table, you could do the following for the example.

library(data.table)
setDT(df)

df[(B < A & B < C)]
    A  B  C
1: 19  9 15
2: 17  5  8
3: 13 12 16

More generally, you could do

df <- as.data.table(df)

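df[, min := do.call(pmin, c(.SD, na.rm = TRUE))][B == min, !"min"]

Adding .SDcols to the first [ lets you control which columns the minimum is taken over, e.g. to exclude some. The na.rm = TRUE is there because the real data contains NAs; without it, pmin() returns NA for any row that has a missing value, and that row is then dropped. I am not very familiar with the inner workings of data.table, but I believe creating this new column is memory-efficient, since := adds it by reference instead of copying the table.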
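As a rough sketch of the .SDcols idea, assuming a hypothetical id column that should be left out of the comparison and an NA added to the example data (neither is in the question; the names id, cols and row_min are just for illustration):

```r
library(data.table)

# Toy data from the question, plus a hypothetical id column and one NA
dt <- data.table(
  id = c("a", "b", "c", "d", "e"),
  A  = c(11, 19, NA, 6, 13),
  B  = c(18, 9, 5, 16, 12),
  C  = c(14, 15, 8, 87, 16)
)

cols <- c("A", "B", "C")  # columns to compare; id is excluded

# Row-wise minimum over cols only, ignoring NA
dt[, row_min := do.call(pmin, c(.SD, na.rm = TRUE)), .SDcols = cols]

# Keep rows where B equals that minimum, then drop the helper column
res <- dt[B == row_min, !"row_min"]
print(res)
```

Row "c" is kept even though A is missing, because the NA is ignored when taking the minimum. Calling the helper row_min instead of min also avoids shadowing base::min inside the table.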

Upvotes: 0

Darren Tsai

Reputation: 35584

You could use apply() to find the minimum for each row.

df |> subset(B == apply(df, 1, min, na.rm = TRUE))

#    A  B  C
# 2 19  9 15
# 3 17  5  8
# 5 13 12 16

The tidyverse equivalent is

library(tidyverse)

df %>% filter(B == pmap(across(A:C), min, na.rm = TRUE))
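Since the real data holds date-times with NAs, here is a minimal base R sketch of the same idea on POSIXct columns. The data, the ts() helper and the names df2, row_min and keep are made up for illustration and are not the asker's. It uses pmin(), which is vectorised across columns and keeps the date-time class, where apply() would first convert the data frame to a character matrix.

```r
# Hypothetical date-time data with missing values (not the asker's real data)
ts <- function(x) as.POSIXct(x, tz = "UTC")
df2 <- data.frame(
  A = ts(c("2023-01-05 10:00:00", NA, "2023-03-01 08:30:00")),
  B = ts(c("2023-01-02 09:00:00", "2023-02-10 12:00:00", "2023-03-04 00:00:00")),
  C = ts(c("2023-01-03 00:00:00", "2023-02-11 00:00:00", NA))
)

# Row-wise earliest time, ignoring NA; the result is still POSIXct
row_min <- do.call(pmin, c(df2, na.rm = TRUE))

# which() skips rows where the comparison is NA (i.e. B itself is missing)
keep <- which(df2$B == row_min)
res2 <- df2[keep, ]
print(res2)
```

Here rows 1 and 2 are kept (B is the earliest non-missing value) and row 3 is dropped because A is earlier than B. To compare only some of the columns, df2 could be subset by name before the do.call().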

Upvotes: 1
