Bayram Sarilmaz
Bayram Sarilmaz

Reputation: 113

Return an entire row if the value in any specific set of columns meets a certain criteria

I have a dataframe, and I want to retain the rows (fruits) for which any of the price columns is greater than a certain value.

Here is a reproducible example that you can copy&paste directly into R:

fruit = c("apple","orange","banana","berry") #1st col
ID = c(123,3453,4563,3235) #2nd col
price1 = c(3,5,10,20) #3rd col
price2 = c(5,7,9,2) #4th col
price3 = c(4,1,11,8) #5th col

df = as.data.frame(cbind(fruit,ID,price1,price2,price3)) #combine into a dataframe

price_threshold = 10 #define a price

I want to get only the fruits for which any of the prices is greater than 10, which are Banana and Berry in this case

The output I'm expecting is the following two rows:

banana 4563 10  9  11
berry  3235 20  2   8

I tried something like this:

output = df[which(df[,3:5] > price_threshold),]

but it didn't work.

this is close to this post, but here I want to look at any of the values in the last three columns, not just one column.

Any suggestion?

Upvotes: 2

Views: 342

Answers (4)

s_baldur
s_baldur

Reputation: 33488

Programmatic solution inspired by Sam:

price_cols <- grep("^price", names(df), value = TRUE)
price_cols
[1] "price1" "price2" "price3"

df[do.call(pmax, df[price_cols]) > price_threshold, ]
   fruit   ID price1 price2 price3
3 banana 4563     10      9     11
4  berry 3235     20      2      8

Upvotes: 0

Sowmya S. Manian
Sowmya S. Manian

Reputation: 3833

All your columns in data frame df are factors, thats why it doesnt work. No need to use cbind()

fruit = c("apple","orange","banana","berry") #1st col
ID = c(123,3453,4563,3235) #2nd col
price1 = c(3,5,10,20) #3rd col
price2 = c(5,7,9,2) #4th col
price3 = c(4,1,11,8) #5th col

df <- data.frame(fruit, ID, price1, price2, price3)
df
#    fruit   ID price1 price2 price3
# 1  apple  123      3      5      4
# 2 orange 3453      5      7      1
# 3 banana 4563     10      9     11
# 4  berry 3235     20      2      8

df[which(df[,3] > price_thres | df[,4] > price_thres | df[,5] > price_thres),]
#    fruit   ID price1 price2 price3
# 3 banana 4563     10      9     11
# 4  berry 3235     20      2      8

Upvotes: 0

Sam
Sam

Reputation: 654

One line and readable solution.

df[pmax(df$price1, df$price2, df$price3) > 10, ]

Upvotes: 2

Florian
Florian

Reputation: 25385

First, it is better to initialize your data.frame as

df = data.frame(fruit,ID,price1,price2,price3)

So the variables are not parsed to factors. You can then get your expected outcome with:

df[rowSums(df[,3:5] > price_threshold)>0,]

Result:

   fruit   ID price1 price2 price3
3 banana 4563     10      9     11
4  berry 3235     20      2      8

Hope this helps!

Upvotes: 0

Related Questions