Reputation: 113
I have a dataframe, and I want to retain the rows (fruits) for which any of the price columns is greater than a certain value.
Here is a reproducible example that you can copy&paste directly into R:
fruit = c("apple","orange","banana","berry") #1st col
ID = c(123,3453,4563,3235) #2nd col
price1 = c(3,5,10,20) #3rd col
price2 = c(5,7,9,2) #4th col
price3 = c(4,1,11,8) #5th col
df = as.data.frame(cbind(fruit,ID,price1,price2,price3)) #combine into a dataframe
price_threshold = 10 #define a price
I want to get only the fruits for which any of the prices is greater than 10, which are Banana and Berry in this case
The output I'm expecting is the following two rows:
banana 4563 10 9 11
berry 3235 20 2 8
I tried something like this:
output = df[which(df[,3:5] > price_threshold),]
but it didn't work.
this is close to this post, but here I want to look at any of the values in the last three columns, not just one column.
Any suggestion?
Upvotes: 2
Views: 342
Reputation: 33488
Programmatic solution inspired by Sam:
price_cols <- grep("^price", names(df), value = TRUE)
price_cols
[1] "price1" "price2" "price3"
df[do.call(pmax, df[price_cols]) > price_threshold, ]
fruit ID price1 price2 price3
3 banana 4563 10 9 11
4 berry 3235 20 2 8
Upvotes: 0
Reputation: 3833
All your columns in data frame df are factors, thats why it doesnt work. No need to use cbind()
fruit = c("apple","orange","banana","berry") #1st col
ID = c(123,3453,4563,3235) #2nd col
price1 = c(3,5,10,20) #3rd col
price2 = c(5,7,9,2) #4th col
price3 = c(4,1,11,8) #5th col
df <- data.frame(fruit, ID, price1, price2, price3)
df
# fruit ID price1 price2 price3
# 1 apple 123 3 5 4
# 2 orange 3453 5 7 1
# 3 banana 4563 10 9 11
# 4 berry 3235 20 2 8
df[which(df[,3] > price_thres | df[,4] > price_thres | df[,5] > price_thres),]
# fruit ID price1 price2 price3
# 3 banana 4563 10 9 11
# 4 berry 3235 20 2 8
Upvotes: 0
Reputation: 654
One line and readable solution.
df[pmax(df$price1, df$price2, df$price3) > 10, ]
Upvotes: 2
Reputation: 25385
First, it is better to initialize your data.frame as
df = data.frame(fruit,ID,price1,price2,price3)
So the variables are not parsed to factors. You can then get your expected outcome with:
df[rowSums(df[,3:5] > price_threshold)>0,]
Result:
fruit ID price1 price2 price3
3 banana 4563 10 9 11
4 berry 3235 20 2 8
Hope this helps!
Upvotes: 0