Reputation: 15
I'm learning R and have a new question. I have goods categories, and the goods in each category have a price. Is it possible to write code in R so that an observation is removed from the analysis if its value exceeds the average of its commodity category by more than 500 000? In other words, within every commodity category (the grouping variable), I need to remove the observations whose values are more than 500 000 above the group average.
data = read.table(textConnection("
cat price
1 100000
1 200000
1 300000
1 400000
1 1000000
2 100000
2 200000
2 50000
2 100000
2 1000000
2 2000000
"), header = TRUE)
Upvotes: 0
Views: 86
Reputation: 5335
Using dplyr:
library(dplyr)
data %>%
group_by(cat) %>%
filter(price - mean(price) <= 500000)
Result:
Source: local data frame [9 x 2]
Groups: cat [2]
cat price
<int> <int>
1 1 100000
2 1 200000
3 1 300000
4 1 400000
5 2 100000
6 2 200000
7 2 50000
8 2 100000
9 2 1000000
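To see why each row is kept or dropped, it can help to materialise the group mean next to the price before filtering. The following is only a sketch of that idea: the column names `group_mean`, `excess` and `keep`, and the objects `checked` and `kept`, are illustrative and not part of the answer above. The data frame is rebuilt inline so the block runs on its own.

```r
library(dplyr)

# Same data as in the question, rebuilt inline so the example is self-contained
data <- data.frame(
  cat   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  price = c(100000, 200000, 300000, 400000, 1000000,
            100000, 200000, 50000, 100000, 1000000, 2000000)
)

# Add the per-category mean and the distance above it (illustrative column names)
checked <- data %>%
  group_by(cat) %>%
  mutate(
    group_mean = mean(price),        # 400000 for cat 1, 575000 for cat 2
    excess     = price - group_mean, # how far the price sits above the mean
    keep       = excess <= 500000    # same condition as the filter() call
  ) %>%
  ungroup()

print(checked)

# Keep only the rows that pass, and drop the helper columns again
kept <- checked %>%
  filter(keep) %>%
  select(cat, price)
```

Note that the price of 1000000 in category 2 survives because that category's mean (575000) is pulled up by the 2000000 outlier, so the excess is only 425000.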
Upvotes: 1
Reputation: 887048
With data.table, we convert the 'data.frame' to a 'data.table' (setDT(data)), group by 'cat', and subset the rows of the Subset of Data.table (.SD) using the logical condition:
library(data.table)
setDT(data)[, .SD[(price - mean(price)) <= 500000], cat]
Or we can use the row index (.I):
setDT(data)[data[, .I[(price - mean(price)) <= 500000], cat]$V1]
# cat price
#1: 1 100000
#2: 1 200000
#3: 1 300000
#4: 1 400000
#5: 2 100000
#6: 2 200000
#7: 2 50000
#8: 2 100000
#9: 2 1000000
Upvotes: 1
Reputation: 2070
With base:
subset(data, !data$price > (ave(data$price, data$cat) + 500000))
Result:
cat price
1 1 100000
2 1 200000
3 1 300000
4 1 400000
6 2 100000
7 2 200000
8 2 50000
9 2 100000
10 2 1000000
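If the threshold or the column names are likely to change, the same `ave()` idea can be wrapped in a small function. This is a minimal sketch, not part of the answer above: `drop_above_group_mean` and its arguments are hypothetical names chosen for illustration, and missing values are not handled.

```r
# Hypothetical helper: drop rows whose value exceeds the group mean
# by more than `threshold`. Column names are passed as strings.
drop_above_group_mean <- function(df, value, group, threshold = 500000) {
  # ave() returns the group mean repeated for every row of that group
  group_mean <- ave(df[[value]], df[[group]], FUN = mean)
  df[df[[value]] - group_mean <= threshold, , drop = FALSE]
}

# Same data as in the question, rebuilt inline so the example is self-contained
data <- data.frame(
  cat   = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2),
  price = c(100000, 200000, 300000, 400000, 1000000,
            100000, 200000, 50000, 100000, 1000000, 2000000)
)

result <- drop_above_group_mean(data, "price", "cat")
print(result)
```

As with `subset()`, the original row names are preserved, so the gaps in the row names show which observations were removed.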
Upvotes: 4