Reputation: 271
I am using R for my project and am completely new to it. I have the following data:
place<-c("S1","S1","S1","S1","S2","S2","S2","S2")
product<-c("P1","P2","P3","P1","P2","P3","P1","P2")
location<-c("loc1","loc1","loc2","loc2","loc1","loc1","loc2","loc2")
profit<-c(55,80,70,90,30,40,15,20)
data<-data.frame(place,product,location,profit)
For each place, I want to find which product gives the maximum profit at each location. The output should add one more column of binary entries, where 1 marks the row whose profit is the maximum, like this:
solution<-c(0,1,1,0,0,1,0,0)
I hope my question is clear. Thanks in advance.
Upvotes: 0
Views: 144
Reputation: 81683
You can use ave:
transform(data, solution = ave(profit, place, location,
FUN = function(x) as.integer(x == max(x))))
place product location profit solution
1 S1 P1 loc1 55 0
2 S1 P2 loc1 80 1
3 S1 P3 loc2 70 0
4 S1 P1 loc2 90 1
5 S2 P2 loc1 30 0
6 S2 P3 loc1 40 1
7 S2 P1 loc2 15 0
8 S2 P2 loc2 20 1
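To make it clearer what ave is doing here, the following is a rough sketch of the same computation done by hand: split profit by place/location pair, flag the maximum in each piece, and put the results back in the original row order. The names groups, flags_by_group, manual and via_ave are made up for this illustration. Note that if two rows in a group tie for the maximum, both are flagged with 1.

```r
place    <- c("S1","S1","S1","S1","S2","S2","S2","S2")
location <- c("loc1","loc1","loc2","loc2","loc1","loc1","loc2","loc2")
profit   <- c(55,80,70,90,30,40,15,20)

# One group label per row: each place/location pair is its own group
groups <- interaction(place, location, drop = TRUE)

# Flag the maximum inside each group separately
flags_by_group <- lapply(split(profit, groups),
                         function(x) as.integer(x == max(x)))

# Put the per-group results back into the original row order
manual <- unsplit(flags_by_group, groups)

# The one-liner from the answer, for comparison
via_ave <- ave(profit, place, location,
               FUN = function(x) as.integer(x == max(x)))

all(manual == via_ave)
```

The last line should evaluate to TRUE, since both routes apply the same function to the same four groups.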
Upvotes: 2
Reputation: 7830
Is this really the vector you expect for this example? How can "solution" contain only three 1s if you have 2 different locations for each of 2 different places?
Here is my solution:
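As a quick check of that point, here is a small sketch that counts the place/location groups and compares the count with the number of 1s in the vector from the question. The names n_groups and per_group are just for illustration.

```r
place    <- c("S1","S1","S1","S1","S2","S2","S2","S2")
location <- c("loc1","loc1","loc2","loc2","loc1","loc1","loc2","loc2")

# The vector given in the question
solution <- c(0,1,1,0,0,1,0,0)

# Number of distinct place/location pairs
n_groups <- nrow(unique(data.frame(place, location)))

# Number of rows flagged with 1 inside each pair
per_group <- tapply(solution, list(place, location), sum)

n_groups       # number of groups
sum(solution)  # number of flagged rows
per_group      # flagged rows per group
```

With four groups but only three flagged rows, at least one group cannot have a flagged maximum; per_group shows which one.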
place<-c("S1","S1","S1","S1","S2","S2","S2","S2")
product<-c("P1","P2","P3","P1","P2","P3","P1","P2")
location<-c("loc1","loc1","loc2","loc2","loc1","loc1","loc2","loc2")
profit<-c(55,80,70,90,30,40,15,20)
data<-data.frame(place,product,location,profit)
# Returns a data frame with the maximum profit for each place at each location
df <- aggregate(data$profit, by = list(place = data$place, location = data$location), max)
# Rename the aggregated column so it matches "profit" in data
names(df)[3] <- "profit"
# All the rows returned are the ones you want to mark with 1 in "solution"
df$solution <- rep(1, nrow(df))
# Right outer join: keep all rows of data, including those that have no match in df.
# by.x and by.y are not specified, so this is a natural join on the shared column names; it produces NA in "solution" for rows without a match.
data <- merge(df, data, all.y = TRUE)
# The join produced NA in "solution" for the rows of data that have no match in df; replace them with 0
data$solution[is.na(data$solution)] <- 0
> data
place location profit solution product
1 S1 loc1 55 0 P1
2 S1 loc1 80 1 P2
3 S1 loc2 70 0 P3
4 S1 loc2 90 1 P1
5 S2 loc1 30 0 P2
6 S2 loc1 40 1 P3
7 S2 loc2 15 0 P1
8 S2 loc2 20 1 P2
> data$solution
[1] 0 1 0 1 0 1 0 1
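One thing to be aware of with the merge approach: merge sorts the result by the join columns, so the rows only line up with the input here because the example data happens to be sorted that way already. Below is a minimal sketch of one way to keep the original order, assuming a temporary helper column (row_id is a made-up name, as are best and merged). The rows are reversed first so the reordering is visible.

```r
data <- data.frame(
  place    = c("S1","S1","S1","S1","S2","S2","S2","S2"),
  product  = c("P1","P2","P3","P1","P2","P3","P1","P2"),
  location = c("loc1","loc1","loc2","loc2","loc1","loc1","loc2","loc2"),
  profit   = c(55,80,70,90,30,40,15,20)
)

# Reverse the rows so the input is no longer sorted by the join columns
data <- data[nrow(data):1, ]

# Hypothetical helper column that remembers the input order
data$row_id <- seq_len(nrow(data))

# Maximum profit per place/location pair, flagged with 1
best <- aggregate(profit ~ place + location, data = data, FUN = max)
best$solution <- 1

# Natural join on place, location and profit; unmatched rows get NA
merged <- merge(data, best, all.x = TRUE)
merged$solution[is.na(merged$solution)] <- 0

# Restore the input order and drop the helper column
merged <- merged[order(merged$row_id), ]
merged$row_id <- NULL

merged$solution
```

Since the input was reversed, the flags should come out reversed as well, with each 1 sitting on the same row as its group's maximum profit.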
Hope this helps.
Upvotes: 0