Reputation: 63
Firstly, I'm pretty new to R and I'm also new with asking questions on SO, so bear with me if I'm asking stupid questions or am not following SO conventions.
I'm trying to find the best subscription type for multiple users based on their former calling behaviour. Until now I've managed to approx. match 98'000 rows to calculate the effective costs with a (variable) number of subscription types.
There's also a data frame with the predicted costs per subscription type for every month:
Now I'm trying to find the best subscription type where the costs are lower than the current one for every user and month. I'll do a merge to show which results I'd expect:
So, in month 2019-01
the costs of subscription_2
are lower than the current subscription for User1
, therefore subscription2
should be the recommendation. For months 2019-02
and 2019-03
there is no recommendation as there are no subscription types with lower costs.
For User2
subscription type subscription_3
should be the recommendation in all months, since these costs are always lower than the current subscription.
I'm currently following the courses on DataCamp.com and I am pretty sure that this is almost certainly a VERY basic action in r
, but I need someone to guide me into the right direction.
This is what I have so far:
library(dplyr)
effective.costs <- data.frame(
user = c(rep("User1", 3), rep("User2", 3)),
month = c(rep(c("2019-01", "2019-02", "2019-03"), 2)),
current_subscription = c(rep("subscription_1", 3), rep("subscription_2", 3)),
costs = c(70, 20, 50, 150, 130, 170)
)
predicted.costs <- data.frame(
user = c(rep("User1", 9), rep("User2", 9)),
month = c(rep("2019-01",3), rep("2019-02", 3), rep("2019-03", 3)),
subscription = c(rep(c("subscription_1", "subscription_2", "subscription_3"), 6)),
calculated_costs = c(
c(70, 50, 110, 20, 50, 70, 50, 80, 120),
c(190, 150, 110, 210, 130, 110, 250, 170, 110)
)
)
comparison <- merge(effective.costs, predicted.costs, by = c("user", "month"))
getRecommendation <- function(x) {
subscription <- predicted.costs %>%
filter(
calculated_costs < x['costs'] &
user == x['user'] &
month == x['month']
) %>%
arrange(calculated_costs) %>%
select(subscription)
subscription <- ifelse(
length(subscription) > 0,
as.character(subscription[1, 1]),
NA
)
# I know return is not needed, but I'm used to it... :-)
return(subscription)
}
effective.costs$recommendation <- apply(effective.costs, 1, getRecommendation)
View(effective.costs)
The most important part here is probably the function getRecommendation
:
getRecommendation <- function(x) {
subscription <- predicted.costs %>%
filter(
calculated_costs < x['costs'] &
user == x['user'] &
month == x['month']
) %>%
arrange(calculated_costs) %>%
select(subscription)
subscription <- ifelse(
length(subscription) > 0,
as.character(subscription[1, 1]),
NA
)
# I know return is not needed, but I'm used to it... :-)
return(subscription)
}
which I'm trying to apply
to every row in effective.costs
:
effective.costs$recommendation <- apply(effective.costs, 1, getRecommendation)
While this gives me the correct output for User2
, I currently believe this to be coincident since there's no recommendation for User1
, even if there should be one for month 2019-01
:
Could someone please push me into the right direction?
Thank you!
Upvotes: 2
Views: 1054
Reputation: 11255
This does away with the apply
and the getRecommendation
function. R
is vectorized so we should try to think column-wise as much as we can.
comparison <- merge(effective.costs, predicted.costs, by = c("user", "month"))
comparison%>%
mutate(net_savings = calculated_costs-costs)%>%
group_by(user, month)%>%
filter(net_savings == min(net_savings))%>%
slice(1) #for ties
The issue with your apply()
function is that apply()
coerces the data.frame to a matrix. A matrix can only have one class type - in this case, you're comparing a number to a string in calculated_costs < x['costs']
.
Specifically, the evaluation is calculated_costs < ' 50'
where there is an extra space for 2-digit numbers. For whatever reason, 50 < ' 70'
evaluates FALSE
whereas 110 < '190'
evaluates TRUE
.
The solution is to approach the problem differently in this case. There's no need to do rowwise operations via apply
.
Upvotes: 1