Hi, I have a data frame that looks like this:
test = data.frame("Year" = c("2015","2015","2016","2017","2018"),
"UserID" = c(1,2,1,1,3), "PurchaseValue" = c(1,5,3,3,5))
where "Year" is the year of purchase and "UserID" identifies the buyer.
I want to create a variable "RepeatedPurchase" that is 1 if the purchase is a repeat purchase and 0 otherwise (i.e. if it is the buyer's only or first purchase).
Thus, the desired output would look like this:
I tried to do this by first creating a variable "Se" that records whether a purchase is the buyer's 1st, 2nd, 3rd, ... purchase, but my code didn't work. What is wrong with my code, or is there a better way to identify repeated purchases? Thanks!
library(dplyr)
# `df` is base R's F-density function, so the pipeline must start from the data frame `test`
test %>% arrange(UserID, Year) %>% group_by(UserID) %>%
  mutate(Se = seq(n())) %>% ungroup()  # Se = 1st, 2nd, 3rd, ... purchase of each buyer
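Once the pipeline starts from `test`, the sequence number `Se` already contains the answer: any purchase with `Se > 1` is a repeat. A minimal base R sketch of the same idea, assuming each buyer's purchases should be ordered by `Year` (the column names come from the question; `Se` is the question's helper variable):

```r
# Data from the question
test <- data.frame(Year = c("2015", "2015", "2016", "2017", "2018"),
                   UserID = c(1, 2, 1, 1, 3),
                   PurchaseValue = c(1, 5, 3, 3, 5))

# Sort chronologically within each buyer (same as arrange(UserID, Year))
test <- test[order(test$UserID, test$Year), ]

# Se = 1st, 2nd, 3rd, ... purchase of each buyer
test$Se <- ave(seq_len(nrow(test)), test$UserID, FUN = seq_along)

# A purchase is a repeat whenever it is not the buyer's first one
test$RepeatedPurchase <- as.integer(test$Se > 1)
test
```

In the question's dplyr pipeline, the same final step would be `mutate(RepeatedPurchase = as.integer(Se > 1))` placed after the `mutate(Se = ...)` call.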
Here is another dplyr solution. We can group_by the UserID and PurchaseValue, and then use as.integer(n() > 1) to check whether each group contains more than one purchase.
library(dplyr)
test2 <- test %>%
group_by(UserID, PurchaseValue) %>%
mutate(RepeatedPurchase = as.integer(n() > 1)) %>%
ungroup()
test2
# # A tibble: 5 x 4
# Year UserID PurchaseValue RepeatedPurchase
# <fct> <dbl> <dbl> <int>
# 1 2015 1 1 0
# 2 2015 2 5 0
# 3 2016 1 3 1
# 4 2017 1 3 1
# 5 2018 3 5 0
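Grouping by both `UserID` and `PurchaseValue` counts how often a buyer paid the same amount, which is a different rule from "any purchase after the buyer's first". The two coincide on `test` only because buyer 1's second and third purchases both have value 3. A base R sketch on a small hypothetical data set (`toy`, `ByValue` and `ByUser` are illustrative names, not from the thread) shows where they diverge:

```r
# Hypothetical data: one buyer, three purchases, first and last with the same value
toy <- data.frame(Year = c("2015", "2016", "2017"),
                  UserID = c(1, 1, 1),
                  PurchaseValue = c(2, 4, 2))

# Rows per (UserID, PurchaseValue), like group_by(UserID, PurchaseValue) + n()
same_value_count <- ave(toy$PurchaseValue, toy$UserID, toy$PurchaseValue, FUN = length)
toy$ByValue <- as.integer(same_value_count > 1)

# Every purchase after the buyer's first one (the question's definition)
toy <- toy[order(toy$UserID, toy$Year), ]
toy$ByUser <- as.integer(duplicated(toy$UserID))
toy
```

Here `ByValue` flags the 2015 and 2017 purchases, while `ByUser` flags the 2016 and 2017 purchases.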
We can number each buyer's purchases within each UserID group (1, 2, 3, ...) and assign 1 whenever that running count exceeds 1:
library(dplyr)
test %>% group_by(UserID) %>% mutate(RepeatedPurchase = ifelse(1:n() > 1, 1, 0))
# A tibble: 5 x 4
# Groups: UserID [3]
Year UserID PurchaseValue RepeatedPurchase
<fct> <dbl> <dbl> <dbl>
1 2015 1.00 1.00 0
2 2015 2.00 5.00 0
3 2016 1.00 3.00 1.00
4 2017 1.00 3.00 1.00
5 2018 3.00 5.00 0
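`1:n() > 1` marks every row after the first row of each `UserID` group, so it matches the intended definition only when each buyer's rows are already in chronological order, as they happen to be in `test`. A base R sketch on hypothetical unsorted data (`unsorted`, `sorted` and `pos` are illustrative names) shows the difference; in the dplyr pipeline, the equivalent fix is to `arrange(UserID, Year)` before `group_by()`.

```r
# Hypothetical data where buyer 1's rows are not in chronological order
unsorted <- data.frame(Year = c("2017", "2015", "2016"),
                       UserID = c(1, 1, 1),
                       PurchaseValue = c(3, 1, 3))

# Position of each row within its buyer's rows, in the order given
# (the same numbers 1:n() produces inside group_by(UserID))
pos <- ave(seq_len(nrow(unsorted)), unsorted$UserID, FUN = seq_along)
as.integer(pos > 1)   # treats 2017 as the first purchase and 2015 as a repeat

# Sorting by year within each buyer first gives the intended flags
sorted <- unsorted[order(unsorted$UserID, unsorted$Year), ]
pos_sorted <- ave(seq_len(nrow(sorted)), sorted$UserID, FUN = seq_along)
sorted$RepeatedPurchase <- as.integer(pos_sorted > 1)
sorted
```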
You do not need dplyr; you can use duplicated() as follows:
test=data.frame("Year" = c("2015","2015","2016","2017","2018"), "UserID" = c(1,2,1,1,3), "PurchaseValue" = c(1,5,3,3,5))
repeated <- duplicated(test$UserID)  # TRUE for every occurrence of a UserID after its first
repeated
# [1] FALSE FALSE TRUE TRUE FALSE
test$RepeatedPurchase <- ifelse(repeated, 1, 0)
test
# Year UserID PurchaseValue RepeatedPurchase
# 1 2015 1 1 0
# 2 2015 2 5 0
# 3 2016 1 3 1
# 4 2017 1 3 1
# 5 2018 3 5 0
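`duplicated()` marks every occurrence of a `UserID` after the first one in row order, so it also assumes the rows are already sorted by `Year` within each buyer. A small base R sketch of a hypothetical helper, `flag_repeat()` (not part of base R or the thread), that sorts internally and returns the flags in the original row order:

```r
# Hypothetical helper: 1 if the buyer has an earlier purchase, else 0.
# Assumes `year` sorts chronologically (e.g. four-digit years stored as text).
flag_repeat <- function(user_id, year) {
  ord <- order(user_id, year)                         # chronological order within each buyer
  flag <- integer(length(user_id))
  flag[ord] <- as.integer(duplicated(user_id[ord]))   # map flags back to the original rows
  flag
}

test <- data.frame(Year = c("2015", "2015", "2016", "2017", "2018"),
                   UserID = c(1, 2, 1, 1, 3),
                   PurchaseValue = c(1, 5, 3, 3, 5))
test$RepeatedPurchase <- flag_repeat(test$UserID, test$Year)
test$RepeatedPurchase
```

Because `order()` keeps ties in their original order, if a buyer has two purchases in the same year, the one that appears later in the data is the one flagged.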
Cheers!