Data Science Beginner

Reputation: 39

Identify repeats in R

Hi, I have a data frame that looks like this:


test <- data.frame("Year" = c("2015", "2015", "2016", "2017", "2018"),
                   "UserID" = c(1, 2, 1, 1, 3),
                   "PurchaseValue" = c(1, 5, 3, 3, 5))

where "Year" is the year of the purchase and "UserID" identifies the buyer.

I want to create a variable "RepeatedPurchase" that is 1 if the purchase is a repeat purchase by that buyer and 0 otherwise (i.e., if it is the buyer's only purchase or their first purchase).

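Thus, the desired output would look like this:

  Year UserID PurchaseValue RepeatedPurchase
1 2015      1             1                0
2 2015      2             5                0
3 2016      1             3                1
4 2017      1             3                1
5 2018      3             5                0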

I tried to do this by first creating a variable "Se" that numbers each buyer's purchases (1st, 2nd, 3rd, ...), but my code didn't work. What is wrong with my code, and is there a better way to identify repeated purchases? Thanks!

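library(dplyr)
# The example data above is named `test`; `df` is base R's F-density function, so piping `df` fails
test %>% arrange(UserID, Year) %>% group_by(UserID) %>%
  mutate(Se = seq(n())) %>% ungroup()  # Se = 1st, 2nd, 3rd, ... purchase of each buyer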
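The purchase-number idea can also be sketched in base R, without dplyr. This is a minimal sketch that assumes the example `test` data frame from the question: after sorting by buyer and year, `ave()` with `FUN = seq_along` numbers the rows within each UserID, and any purchase numbered above 1 is treated as a repeat.

```r
test <- data.frame("Year" = c("2015", "2015", "2016", "2017", "2018"),
                   "UserID" = c(1, 2, 1, 1, 3),
                   "PurchaseValue" = c(1, 5, 3, 3, 5))

# Sort chronologically within each buyer so the numbering follows time
test <- test[order(test$UserID, test$Year), ]

# Se: 1 for a buyer's first purchase, 2 for the second, and so on
test$Se <- ave(seq_along(test$UserID), test$UserID, FUN = seq_along)

# A purchase is a repeat if it is not the buyer's first one
test$RepeatedPurchase <- as.integer(test$Se > 1)

test
```

Note that `order()` changes the row order (the row names become 1, 3, 4, 2, 5), so sort back by row name afterwards if the original order matters.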

Upvotes: 2

Views: 101

Answers (3)

www

Reputation: 39154

Here is another dplyr solution. We can group_by UserID and PurchaseValue, and then use as.integer(n() > 1) to flag every row whose group contains more than one purchase.

library(dplyr)

test2 <- test %>%
  group_by(UserID, PurchaseValue) %>%
  mutate(RepeatedPurchase = as.integer(n() > 1)) %>%
  ungroup()

test2
# # A tibble: 5 x 4
#   Year  UserID PurchaseValue RepeatedPurchase
#   <fct>  <dbl>         <dbl>            <int>
# 1 2015       1             1                0
# 2 2015       2             5                0
# 3 2016       1             3                1
# 4 2017       1             3                1
# 5 2018       3             5                0
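Because this rule counts rows per (UserID, PurchaseValue) pair, it flags every purchase in a pair that occurs more than once (including the first one) and does not flag a repeat purchase made at a different value. On the question's data this happens to match the desired output. The base-R sketch below uses a hypothetical variant of the data, where user 1's 2017 purchase is changed to a value of 4 purely for illustration, to show where the pair-count rule and a "has this buyer purchased before?" rule differ.

```r
# Hypothetical variant of `test`: user 1's 2017 purchase has value 4 instead of 3
test_alt <- data.frame(Year = c("2015", "2015", "2016", "2017", "2018"),
                       UserID = c(1, 2, 1, 1, 3),
                       PurchaseValue = c(1, 5, 3, 4, 5))

# Pair-count rule: 1 if the (UserID, PurchaseValue) pair occurs more than once
pair_count <- ave(test_alt$UserID, test_alt$UserID, test_alt$PurchaseValue,
                  FUN = length)
by_pair <- as.integer(pair_count > 1)

# Buyer rule: 1 if this UserID appeared in an earlier row (rows are in year order here)
by_user <- as.integer(duplicated(test_alt$UserID))

data.frame(test_alt, by_pair, by_user)
```

Here every (UserID, PurchaseValue) pair is unique, so `by_pair` is all 0, while `by_user` still marks user 1's 2016 and 2017 purchases as repeats.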

Upvotes: 2

A. Suliman

Reputation: 13135

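We can number each UserID's purchases within its group and assign 1 to every purchase after the first. This relies on the rows already being in chronological order within each user, as they are here.

# 1:n() is each row's position within its UserID group; positions after the first are repeats
test %>% group_by(UserID) %>% mutate(RepeatedPurchase = ifelse(1:n() > 1, 1, 0))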

# # A tibble: 5 x 4
# # Groups:   UserID [3]
#   Year  UserID PurchaseValue RepeatedPurchase
#   <fct>  <dbl>         <dbl>            <dbl>
# 1 2015    1.00          1.00             0
# 2 2015    2.00          5.00             0
# 3 2016    1.00          3.00             1.00
# 4 2017    1.00          3.00             1.00
# 5 2018    3.00          5.00             0

Upvotes: 2

Carles

Reputation: 2829

You do not need dplyr. You can use duplicated() as follows:

test <- data.frame("Year" = c("2015","2015","2016","2017","2018"), "UserID" = c(1,2,1,1,3), "PurchaseValue" = c(1,5,3,3,5))

# TRUE for every occurrence of a UserID after its first (rows are in year order here)
repeated <- duplicated(test$UserID)
# [1] FALSE FALSE  TRUE  TRUE FALSE
test$RepeatedPurchase <- as.integer(repeated)
test
#   Year UserID PurchaseValue RepeatedPurchase
# 1 2015      1             1                0
# 2 2015      2             5                0
# 3 2016      1             3                1
# 4 2017      1             3                1
# 5 2018      3             5                0
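duplicated() marks the second and later occurrences in the order the rows appear, so this approach assumes the rows are already sorted by Year, as they are in the example. If that is not guaranteed, a minimal sketch is to sort with order() before calling duplicated(). The shuffled row order below is a made-up example, not data from the question.

```r
test <- data.frame(Year = c("2015", "2015", "2016", "2017", "2018"),
                   UserID = c(1, 2, 1, 1, 3),
                   PurchaseValue = c(1, 5, 3, 3, 5))

# Made-up example: the same purchases, stored out of chronological order
shuffled <- test[c(4, 2, 5, 3, 1), ]
# Without sorting, duplicated(shuffled$UserID) would flag user 1's 2015 purchase
# instead of the 2017 one

# Sort by year so duplicated() sees each buyer's earliest purchase first
sorted <- shuffled[order(shuffled$Year), ]
sorted$RepeatedPurchase <- as.integer(duplicated(sorted$UserID))

# Restore the original row order for display
sorted[order(as.integer(rownames(sorted))), ]
```

order() keeps tied years in their existing order, which is fine here because two purchases by the same buyer in the same year would be indistinguishable in this data anyway.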

Cheers!

Upvotes: 2
