John legend2
John legend2

Reputation: 920

Indicate whether each value exists before

While doing my data work I have this problem.
I have customer id, receipt_id and product_id. The product_id indicates the products that the given customer purchased at the given receipt.
The data is sorted by customer id and receipt_id. The lower value of receipt_id means the earlier shopping trip.

For each product, I want to create dummy variable that indicate whether each product is purchased in past shopping trip (in previous receipt id).
I have first three columns and want to create 4th column, "purchased_before".

I can do it by using for loop but is there any efficient way?

Data is as below,

customer id      receipt_id   product_id     purchased_before
    1             1               113                 0
    1             1               114                 0
    1             2               113                 1
    1             2               116                 0
    1             2               346                 0
    1             3               421                 0
    1             3               114                 1
    1             3               421                 0
    ....
    2             1               213                 0
    2             1               114                 0
    2             2               113                 0
    2             2               116                 0
    2             2               346                 0
    2             3               113                 1
    2             3               114                 1
    2             3               421                 0
    ....

Upvotes: 0

Views: 53

Answers (1)

Pierre Lapointe
Pierre Lapointe

Reputation: 16277

Here's how to do that with dplyr. Note that this solution picks up that product_id 421 has been purchased before by customer id 1. Your question does not show that.

df <-read.table(text="customer_id  receipt_id  product_id  purchased_before
1             1               113                 0
1             1               114                 0
1             2               113                 1
1             2               116                 0
1             2               346                 0
1             3               421                 0
1             3               114                 1
1             3               421                 0
2             1               213                 0
2             1               114                 0
2             2               113                 0
2             2               116                 0
2             2               346                 0
2             3               113                 1
2             3               114                 1
2             3               421                 0",header=TRUE, stringsAsFactors=FALSE)

library(dplyr)
df %>%
group_by(customer_id,product_id) %>%
mutate(purchased_before2=duplicated(product_id)*1)

   customer_id receipt_id product_id purchased_before purchased_before2
         <int>      <int>      <int>            <int>             <dbl>
1            1          1        113                0                 0
2            1          1        114                0                 0
3            1          2        113                1                 1
4            1          2        116                0                 0
5            1          2        346                0                 0
6            1          3        421                0                 0
7            1          3        114                1                 1
8            1          3        421                0                 1
9            2          1        213                0                 0
10           2          1        114                0                 0
11           2          2        113                0                 0
12           2          2        116                0                 0
13           2          2        346                0                 0
14           2          3        113                1                 1
15           2          3        114                1                 1
16           2          3        421                0                 0

Upvotes: 1

Related Questions