Christian

Reputation: 359

Data Manipulation with R

I have a problem for which I could not find a good solution in time. I would really appreciate any assistance, as I think it is only a few lines of code for some of the professionals here.

My data contains over 2 million rows of transactions. I want to run some sort of association rules analysis on the data.

I'm only interested in transactions (T_ID) that involve the product (P_ID) "PANDORA" and for which I know the customer (c_ID). I made an example:

> T_ID <- c(10,10,10,11,12,13,13)
> P_ID <- c("PANDORA", "Others", "Pan","PANDORA","Ham", "PANDORA","Ham")
> c_ID <- c(1,1,1,2,-1,4,4)
> basket <- data.frame(T_ID,P_ID,c_ID)
> basket
  T_ID    P_ID c_ID
1   10 PANDORA    1
2   10  Others    1
3   10     Pan    1
4   11 PANDORA    2
5   12     Ham   -1
6   13 PANDORA    4
7   13     Ham    4

Transaction 10 contains the product "PANDORA", so all 3 of its rows should remain in the dataset. Transaction 12 has no customer attached (c_ID is -1), so it needs to be removed.

I'm struggling most with how to keep the rows that belong to a transaction ID containing "PANDORA" but hold a different product.

Any help greatly appreciated,

Best regards, Christian

Upvotes: 0

Views: 96

Answers (2)

RHertel

Reputation: 23818

Maybe this helps:

keep_IDs <- basket$T_ID[with(basket, P_ID=="PANDORA" & c_ID!=-1)]
basket[basket$T_ID %in% keep_IDs,]
#  T_ID    P_ID c_ID
#1   10 PANDORA    1
#2   10  Others    1
#3   10     Pan    1
#4   11 PANDORA    2
#6   13 PANDORA    4
#7   13     Ham    4

data

basket <- structure(list(T_ID = c(10L, 10L, 10L, 11L, 12L, 13L, 13L, 14L, 14L), 
P_ID = structure(c(6L, 4L, 5L, 6L, 1L, 6L, 1L, 3L, 2L), 
.Label = c("Ham","Honey", "Nugget", "Others", "Pan", "PANDORA"), class = "factor"), 
c_ID = c(1L, 1L, 1L, 2L, -1L, 4L, 4L, 5L, 5L)), 
.Names = c("T_ID", "P_ID", "c_ID"), class = "data.frame", 
row.names = c("1", "2", "3", "4", "5", "6", "7", "8", "9"))
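To see why the two lines above keep the non-PANDORA rows as well, it can help to look at the intermediate values. The following is only a sketch on the seven-row example from the question; the name `pandora_with_customer` is introduced here for illustration and is not part of the answer.

```r
# Sample data from the question (seven rows).
basket <- data.frame(
  T_ID = c(10, 10, 10, 11, 12, 13, 13),
  P_ID = c("PANDORA", "Others", "Pan", "PANDORA", "Ham", "PANDORA", "Ham"),
  c_ID = c(1, 1, 1, 2, -1, 4, 4),
  stringsAsFactors = FALSE
)

# Logical vector: TRUE only on rows that are PANDORA and have a known customer
# (-1 is assumed to mean "customer unknown", as in the question).
pandora_with_customer <- with(basket, P_ID == "PANDORA" & c_ID != -1)

# Transaction IDs taken from exactly those rows.
keep_IDs <- basket$T_ID[pandora_with_customer]

# %in% is TRUE for every row whose transaction ID appears in keep_IDs,
# so the other products of the same transaction are kept too.
result <- basket[basket$T_ID %in% keep_IDs, ]
print(result)
```

The filter is applied per transaction ID rather than per row, which is what brings "Others", "Pan" and "Ham" along with their PANDORA transaction.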

Upvotes: 2

CPhil

Reputation: 927

Does each transaction have only one customer id? I'm assuming so.

The first step is to remove the rows that do not have a customer ID.

cleanbasket = basket[basket$c_ID != -1,]

Next, we want to identify which transactions include PANDORA.

transactions = unique(basket$T_ID[basket$P_ID == "PANDORA"])

Then get all the rows for these transactions:

cleanbasket = cleanbasket[cleanbasket$T_ID %in% transactions,]
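Put together on the question's sample data, the three steps might look like the sketch below. It assumes that -1 marks an unknown customer, as in the question, and that each transaction has a single customer ID.

```r
# Sample data from the question (seven rows).
basket <- data.frame(
  T_ID = c(10, 10, 10, 11, 12, 13, 13),
  P_ID = c("PANDORA", "Others", "Pan", "PANDORA", "Ham", "PANDORA", "Ham"),
  c_ID = c(1, 1, 1, 2, -1, 4, 4),
  stringsAsFactors = FALSE
)

# Step 1: drop rows without a known customer (-1 is assumed to mean "unknown").
cleanbasket <- basket[basket$c_ID != -1, ]

# Step 2: IDs of all transactions that contain PANDORA.
transactions <- unique(basket$T_ID[basket$P_ID == "PANDORA"])

# Step 3: keep every remaining row that belongs to one of those transactions.
cleanbasket <- cleanbasket[cleanbasket$T_ID %in% transactions, ]
print(cleanbasket)
```

Because step 1 filters individual rows, a transaction with both known and unknown customer rows would keep only its known rows; with one customer ID per transaction this does not arise.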

Upvotes: 0
