Ronnie Day
Ronnie Day

Reputation: 131

How to apply a function on a selected elements in R according to some conditions?

I have a data frame A which contains user_id and product_id. I have another data frame B which contains product_id and tags_id. Each product is associated with some tags. I have made a matrix C containing user ids of all users as row name and tag ids as column name with every cell filled with zeroes initially. What I want to do is traverse through each row of A and fill up C accordingly.

For example - Suppose I encounter user_id = 001 and product_id = 101 in the first row of A. Now I search all the tags associated with product 101 in B. Let them be 201 and 202. As I encounter the tags in B, I want to increment 1 in columns tag_id_201 and tag_id_202 of row user_id_001 of C. I want to do this for every row that I encounter in A. One should also note that in A, a user_id can be repeated multiple times associated with different product_id.

I could apply nested for loops to achieve this.

for(i in 1:nrow(A)) #Looping through every entry in A
{for(j in 1:nrow(C))#Looping through every row in C to find the correct user_id
{for(k in 1:nrow(B))#Looping through every row in B to find the appropriate tags for for a particular product
{for(l in 1:ncol(C))#Looping through every column in C to find the relevant tag_id according to the tag_id in the upper loop
{if((A$user_id[i] == rownames(C)[j])&(A$product_id[i] == B$product_id[k]) & (B$tag_id[k] == colnames(C)[l]))
C[j, l] <- C[j, l] + 1
}}}}

But the thing with nested loops is that they take a lot of time in R. Is there a time efficient way of doing this? Also, I am getting an error in the if statement ("argument is of length zero").

Edit : I have uploaded an example.

Upvotes: 0

Views: 51

Answers (2)

Qaswed
Qaswed

Reputation: 3879

Like @JMenezes, I suggest using merge and table. I wrote a small example:

A <- data.frame(user_id = c(1, 1, 1, 2, 2, 2), 
            product_id = c(201, 202, 203, 201, 202, 210))
B <- data.frame(product_id = c(201, 201, 201, 202, 202, 202, 203, 203, 203, 210),  
            tags_id = c(301, 302, 303, 301, 302, 309, 301, 309, 303, 303))

C <- matrix(NA, nrow = length(unique(A$user_id)), ncol = length(unique(B$tags_id)))

rownames(C) <- unique(A$user_id)
colnames(C) <- unique(B$tags_id)
C <- as.data.frame(C)

D <- merge(A, B)
for(i in unique(A$user_id)){

  tab <- table(subset(D, user_id == i)$tags_id)  
  C[as.character(i), names(tab)] <- tab

}

Edit: I changed the answer to fit Ronnie Days example (https://i.sstatic.net/mevBC.jpg). The code to set up A and B before was:

set.seed(1)

A <- data.frame(user_id = sample(1:5, replace = TRUE, size = 10), 
              product_id = sample(101:105, replace = TRUE, size = 10))
B <- data.frame(product_id = sample(101:105, replace = TRUE, size = 10), 
              tags_id = sample(101:105, replace = TRUE, size = 10))


#take values from question
A[1, "user_id"] <- 1 
A[1, "product_id"] <- 101 

#generate a product with product_id = 101
B[1, "product_id"] <- 101 
B[2, "product_id"] <- 101 
#give them the tags_id 101 and 102
B[1, "tags_id"] <- 101
B[2, "tags_id"] <- 102

Upvotes: 0

JMenezes
JMenezes

Reputation: 1059

I think you are trying to do what is called a join in database language. In R, you can achieve that by using merge()

join<-merge(A,B,by="product_id")

This will return to you a data.frame with two columns. The first column in the user_id, and the second is tag_id. (I think product_id is also present as a third column, but it does not matter to the problem at hand).

Now, with this data.frame you can use table() to count the number times each pair (user_id,tag_id) appears this new data.frame.

output<-table(join$user_id,join$tag_id)

That should give you a matrix that is very like the matrix C you described.

Upvotes: 1

Related Questions