Reputation: 131
I have a data frame A which contains user_id
and product_id
. I have another data frame B
which contains product_id
and tags_id
. Each product is associated with some tags. I have made a matrix C
containing user ids of all users as row name and tag ids as column name with every cell filled with zeroes initially. What I want to do is traverse through each row of A
and fill up C
accordingly.
For example - Suppose I encounter user_id = 001
and product_id = 101
in the first row of A
. Now I search all the tags associated with product 101 in B
. Let them be 201 and 202. As I encounter the tags in B
, I want to increment 1
in columns tag_id_201
and tag_id_202
of row user_id_001
of C
. I want to do this for every row that I encounter in A
. One should also note that in A
, a user_id
can be repeated multiple times associated with different product_id
.
I could apply nested for loops to achieve this.
for(i in 1:nrow(A)) #Looping through every entry in A
{for(j in 1:nrow(C))#Looping through every row in C to find the correct user_id
{for(k in 1:nrow(B))#Looping through every row in B to find the appropriate tags for for a particular product
{for(l in 1:ncol(C))#Looping through every column in C to find the relevant tag_id according to the tag_id in the upper loop
{if((A$user_id[i] == rownames(C)[j])&(A$product_id[i] == B$product_id[k]) & (B$tag_id[k] == colnames(C)[l]))
C[j, l] <- C[j, l] + 1
}}}}
But the thing with nested loops is that they take a lot of time in R. Is there a time efficient way of doing this? Also, I am getting an error in the if statement ("argument is of length zero
").
Edit : I have uploaded an example.
Upvotes: 0
Views: 51
Reputation: 3879
Like @JMenezes, I suggest using merge
and table
. I wrote a small example:
A <- data.frame(user_id = c(1, 1, 1, 2, 2, 2),
product_id = c(201, 202, 203, 201, 202, 210))
B <- data.frame(product_id = c(201, 201, 201, 202, 202, 202, 203, 203, 203, 210),
tags_id = c(301, 302, 303, 301, 302, 309, 301, 309, 303, 303))
C <- matrix(NA, nrow = length(unique(A$user_id)), ncol = length(unique(B$tags_id)))
rownames(C) <- unique(A$user_id)
colnames(C) <- unique(B$tags_id)
C <- as.data.frame(C)
D <- merge(A, B)
for(i in unique(A$user_id)){
tab <- table(subset(D, user_id == i)$tags_id)
C[as.character(i), names(tab)] <- tab
}
Edit: I changed the answer to fit Ronnie Days example (https://i.sstatic.net/mevBC.jpg). The code to set up A
and B
before was:
set.seed(1)
A <- data.frame(user_id = sample(1:5, replace = TRUE, size = 10),
product_id = sample(101:105, replace = TRUE, size = 10))
B <- data.frame(product_id = sample(101:105, replace = TRUE, size = 10),
tags_id = sample(101:105, replace = TRUE, size = 10))
#take values from question
A[1, "user_id"] <- 1
A[1, "product_id"] <- 101
#generate a product with product_id = 101
B[1, "product_id"] <- 101
B[2, "product_id"] <- 101
#give them the tags_id 101 and 102
B[1, "tags_id"] <- 101
B[2, "tags_id"] <- 102
Upvotes: 0
Reputation: 1059
I think you are trying to do what is called a join in database language. In R, you can achieve that by using merge()
join<-merge(A,B,by="product_id")
This will return to you a data.frame with two columns. The first column in the user_id, and the second is tag_id. (I think product_id is also present as a third column, but it does not matter to the problem at hand).
Now, with this data.frame you can use table() to count the number times each pair (user_id,tag_id) appears this new data.frame.
output<-table(join$user_id,join$tag_id)
That should give you a matrix that is very like the matrix C you described.
Upvotes: 1