Reputation: 505
This is my data frame a:
ui 194635691 194153563 177382028 177382031 195129144 196972549 196258704 194907960 196950156 194139014 153444738
1 56320e0e55e89c3e14e26d3d 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.000 0 0
2 563734c3b65dd40e340eaa56 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0 0
3 563e12657d4c410c5832579c 0.00 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.000 0 0
4 565181854c24b410e4891e11 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.000 0 0
5 5651b53fec231f1df8482d23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.027 0 0
6 56548df4b84c321fe4cdfb8f 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0 0
7 56549946735e782a885957e6 0.00 0.00 0.00 0.00 0.08 0.00 0.00 0.00 0.000 0 0
8 56549f9bb84c321fe4ce7a37 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0 0
9 5654a35a735e782a8859a053 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.000 0 0
What I want to do is calculate the cosine similarity between useridvector and each row of data frame a, excluding the first column (ui). I have tried the following code so far:
user_id = actions_slippers$ui[i]  # user_id comes from another data frame called actions_slippers
useridvector = a[a$ui %in% user_id, ]
# this measures the cosine similarity between the first row of a and every row of a
p = as.vector(cosine(t(a[, 2:ncol(a)]))[, 1])
But I want the cosine similarity between useridvector and each row of data frame a, again without the first column. useridvector looks like this:
ui 194635691 194153563 177382028 177382031 195129144 196972549 196258704 194907960 196950156 194139014 153444738
5651b53fec231f1df8482d23 0 0 0 0 0 0 0 0 0.027 0 0
Can anyone tell me how to do this?
Upvotes: 1
Views: 3632
Reputation: 1365
cosine{lsa} works. I'd like to share my attempt. Suppose you have saved the data in a data frame like this:
> data
ui X194635691 X194153563 X177382028 X177382031 X195129144 X196972549 X196258704 X194907960 X196950156 X194139014 X153444738
1 56320e0e55e89c3e14e26d3d 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.000 0 0
2 563734c3b65dd40e340eaa56 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0 0
3 563e12657d4c410c5832579c 0.00 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.000 0 0
4 565181854c24b410e4891e11 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.000 0 0
5 5651b53fec231f1df8482d23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.027 0 0
6 56548df4b84c321fe4cdfb8f 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0 0
7 56549946735e782a885957e6 0.00 0.00 0.00 0.00 0.08 0.00 0.00 0.00 0.000 0 0
8 56549f9bb84c321fe4ce7a37 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0 0
9 5654a35a735e782a8859a053 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.000 0 0
Use data[, -1] or subset.data.frame(data, select = names(data)[-1]) to drop the first column, then convert the result to a matrix and apply cosine{lsa}:
> res <- lsa::cosine(t(as.matrix(data[, -1])))
> res
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 0 0 0 0 0 0 0 0
[2,] 0 1 0 0 0 0 0 0 0
[3,] 0 0 1 0 0 0 0 0 0
[4,] 0 0 0 1 0 0 0 0 0
[5,] 0 0 0 0 1 0 0 0 0
[6,] 0 0 0 0 0 1 0 1 0
[7,] 0 0 0 0 0 0 1 0 0
[8,] 0 0 0 0 0 1 0 1 0
[9,] 0 0 0 0 0 0 0 0 1
PS: install the lsa package and see ?cosine for details.
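If only one user's similarities are needed, the full matrix is not strictly required. Below is a minimal base R sketch of that idea, not something taken from lsa: the toy data, the helper name cosine_to_rows and the IDs u1 to u4 are all made up for illustration. It assumes no row is all zeros, since such a row would give NaN.

```r
# Toy data standing in for `data` above (values are made up for illustration).
data <- data.frame(
  ui = c("u1", "u2", "u3", "u4"),
  X1 = c(0.01, 0.00, 0.02, 0.00),
  X2 = c(0.00, 0.01, 0.00, 0.00),
  X3 = c(0.00, 0.00, 0.00, 0.03),
  stringsAsFactors = FALSE
)

# Hypothetical helper: cosine similarity between vector v and every row of matrix m.
cosine_to_rows <- function(m, v) {
  as.vector(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
}

m <- as.matrix(data[, -1])   # drop the ui column
rownames(m) <- data$ui
v <- m["u1", ]               # the query user's vector
sims <- cosine_to_rows(m, v)
names(sims) <- data$ui       # label each similarity with its user ID
sims
```

Here u1 and u3 point in the same direction, so their similarity comes out as 1, while u2 and u4 share no non-zero column with u1 and come out as 0.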
============================ update =====
The resulting matrix looks like this:
        user1  user2  user3  user4
user1     1      0
user2            1
user3    ...            1
user4
where element (i, j) is the similarity between user i and user j.
If your userid contains two users, say user 2 and user 4, then you want the similarity between these two users and all the other users, which is a submatrix of the full similarity matrix.
Use res[, c(2, 4)] to obtain that submatrix.
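Selecting by position works, but it may be easier to select by user ID. Here is a rough sketch under some assumptions: the toy data and the IDs u1 to u4 are invented, and res is rebuilt in base R so the snippet runs without lsa installed (with lsa, lsa::cosine(t(m)) would play the same role).

```r
# Toy data standing in for `data` above (values are made up for illustration).
data <- data.frame(
  ui = c("u1", "u2", "u3", "u4"),
  X1 = c(0.01, 0.00, 0.02, 0.00),
  X2 = c(0.00, 0.01, 0.00, 0.00),
  X3 = c(0.00, 0.00, 0.00, 0.03),
  stringsAsFactors = FALSE
)
m <- as.matrix(data[, -1])

# Stand-in for res: scale each row to unit length, then take all pairwise dot products.
unit <- m / sqrt(rowSums(m^2))
res <- tcrossprod(unit)

# Label rows and columns with the user IDs so lookups can go by name.
dimnames(res) <- list(data$ui, data$ui)

user_id <- c("u2", "u4")             # hypothetical IDs of interest
sub <- res[, user_id, drop = FALSE]  # one column per requested user
sub
```

drop = FALSE keeps the result a matrix even when user_id holds a single ID, so later code does not need to treat that case separately.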
Upvotes: 5