Reputation: 379
I am new to R. I have a data frame (usr.query) with the structure shown below.
[image: structure of the usr.query data frame]
Now I want to take the text of each id and compare it to the text of every other id, and if there is a match, I want to append the result to a new column, say count of match.
A0008 with A0043,A0065,A0082,B0018,B0026
A0043 with A0008,A0065,A0082,B0018,B0026
Function to apply
count_match = length(intersect(unlist(strsplit(query1," ")),unlist(strsplit(query2," "))))
Here query1 is the text of A0008 and query2 is the text of each of A0043, A0065, A0082, B0018, B0026 in turn.
I tried the suggested solution and here is the result.
Upvotes: 0
Views: 55
Reputation: 43334
No loops are necessary; you'll usually find that's the case in R, because it's really good at utilizing vectorized operations. In this case, you can get the necessary combinations with combn, and then make the match_count column by subsetting the original data.frame with the combinations of the new one and testing for equality. Adding zero changes the values from Boolean to numeric (use as.integer, if you prefer).
# assemble sample data
df <- data.frame(id = 1:5, text = c('apple', 'mango', 'apple', 'apple', 'mango'))
# make combinations
df2 <- as.data.frame(t(combn(df$id, 2)))
# add names
names(df2) <- c('main_id', 'compared_to_id')
# test for match
df2$match_count <- (df[df2$main_id, 'text'] == df[df2$compared_to_id, 'text']) + 0
The result:
> df2
   main_id compared_to_id match_count
1        1              2           0
2        1              3           1
3        1              4           1
4        1              5           0
5        2              3           0
6        2              4           0
7        2              5           1
8        3              4           1
9        3              5           0
10       4              5           0
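The example above tests whole strings for equality and relies on each id being the same as its row number. If the ids are strings like A0008 and you want the number of shared words from the question's intersect expression instead, the same combn idea can probably be adapted. This is only a rough sketch: the sample ids and texts are made up, the column names id and text are assumed, and count_shared_words and text_by_id are hypothetical names introduced here for illustration.

```r
# hypothetical sample data; the ids and texts are made up for illustration
usr.query <- data.frame(
  id   = c("A0008", "A0043", "A0065"),
  text = c("red apple pie", "green apple tart", "apple pie recipe"),
  stringsAsFactors = FALSE
)

# hypothetical helper wrapping the expression from the question:
# the number of distinct space-separated words two texts share
count_shared_words <- function(query1, query2) {
  length(intersect(unlist(strsplit(query1, " ")),
                   unlist(strsplit(query2, " "))))
}

# every unordered pair of ids, one pair per row
pairs <- as.data.frame(t(combn(usr.query$id, 2)), stringsAsFactors = FALSE)
names(pairs) <- c("main_id", "compared_to_id")

# named vector so texts are looked up by id rather than by row position
text_by_id <- setNames(usr.query$text, usr.query$id)

# call the helper once per pair of texts
pairs$match_count <- mapply(count_shared_words,
                            text_by_id[pairs$main_id],
                            text_by_id[pairs$compared_to_id],
                            USE.NAMES = FALSE)
pairs
```

Each pair appears only once here (A0008 with A0043, but not A0043 with A0008), which should be enough because the shared-word count is the same in both directions. Note that mapply still calls the helper once per pair, so this is less strictly vectorized than the equality test.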
Upvotes: 2