Reputation: 2584
I have two data frames, and I am trying to find the strings they have in common, together with the count (position) of each match in both.
df1 <- structure(list(split = structure(c(7L, 6L, 8L, 3L, 2L, 4L, 9L,
4L, 9L, 5L, 10L, 1L), .Label = c("America1", "corea", "coreanorth1",
"gdyijq", "gqdtr", "india-2", "india1", "india3", "udyhfs", "USA"
), class = "factor"), count = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L,
4L, 4L, 5L, 5L)), .Names = c("split", "count"), row.names = c(NA,
-12L), class = "data.frame")
It looks like this:
split count
india1 1
india-2 1
india3 1
coreanorth1 2
corea 2
gdyijq 3
udyhfs 3
gdyijq 4
udyhfs 4
gqdtr 4
USA 5
America1 5
I have another data frame with the same structure:
df2<- structure(list(split = structure(c(3L, 2L, 1L), .Label = c("America1",
"gdyijq", "india1"), class = "factor"), count = 1:3), .Names = c("split",
"count"), class = "data.frame", row.names = c(NA, -3L))
split count
india1 1
gdyijq 2
America1 3
I want to check whether any string from df2 exists in df1 and, if it does, output the two counts separated by a comma (the df2 count first, then the df1 count). For example,
india1 is in df2 and matches india1 in df1, so the output is
india1 1,1
If a string appears more than once in df1, the pairs are separated by semicolons, as for gdyijq.
The expected output looks like this:
india1 1,1
gdyijq 2,3;2,4
America1 3,5
Upvotes: 0
Views: 92
Reputation: 692
Here's a possible data.table version:
library(data.table)
# convert to data.table
df1 <- as.data.table(df1)
df2 <- as.data.table(df2)
# set keys for use in matching
setkey(df1, split)
setkey(df2, split)
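# chain operations:
# join df1 to df2 on the key (count: from df1, i.count: from df2);
# nomatch = 0L drops df2 strings that have no match in df1
# then, for each split, paste the pairs as "df2 count,df1 count"
# and collapse multiple matches with ";"
# (split is not repeated in j, because by = split already adds it)
df1[df2, nomatch = 0L][ , .(counts = paste(i.count, count, sep = ",", collapse = ";")), by = split]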
Output is something like this:
split counts
1: America1 3,5
2: gdyijq 2,3;2,4
3: india1 1,1
Upvotes: 2
Reputation: 682
You want something like base R's merge or dplyr's inner_join:
library(dplyr)
(DF <- inner_join(df1, df2, by = "split"))
Now we have to combine all entries for one split:
DF %>%
group_by(split) %>%
summarize(counts = paste0(count.x, ",", count.y, collapse = ";"))
Results in
# A tibble: 3 × 2
split counts
<chr> <chr>
1 America1 5,3
2 gdyijq 3,2;4,2
3 india1 1,1
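Note that the result above lists the df1 count first (5,3), whereas the question asks for the df2 count first (3,5). A minimal sketch of one way to get that order, assuming the data is rebuilt with character columns instead of factors and using illustrative suffixes (.df2, .df1) that are not part of the original answer:

```r
library(dplyr)

# Data from the question, rebuilt with character columns
# (assumption: this avoids the factor-level mismatch between df1 and df2)
df1 <- data.frame(
  split = c("india1", "india-2", "india3", "coreanorth1", "corea", "gdyijq",
            "udyhfs", "gdyijq", "udyhfs", "gqdtr", "USA", "America1"),
  count = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L, 5L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  split = c("india1", "gdyijq", "America1"),
  count = 1:3,
  stringsAsFactors = FALSE
)

# Put df2 on the left so its count can be pasted first in each pair;
# the suffixes are hypothetical names chosen for readability
res <- inner_join(df2, df1, by = "split", suffix = c(".df2", ".df1")) %>%
  group_by(split) %>%
  summarise(counts = paste(count.df2, count.df1, sep = ",", collapse = ";"))

res
# counts: America1 "3,5", gdyijq "2,3;2,4", india1 "1,1"
```

Swapping the arguments of inner_join is a design choice here; keeping the original join and pasting count.y before count.x should give the same pairs.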
Upvotes: 3
Reputation: 9313
This will not give you the exact result, but it will list in a data frame all matches and the count values for each:
z = merge(df1,df2,by = "split")
Result:
> z
split count.x count.y
1 America1 5 3
4 gdyijq 4 2
5 gdyijq 3 2
8 india1 1 1
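To get from this merged table to the format requested in the question, the matched rows can be pasted and collapsed per string. The following is only a sketch in base R, assuming the data is rebuilt with character columns; the names pair, counts and res are made up for illustration:

```r
# Data from the question, rebuilt with character columns (assumption)
df1 <- data.frame(
  split = c("india1", "india-2", "india3", "coreanorth1", "corea", "gdyijq",
            "udyhfs", "gdyijq", "udyhfs", "gqdtr", "USA", "America1"),
  count = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L, 5L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  split = c("india1", "gdyijq", "America1"),
  count = 1:3,
  stringsAsFactors = FALSE
)

# Inner join on split; count.x comes from df1, count.y from df2
z <- merge(df1, df2, by = "split")

# Sort so that the pairs within each string appear in a predictable order
z <- z[order(z$count.y, z$count.x), ]

# Build one "df2 count,df1 count" pair per matched row
z$pair <- paste(z$count.y, z$count.x, sep = ",")

# Collapse the pairs of each string with ";"
counts <- tapply(z$pair, z$split, paste, collapse = ";")
res <- data.frame(split = names(counts), counts = as.vector(counts),
                  stringsAsFactors = FALSE)

res
# counts: America1 "3,5", gdyijq "2,3;2,4", india1 "1,1"
```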
Upvotes: 2