nik

Reputation: 2584

Find matching strings between two data frames and combine their counts

I have two data frames, and I am trying to find the strings they have in common, together with their positions (the count column).

df1 <- structure(list(split = structure(c(7L, 6L, 8L, 3L, 2L, 4L, 9L, 
4L, 9L, 5L, 10L, 1L), .Label = c("America1", "corea", "coreanorth1", 
"gdyijq", "gqdtr", "india-2", "india1", "india3", "udyhfs", "USA"
), class = "factor"), count = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 
4L, 4L, 5L, 5L)), .Names = c("split", "count"), row.names = c(NA, 
-12L), class = "data.frame")

It looks like this:

split       count
india1        1
india-2       1
india3        1
coreanorth1   2
corea         2
gdyijq        3
udyhfs        3
gdyijq        4
udyhfs        4
gqdtr         4
USA           5
America1      5

I have another data frame with the same structure:

df2<- structure(list(split = structure(c(3L, 2L, 1L), .Label = c("America1", 
"gdyijq", "india1"), class = "factor"), count = 1:3), .Names = c("split", 
"count"), class = "data.frame", row.names = c(NA, -3L))

It looks like this:

split     count
india1      1
gdyijq      2
America1    3

I want to check whether any string from df2 exists in df1 and, if so, output the two counts separated by a comma (the count from df2 first, then the count from df1). For example:

india1 is in df2 and matches india1 in df1, so the output is

india1  1,1

If a string appears more than once in df1, each pair is separated by a semicolon, as with gdyijq.

The output should look like this:

india1     1,1
gdyijq     2,3;2,4
America1   3,5

Upvotes: 0

Views: 92

Answers (3)

din

Reputation: 692

Here's a possible data.table version:

library(data.table)

# convert to data.table
df1 <- as.data.table(df1)
df2 <- as.data.table(df2)

# set keys for use in matching
setkey(df1, split)
setkey(df2, split)

# chain operations
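# join df1 to df2 on the key; on a name clash, columns from df2 get the i. prefix
# then paste the counts as "df2 count,df1 count" (i.count is from df2, count is from df1)
# and collapse the pairs into one row per split (split is added by `by`, so it is not repeated in j)
df1[df2][ , .(counts = paste(i.count, count, sep = ",", collapse = ";")), by = split]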

Output is something like this:

      split  counts
1: America1     3,5
2:   gdyijq 2,3;2,4
3:   india1     1,1

Upvotes: 2

user2296153

Reputation: 682

You want something like merge, or a join from dplyr:

library(dplyr)
(DF <- inner_join(df1, df2, by = "split"))

Now we have to combine all entries for one split:

DF %>%
  group_by(split) %>%
  summarize(counts = paste0(count.x, ",", count.y, collapse = ";"))

This results in the following (each pair is the df1 count followed by the df2 count):

# A tibble: 3 × 2
     split  counts
     <chr>   <chr>
1 America1     5,3
2   gdyijq 3,2;4,2
3   india1     1,1

Upvotes: 3

R. Schifini

Reputation: 9313

This will not give you the exact result, but it will list in a data frame all matches and the count values for each:

z = merge(df1,df2,by = "split")

Result:

> z

     split count.x count.y
1 America1       5       3
4   gdyijq       4       2
5   gdyijq       3       2
8   india1       1       1
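
If the exact format from the question is needed, the merged data frame can be collapsed in base R. The following is only a minimal sketch, not part of the original answer: the names pair, res and counts are made up for illustration, the data frames are rebuilt with character columns instead of factors, and the pairs are assumed to be wanted in ascending order of the df1 count.

```r
# Rebuild the example data from the question (character columns, not factors).
df1 <- data.frame(
  split = c("india1", "india-2", "india3", "coreanorth1", "corea", "gdyijq",
            "udyhfs", "gdyijq", "udyhfs", "gqdtr", "USA", "America1"),
  count = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 4L, 5L, 5L),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  split = c("india1", "gdyijq", "America1"),
  count = 1:3,
  stringsAsFactors = FALSE
)

# Inner join on split: count.x comes from df1, count.y from df2.
z <- merge(df1, df2, by = "split")

# Assumption: within each split, pairs should follow ascending df1 count.
z <- z[order(z$split, z$count.x), ]

# Build one "df2 count,df1 count" pair per matched row.
z$pair <- paste(z$count.y, z$count.x, sep = ",")

# Collapse the pairs of each split into one semicolon-separated string.
res <- aggregate(pair ~ split, data = z, FUN = paste, collapse = ";")
names(res)[2] <- "counts"
res
```

Here res should hold one row per matching string, with the pairs written as in the question (df2 count first, then df1 count).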

Upvotes: 2
