sarah

Reputation: 229

Compare dataframe column to another dataframe column

I have a dataframe column containing page paths (let's call it A):

pagePath
/text/other_text/123-string1-4571/text.html
/text/other_text/string2/15-some_other_txet.html
/text/other_text/25189-string3/45112-text.html
/text/other_text/text/string4/5418874-some_other_txet.html
/text/other_text/string5/text/some_other_txet-4157/text.html
/text/other_text/123-text-4571/text.html
/text/other_text/125-text-471/text.html

I also have a string column in another dataframe (let's call it B). The two dataframes are different and do not have the same number of rows.

Here's an example of my column in dataframe B:

names
string1
string11
string4
string3
string2
string10
string5
string100

What I want to do is check whether my page paths (A) contain any of the strings from my other dataframe (B).

I'm having difficulty because the two dataframes don't have the same length and the data is unordered.

EXPECTED OUTPUT

I want to have this output as a result:

pagePath                                                      names    exist
/text/other_text/123-string1-4571/text.html                   string1  TRUE
/text/other_text/string2/15-some_other_txet.html              string2  TRUE
/text/other_text/25189-string3/45112-text.html                string3  TRUE
/text/other_text/text/string4/5418874-some_other_txet.html    string4  TRUE
/text/other_text/string5/text/some_other_txet-4157/text.html  string5  TRUE
/text/other_text/123-text-4571/text.html                      NA       FALSE
/text/other_text/125-text-471/text.html                       NA       FALSE

If my question needs more clarification, please let me know.

Upvotes: 4

Views: 371

Answers (4)

Sotos

Reputation: 51582

We can use str_extract_all from the stringr package. Paths with no match come back as character(0) rather than NA, so we have to replace those afterwards:

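library(stringr)
# Extract the "string<digits>" token from each path; a path with no match yields character(0)
df$names <- as.character(str_extract_all(df$pagePath, "string[0-9]+"))
df$exist <- df$names %in% df1$names
# as.character() turned the empty matches into the text "character(0)", so map that to NA
df[df=="character(0)"] <- NA
df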
#                                                 pagePath       names   exist
#1                  /text/other_text/123-string1-4571/text.html string1  TRUE
#2             /text/other_text/string2/15-some_other_txet.html string2  TRUE
#3               /text/other_text/25189-string3/45112-text.html string3  TRUE
#4   /text/other_text/text/string4/5418874-some_other_txet.html string4  TRUE
#5 /text/other_text/string5/text/some_other_txet-4157/text.html string5  TRUE
#6                     /text/other_text/123-text-4571/text.html    <NA> FALSE
#7                      /text/other_text/125-text-471/text.html    <NA> FALSE

DATA

dput(df)
structure(list(pagePath = structure(c(1L, 5L, 4L, 7L, 6L, 2L, 
3L), .Label = c("/text/other_text/123-string1-4571/text.html", 
"/text/other_text/123-text-4571/text.html", "/text/other_text/125-text-471/text.html", 
"/text/other_text/25189-string3/45112-text.html", "/text/other_text/string2/15-some_other_txet.html", 
"/text/other_text/string5/text/some_other_txet-4157/text.html", 
"/text/other_text/text/string4/5418874-some_other_txet.html"), class = "factor")), .Names = "pagePath", class = "data.frame", row.names = c(NA, 
-7L))
dput(df1)
structure(list(names = structure(c(1L, 4L, 7L, 6L, 5L, 2L, 8L, 
3L), .Label = c("string1", "string10", "string100", "string11", 
"string2", "string3", "string4", "string5"), class = "factor")), .Names = "names", class = "data.frame", row.names = c(NA, 
-8L))

Upvotes: 2

Calga

Reputation: 43

Not the most elegant solution, since it uses a for loop:

# Start with "no match" for every page path
names <- rep(NA, length(A$pagePath))
exist <- rep(FALSE, length(A$pagePath))

# For each name in B, flag the page paths that contain it
for (name in B$names) {
  hits <- grep(name, A$pagePath)
  names[hits] <- name
  exist[hits] <- TRUE
}
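
The same idea can be sketched without an explicit loop. This is only a sketch under a few assumptions: A and B are stand-in data frames rebuilt from the question's sample data as character columns, names are matched as literal substrings (fixed = TRUE), and when several names match one path the first one in B is kept (the loop above keeps the last one).

```r
# Stand-in data copied from the question (assumed to be character columns)
A <- data.frame(pagePath = c(
  "/text/other_text/123-string1-4571/text.html",
  "/text/other_text/string2/15-some_other_txet.html",
  "/text/other_text/25189-string3/45112-text.html",
  "/text/other_text/text/string4/5418874-some_other_txet.html",
  "/text/other_text/string5/text/some_other_txet-4157/text.html",
  "/text/other_text/123-text-4571/text.html",
  "/text/other_text/125-text-471/text.html"
), stringsAsFactors = FALSE)
B <- data.frame(names = c("string1", "string11", "string4", "string3",
                          "string2", "string10", "string5", "string100"),
                stringsAsFactors = FALSE)

# Logical matrix: one row per page path, one column per name
hits <- sapply(B$names, function(nm) grepl(nm, A$pagePath, fixed = TRUE))

# A path "exists" if at least one name matched it
A$exist <- rowSums(hits) > 0
# Keep the first matching name per row; rows without a match become NA
A$names <- apply(hits, 1, function(r) B$names[which(r)[1]])
```

Because the match is a plain substring test, a path containing string11 would also count as a hit for string1, exactly as in the loop version.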

Upvotes: 2

David Heckmann

Reputation: 2939

Here is one way using apply, assuming df already has pagePath as its first column and names as its second:

df$exist <- apply( df,1,function(x){as.logical(grepl(x[2],x[1]))} )
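
A row-wise check like this needs the name to sit next to the path it is compared with. Below is a minimal sketch of that situation, assuming a hypothetical df with character columns pagePath and names in which unmatched rows hold NA; it uses mapply rather than apply and treats each name as a literal substring, which are my choices and not part of the answer above.

```r
# Hypothetical input: pagePath plus an already-filled names column
df <- data.frame(
  pagePath = c("/text/other_text/123-string1-4571/text.html",
               "/text/other_text/string2/15-some_other_txet.html",
               "/text/other_text/123-text-4571/text.html"),
  names = c("string1", "string2", NA),
  stringsAsFactors = FALSE
)

# Compare each path with the name on the same row; an NA name counts as no match
df$exist <- mapply(
  function(path, name) !is.na(name) && grepl(name, path, fixed = TRUE),
  df$pagePath, df$names,
  USE.NAMES = FALSE
)
```

The explicit is.na() test keeps exist strictly TRUE or FALSE; grepl with an NA pattern would return NA instead.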

Upvotes: 0

mtoto

Reputation: 24178

We can generate the exist column with grepl():

# Collapse B$names into one string with "|" 
onestring <- paste(B$names, collapse = "|") 

# Generate new column
A$exist <- grepl(onestring, A$pagePath)
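
The collapsed pattern can also be used to recover which name matched, giving the names column from the expected output. This is a sketch in base R, assuming A and B are rebuilt from the question's sample data as character columns and that the names contain no regex metacharacters. Sorting the names longest first is a precaution so that a longer name such as string11 is preferred over string1.

```r
# Stand-in data copied from the question (assumed to be character columns)
A <- data.frame(pagePath = c(
  "/text/other_text/123-string1-4571/text.html",
  "/text/other_text/string2/15-some_other_txet.html",
  "/text/other_text/25189-string3/45112-text.html",
  "/text/other_text/text/string4/5418874-some_other_txet.html",
  "/text/other_text/string5/text/some_other_txet-4157/text.html",
  "/text/other_text/123-text-4571/text.html",
  "/text/other_text/125-text-471/text.html"
), stringsAsFactors = FALSE)
B <- data.frame(names = c("string1", "string11", "string4", "string3",
                          "string2", "string10", "string5", "string100"),
                stringsAsFactors = FALSE)

# Build one alternation pattern, longest names first
ordered_names <- B$names[order(nchar(B$names), decreasing = TRUE)]
onestring <- paste(ordered_names, collapse = "|")

# regexpr() gives the position of the first match per path, or -1 for none
m <- regexpr(onestring, A$pagePath)

# regmatches() returns only the matched pieces, so fill just the matching rows
A$names <- NA_character_
A$names[m != -1] <- regmatches(A$pagePath, m)
A$exist <- !is.na(A$names)
```

If the names could contain characters such as "." or "(", they would need escaping before being pasted into the pattern.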

Upvotes: 2
