Reputation: 229
I have a dataframe column containing page paths (let's call it A):
pagePath
/text/other_text/123-string1-4571/text.html
/text/other_text/string2/15-some_other_txet.html
/text/other_text/25189-string3/45112-text.html
/text/other_text/text/string4/5418874-some_other_txet.html
/text/other_text/string5/text/some_other_txet-4157/text.html
/text/other_text/123-text-4571/text.html
/text/other_text/125-text-471/text.html
I also have a string column in a second dataframe (let's call it B). The two dataframes are different and do not have the same number of rows.
Here's an example of my column in dataframe B:
names
string1
string11
string4
string3
string2
string10
string5
string100
What I want to do is check whether each page path in A contains one of the strings from B.
This is difficult because the two dataframes do not have the same length and the rows are in no particular order.
EXPECTED OUTPUT
I want to have this output as a result:
pagePath names exist
/text/other_text/123-string1-4571/text.html string1 TRUE
/text/other_text/string2/15-some_other_txet.html string2 TRUE
/text/other_text/25189-string3/45112-text.html string3 TRUE
/text/other_text/text/string4/5418874-some_other_txet.html string4 TRUE
/text/other_text/string5/text/some_other_txet-4157/text.html string5 TRUE
/text/other_text/123-text-4571/text.html NA FALSE
/text/other_text/125-text-471/text.html NA FALSE
If my question needs clarification, please let me know.
Upvotes: 4
Views: 371
Reputation: 51582
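We can use str_extract_all from the stringr package. Paths with no match come back as character(0) rather than NA, which as.character turns into the text "character(0)", so we have to replace those with NA afterwards: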
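library(stringr)
# Extract the "string<digits>" token from each path; rows without a match become "character(0)"
df$names <- as.character(str_extract_all(df$pagePath, "string[0-9]+"))
# Flag rows whose extracted token appears in df1$names
df$exist <- df$names %in% df1$names
# Replace the "character(0)" placeholders with NA
df[df=="character(0)"] <- NA
df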
# pagePath names exist
#1 /text/other_text/123-string1-4571/text.html string1 TRUE
#2 /text/other_text/string2/15-some_other_txet.html string2 TRUE
#3 /text/other_text/25189-string3/45112-text.html string3 TRUE
#4 /text/other_text/text/string4/5418874-some_other_txet.html string4 TRUE
#5 /text/other_text/string5/text/some_other_txet-4157/text.html string5 TRUE
#6 /text/other_text/123-text-4571/text.html <NA> FALSE
#7 /text/other_text/125-text-471/text.html <NA> FALSE
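To see where the "character(0)" text comes from, here is a minimal base R sketch. It uses regmatches with gregexpr as a stand-in for str_extract_all (both return a list with an empty character vector for paths without a match). The names paths, matches and first_or_na are made up for this illustration.

```r
# Two sample paths: the first contains a token, the second does not
paths <- c("/text/other_text/123-string1-4571/text.html",
           "/text/other_text/123-text-4571/text.html")

# A list with one element per path; no match gives character(0)
matches <- regmatches(paths, gregexpr("string[0-9]+", paths))

# Coercing the list directly turns the empty element into the text "character(0)"
as_text <- as.character(matches)

# Taking the first match, or NA when there is none, avoids the later cleanup step
first_or_na <- vapply(
  matches,
  function(m) if (length(m) > 0) m[1] else NA_character_,
  character(1)
)
```

With this approach the result is already NA for paths without a match, so no "character(0)" replacement is needed before the %in% check.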
DATA
dput(df)
structure(list(pagePath = structure(c(1L, 5L, 4L, 7L, 6L, 2L,
3L), .Label = c("/text/other_text/123-string1-4571/text.html",
"/text/other_text/123-text-4571/text.html", "/text/other_text/125-text-471/text.html",
"/text/other_text/25189-string3/45112-text.html", "/text/other_text/string2/15-some_other_txet.html",
"/text/other_text/string5/text/some_other_txet-4157/text.html",
"/text/other_text/text/string4/5418874-some_other_txet.html"), class = "factor")), .Names = "pagePath", class = "data.frame", row.names = c(NA,
-7L))
dput(df1)
structure(list(names = structure(c(1L, 4L, 7L, 6L, 5L, 2L, 8L,
3L), .Label = c("string1", "string10", "string100", "string11",
"string2", "string3", "string4", "string5"), class = "factor")), .Names = "names", class = "data.frame", row.names = c(NA,
-8L))
Upvotes: 2
Reputation: 43
Not that elegant, since it uses a for loop:
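# Start with "no match" for every path
names <- rep(NA_character_, length(A$pagePath))
exist <- rep(FALSE, length(A$pagePath))
for (name in B$names) {
  # Paths that contain this name as a literal substring
  idx <- grep(name, A$pagePath, fixed = TRUE)
  names[idx] <- name
  exist[idx] <- TRUE
}
# Attach the results to A
A$names <- names
A$exist <- exist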
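A loop-free variant is sketched below, assuming A and B look like the question's sample data. Because "string1" is also a substring of "string11", it keeps the longest matching name for each path rather than whichever name happens to be processed last. The helper pick_longest and the matrix hits are names invented for this sketch.

```r
A <- data.frame(pagePath = c(
  "/text/other_text/123-string1-4571/text.html",
  "/text/other_text/string2/15-some_other_txet.html",
  "/text/other_text/25189-string3/45112-text.html",
  "/text/other_text/text/string4/5418874-some_other_txet.html",
  "/text/other_text/string5/text/some_other_txet-4157/text.html",
  "/text/other_text/123-text-4571/text.html",
  "/text/other_text/125-text-471/text.html"
), stringsAsFactors = FALSE)
B <- data.frame(names = c("string1", "string11", "string4", "string3",
                          "string2", "string10", "string5", "string100"),
                stringsAsFactors = FALSE)

# Rows = paths, columns = names; TRUE where the name occurs literally in the path
hits <- sapply(B$names, function(nm) grepl(nm, A$pagePath, fixed = TRUE))

# Hypothetical helper: longest matching name for one row of hits, or NA if none
pick_longest <- function(row) {
  matched <- B$names[row]
  if (length(matched) == 0) NA_character_ else matched[which.max(nchar(matched))]
}

A$names <- apply(hits, 1, pick_longest)
A$exist <- !is.na(A$names)
```

Note that sapply only returns a matrix here because A has more than one row; a single-row A would need extra handling.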
Upvotes: 2
Reputation: 2939
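Here is one way using apply, assuming df already holds the candidate name for each path in its second column:
# Column 1 is pagePath, column 2 is names; rows with an NA name give FALSE
df$exist <- apply(df, 1, function(x) !is.na(x[2]) && grepl(x[2], x[1], fixed = TRUE))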
Upvotes: 0
Reputation: 24178
We can generate the exist column with grepl():
# Collapse B$names into one string with "|"
onestring <- paste(B$names, collapse = "|")
# Generate new column
A$exist <- grepl(onestring, A$pagePath)
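The same collapsed pattern can also recover which name matched, which the expected output asks for. The sketch below is one possible way, assuming the names contain no regex metacharacters and that A and B look like the question's sample data. Sorting the names longest first is a precaution so that "string11" is tried before "string1" when perl = TRUE is used.

```r
A <- data.frame(pagePath = c(
  "/text/other_text/123-string1-4571/text.html",
  "/text/other_text/string2/15-some_other_txet.html",
  "/text/other_text/25189-string3/45112-text.html",
  "/text/other_text/text/string4/5418874-some_other_txet.html",
  "/text/other_text/string5/text/some_other_txet-4157/text.html",
  "/text/other_text/123-text-4571/text.html",
  "/text/other_text/125-text-471/text.html"
), stringsAsFactors = FALSE)
B <- data.frame(names = c("string1", "string11", "string4", "string3",
                          "string2", "string10", "string5", "string100"),
                stringsAsFactors = FALSE)

# Longest names first, then collapse into one alternation pattern
ordered <- B$names[order(nchar(B$names), decreasing = TRUE)]
onestring <- paste(ordered, collapse = "|")

# Position of the first match in each path (-1 when there is none)
m <- regexpr(onestring, A$pagePath, perl = TRUE)

A$exist <- grepl(onestring, A$pagePath, perl = TRUE)
A$names <- NA_character_
# regmatches returns only the matched substrings, one per matching path
A$names[A$exist] <- regmatches(A$pagePath, m)
```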
Upvotes: 2