Reputation: 982
Based on this answer, how can we list the results in a more compact single column when we are matching many patterns but expect only a few hits per string? (I am not sure which format is most orthodox for the "hits" column: a vector, as below, or a delimited string.)
streets = c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen" , "Altostraße")
streets = tolower(streets) #Lowercase all
names = c("Berber", "Weg")
names = tolower(names)
#The original solution and output
sapply(names, function (y) sapply(streets, function (x) grepl(y, x)))
# berber weg
#berberichweg TRUE TRUE
#otto-klemperer-weg FALSE TRUE
#feldmeierbogen FALSE FALSE
#altostraße FALSE FALSE
#The desired output instead
#streets hits
#berberichweg c("berber", "weg")
#otto-klemperer-weg "weg"
#feldmeierbogen NA
#altostraße NA
Upvotes: 0
Views: 53
Reputation: 160607
res <- sapply(names, function (y) sapply(streets, function (x) grepl(y, x)))
res
# berber weg
# berberichweg TRUE TRUE
# otto-klemperer-weg FALSE TRUE
# feldmeierbogen FALSE FALSE
# altostraße FALSE FALSE
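dat <- data.frame(streets = streets)
# which.max() returns the index of the first TRUE, so hits1 keeps only the first matching pattern per street
dat$hits1 <- names[apply(res, 1, function(z) if (any(z)) which.max(z) else NA)]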
dat
# streets hits1
# 1 berberichweg berber
# 2 otto-klemperer-weg weg
# 3 feldmeierbogen <NA>
# 4 altostraße <NA>
dat$hits1
# [1] "berber" "weg" NA NA
If instead you want all hits for a street combined into one string, perhaps:
dat$hits2 <- apply(res, 1, function(z) toString(names(which(z))))
dat
# streets hits1 hits2
# 1 berberichweg berber berber, weg
# 2 otto-klemperer-weg weg weg
# 3 feldmeierbogen <NA>
# 4 altostraße <NA>
dat$hits2
# [1] "berber, weg" "weg" "" ""
Note that the first element of hits2 is a single comma-delimited string, not a vector of strings, and that streets without hits get an empty string rather than NA. An alternative would be to use a list-column instead:
dat$hits3 <- apply(res, 1, function(z) names(which(z)))
dat
# streets hits1 hits2 hits3
# 1 berberichweg berber berber, weg berber, weg
# 2 otto-klemperer-weg weg weg weg
# 3 feldmeierbogen <NA>
# 4 altostraße <NA>
dat$hits3
# $berberichweg
# [1] "berber" "weg"
# $`otto-klemperer-weg`
# [1] "weg"
# $feldmeierbogen
# character(0)
# $altostraße
# character(0)
This is a list, which can be assigned into a frame. Two things to note about this:
You'll need to use [[ to grab a single "cell" from this hits3:
dat$hits1[1]
# [1] "berber"
dat$hits2[1]
# [1] "berber, weg"
dat$hits3[1]
# $berberichweg # <---- this is a list, not a vector, of length 1
# [1] "berber" "weg"
dat$hits3[[1]]
# [1] "berber" "weg"
Anything that works on this column will need to be list-friendly, since it is a list and not an atomic vector.
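As a rough illustration of what list-friendly means in practice, here is a minimal, self-contained sketch that rebuilds the list-column and applies a few operations that cope with lists. It builds the column with lapply() rather than apply(), because apply() simplifies its result to a matrix or vector whenever every row happens to have the same number of hits. The variable name pats (instead of names) and the use of fixed = TRUE (literal substring matching rather than regex) are assumptions made for this example, not part of the code above.

```r
# Rebuild the example data; `pats` plays the role of `names` in the question.
streets <- tolower(c("Berberichweg", "Otto-Klemperer-Weg", "Feldmeierbogen", "Altostraße"))
pats <- tolower(c("Berber", "Weg"))

# One list element per street. lapply() always returns a list,
# regardless of how many patterns match each street.
# Assumption: patterns are literal substrings, hence fixed = TRUE.
hits <- lapply(streets, function(s) {
  pats[vapply(pats, grepl, logical(1), x = s, fixed = TRUE)]
})

dat <- data.frame(streets = streets)
dat$hits3 <- hits  # list-column

# lengths() works element-wise on a list: number of hits per street.
n_hits <- lengths(dat$hits3)

# Collapse each element to a single string, using NA where nothing matched.
hits_chr <- vapply(
  dat$hits3,
  function(h) if (length(h)) toString(h) else NA_character_,
  character(1)
)

# Keep only the streets with at least one hit.
matched <- dat[n_hits > 0, ]
```

Here hits_chr resembles hits2 but uses NA for streets without hits, which is closer to the desired output in the question. Whether the list-column or the collapsed string is preferable likely depends on whether later steps need the individual hits.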
Upvotes: 3