Reputation: 345
I have a data frame like so:
| X | Y
----------
| 8 | 8 13
| 1 | 8 13
| 5 | 8 13
| 13 | 8 13
| 3 | 8 13
| 14 | 7
| 14 |
Col X is in integer format. Col Y is in character format, the numbers are separated by a space. This column can have one number (single or double digit), up to 20 numbers, all separated by a space, or it can be blank.
I wish to match the number in Col X with any of the number in Col Y, like so:
| X | Y | FLAG
---------------
| 8 | 8 13 | Yes <-- 8 is a match
| 1 | 8 13 | No <-- Even though 1 appears in Col Y, it is technically number 13
| 5 | 8 13 | No
| 13 | 8 13 | Yes
| 3 | 8 13 | No <-- Even though 3 appears in Col Y, it is technically number 13
| 14 | 7 | No
| 14 | | No
I tried to unlist the numbers like so:
as.numeric(unlist(str_extract_all(data$colY, "[\\.0-9e-]+")
But this only separates the numbers. How can this be achieved?
Upvotes: 2
Views: 410
Reputation: 887291
We could use str_detect
which is Vectorized
over the 'string' and 'pattern'. Inorder to make sure no partial matches, paste
word boundary (\\b
) on the 'X' column
library(stringr)
library(dplyr)
df1 %>%
mutate(FLAG = c("No", "Yes")[1 + str_detect(Y, str_c("\\b", X, "\\b"))])
-output
# X Y FLAG
#1 8 8 13 Yes
#2 1 8 13 No
#3 5 8 13 No
#4 13 8 13 Yes
#5 3 8 13 No
#6 14 7 No
#7 14 No
In base R
, we could use read.table
to read as numeric columns and then with rowSums
create a logical vector
c("No", "Yes")[1 + (rowSums(read.table(text = df1$Y, header = FALSE,
fill = TRUE) == df1$X, na.rm = TRUE) > 0)]
df1 <- structure(list(X = c(8, 1, 5, 13, 3, 14, 14), Y = c("8 13", "8 13",
"8 13", "8 13", "8 13", "7", "")), class = "data.frame", row.names = c(NA,
-7L))
Upvotes: 2
Reputation: 93908
strsplit
then check if the value is in the corresponding list
. Let's remake the data first:
dat <- data.frame(X=c(8L, 1L, 5L, 13L, 3L, 14L),
Y=c("8 13", "8 13", "8 13", "8 13", "8 13", "7"), stringsAsFactors=FALSE)
Then do the comparison:
mapply(`%in%`, dat$X, strsplit(dat$Y, "\\s+"))
#[1] TRUE FALSE FALSE TRUE FALSE FALSE
Upvotes: 3