Reputation: 49
I have a dataframe called mydf, simplified as below:
mydf
var1 var2
abc_color1_location1_number1 1000
xyz_color1_location1_number1 100
asd_color2_location2_number1 900
qwe_color1_location1_number2 200
sdf_color2_location1_number2 1100
qwerrrr_ahjkkk_asdfgggg 234
sdf_color1_location2_number1 3577
abc_color1_location3_number1 86544
I want to subset the dataset flexibly based on var1. For example:
pattern <- c("abc", "color1", "number1")
newmydf <- mydf[grep(paste("_",paste(pattern,collapse="_|_"),"_",sep=""),mydf$var1,ignore.case=T),]
My expected result:
newmydf
var1 var2
abc_color1_location1_number1 1000
However, the resulting data frame was subset using only the patterns "abc" and "color1", whereas I want all of the patterns to be considered. Can anyone please help me with this?
Many thanks in advance!
With kind regards,
Upvotes: 1
Reputation: 13591
An alternative approach is to strsplit on _ and use all(... %in% ...)
keep <- sapply(strsplit(df$var1, "_"), function(x) all(pattern %in% x))
df[keep,]
Output
var1 var2
1 abc_color1_location1_number1 1000
8 abc_color1_location3_number1 86544
Data
df <- structure(list(var1 = c("abc_color1_location1_number1", "xyz_color1_location1_number1",
"asd_color2_location2_number1", "qwe_color1_location1_number2",
"sdf_color2_location1_number2", "qwerrrr_ahjkkk_asdfgggg", "sdf_color1_location2_number1",
"abc_color1_location3_number1"), var2 = c(1000L, 100L, 900L,
200L, 1100L, 234L, 3577L, 86544L)), .Names = c("var1", "var2"
), class = "data.frame", row.names = c(NA, -8L))
pattern <- c("abc", "color1", "number1")
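One property of splitting on _ is that each element of pattern has to equal a whole token, so "color1" would not match a token such as "color10". A minimal sketch of that difference, using a small made-up vector (the values in x are illustrative and not from the question's data):

```r
# Hypothetical example values, not taken from the question's data
x <- c("abc_color1_number1", "abc_color10_number1")
pattern <- c("abc", "color1", "number1")

# Whole-token matching: split on "_" and require every pattern element to be a token
token_match <- sapply(strsplit(x, "_"), function(tokens) all(pattern %in% tokens))

# Substring matching: require every pattern element to occur anywhere in the string
substr_match <- sapply(
  x,
  function(s) all(sapply(pattern, grepl, x = s, fixed = TRUE)),
  USE.NAMES = FALSE
)

token_match   # TRUE FALSE
substr_match  # TRUE TRUE
```

Which behaviour is preferable depends on whether partial matches inside a token should count.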
Upvotes: 0
Reputation: 39174
A solution using tidyverse and stringr. mydf2 is the final output.
find_match is a user-defined function that returns a logical vector (TRUE or FALSE for each element) indicating whether all of the words in pattern are found.
By applying find_match, we can filter the data frame based on the results.
library(tidyverse)
library(stringr)
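find_match <- function(Col, pattern){
  # One logical vector per word in pattern
  m <- map(pattern, str_detect, string = Col)
  names(m) <- paste("Word", pattern)
  # as_tibble() replaces the deprecated as_data_frame()
  m2 <- as_tibble(m)
  # TRUE only where every word was detected
  results <- rowSums(m2) == length(pattern)
  return(results)
}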
mydf2 <- mydf %>% filter(find_match(var1, pattern))
mydf2
var1 var2
1 abc_color1_location1_number1 1000
2 abc_color1_location3_number1 86544
# Create mydf
mydf <- read.table(text = "var1 var2
abc_color1_location1_number1 1000
xyz_color1_location1_number1 100
asd_color2_location2_number1 900
qwe_color1_location1_number2 200
sdf_color2_location1_number2 1100
qwerrrr_ahjkkk_asdfgggg 234
sdf_color1_location2_number1 3577
abc_color1_location3_number1 86544",
header = TRUE, stringsAsFactors = FALSE)
# Define the pattern
pattern <- c("abc", "color1", "number1")
Upvotes: 0
Reputation: 37889
If you want all the elements of pattern to be considered, then something like this might help:
pattern <- c("abc", "color1", "number1")
alltrue <- rowSums(sapply(pattern, function(x) grepl(pattern = x, mydf$var1))) == length(pattern)
mydf[alltrue, ]
# var1 var2
#1 abc_color1_location1_number1 1000
#8 abc_color1_location3_number1 86544
Essentially, sapply runs grepl for each element of pattern, and only the rows where all of the grepl results are TRUE are kept.
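To make that concrete, here is a small sketch that shows the intermediate logical matrix and derives the threshold from length(pattern). It assumes mydf and pattern as given in the question (rebuilt here so the snippet runs on its own). The use of fixed = TRUE and the Reduce variant are added alternatives, not part of the answer above.

```r
# The question's data, rebuilt so this snippet is self-contained
mydf <- data.frame(
  var1 = c("abc_color1_location1_number1", "xyz_color1_location1_number1",
           "asd_color2_location2_number1", "qwe_color1_location1_number2",
           "sdf_color2_location1_number2", "qwerrrr_ahjkkk_asdfgggg",
           "sdf_color1_location2_number1", "abc_color1_location3_number1"),
  var2 = c(1000L, 100L, 900L, 200L, 1100L, 234L, 3577L, 86544L),
  stringsAsFactors = FALSE
)
pattern <- c("abc", "color1", "number1")

# One column per pattern element, one row per value of var1
hits <- sapply(pattern, function(p) grepl(p, mydf$var1, fixed = TRUE))
hits

# Keep rows where every column is TRUE
alltrue <- rowSums(hits) == length(pattern)

# Alternative (an assumption, not from the answer): AND the per-pattern vectors together
alltrue2 <- Reduce(`&`, lapply(pattern, grepl, x = mydf$var1, fixed = TRUE))

mydf[alltrue, ]
```

Both alltrue and alltrue2 select the same rows; the Reduce form may be easier to read when the intermediate matrix is not needed.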
Upvotes: 2