Reputation: 3384
NAMES = data.table(names1 = c("John","mary","james","Mark","abi"),
list1 = c("bam","cameron","tony","lucas","tim"))
NAMES_SUBSET = NAMES[names1 %like% "ma" | list1 %like% "am"]
I have a data.table 'NAMES' that contains various column names; all the columns contains a list of names that may or may not have proper capitalization, for example some first letters are uppercase but some are not.
I want to create 'NAMES_SUBSET' that ONLY contains the rows in 'NAMES' that contain "ma" or "Ma" or "am" or "Am" in any of the columns.
My current attempt is not a good solution, because I do not know how to search the data.table for all rows but only one columns at a time and I do not know how to make it case insensitive. Thank you
My hopeful output is:
names1 list1
John bam
mary cameron
james tony
Mark lucas
Upvotes: 0
Views: 144
Reputation: 1844
I would use something like this:
desired_rows <- NAMES[, lapply(.SD, function(x) grepl("ma|am", x, ignore.case = TRUE))]
NAMES_SUBSET <- NAMES[rowSums(desired_rows) > 0]
The first line uses .SD
to look through all the columns, and grepl()
to search for matches to either of your patterns, with the ignore.case = TRUE
argument.
The second line uses that to filter your original dataset.
Output:
> NAMES_SUBSET
names1 list1
1: John bam
2: mary cameron
3: james tony
4: Mark lucas
If you're doing this a lot, you can write it as a function like this:
search_all_dtcols <- function(dataset, pattern) {
dt <- dataset[, lapply(.SD, function(x) grepl(pattern, x, ignore.case = TRUE))]
dataset[rowSums(dt) > 0]
}
Output:
> search_all_dtcols(NAMES, "ma|am")
names1 list1
1: John bam
2: mary cameron
3: james tony
4: Mark lucas
Upvotes: 1