bvowe
bvowe

Reputation: 3384

Data Table Solution To Name Search

NAMES = data.table(names1 = c("John","mary","james","Mark","abi"), 
                   list1 = c("bam","cameron","tony","lucas","tim"))
NAMES_SUBSET = NAMES[names1 %like% "ma" | list1 %like% "am"]

I have a data.table 'NAMES' that contains various column names; all the columns contains a list of names that may or may not have proper capitalization, for example some first letters are uppercase but some are not.

I want to create 'NAMES_SUBSET' that ONLY contains the rows in 'NAMES' that contain "ma" or "Ma" or "am" or "Am" in any of the columns.

My current attempt is not a good solution, because I do not know how to search the data.table for all rows but only one columns at a time and I do not know how to make it case insensitive. Thank you

My hopeful output is:

names1  list1
John    bam
mary    cameron
james   tony
Mark    lucas

Upvotes: 0

Views: 144

Answers (1)

Jaccar
Jaccar

Reputation: 1844

I would use something like this:

desired_rows <- NAMES[, lapply(.SD, function(x) grepl("ma|am", x, ignore.case = TRUE))] 
NAMES_SUBSET <- NAMES[rowSums(desired_rows) > 0]

The first line uses .SD to look through all the columns, and grepl() to search for matches to either of your patterns, with the ignore.case = TRUE argument.

The second line uses that to filter your original dataset.

Output:

> NAMES_SUBSET
   names1   list1
1:   John     bam
2:   mary cameron
3:  james    tony
4:   Mark   lucas

If you're doing this a lot, you can write it as a function like this:

search_all_dtcols <- function(dataset, pattern) {
  dt <- dataset[, lapply(.SD, function(x) grepl(pattern, x, ignore.case = TRUE))] 
  dataset[rowSums(dt) > 0]
}

Output:

> search_all_dtcols(NAMES, "ma|am")
   names1   list1
1:   John     bam
2:   mary cameron
3:  james    tony
4:   Mark   lucas

Upvotes: 1

Related Questions