Darren

Reputation: 71

How to ignore case when using lapply(str_subset)

I am trying to create a new column (D$NEW) in data.table D that matches each row of D against a whole column (D2$COLUMN1) of data.table D2 using str_subset. (My data is at the bottom.)

D[,NEW:= lapply(D[,C1],function(x)str_subset(as.character(D2$COLUMN1), x))]

This works fine. But I also want str_subset to ignore case. When I use ignore.case(x)

D[,NEW:= lapply(D[,C1],function(x)str_subset(as.character(D2$COLUMN1), ignore.case(x)))]

I get the following error

## PLEASE use (fixed|coll|regexp)(x, ignore_case=TRUE)

When I pass ignore_case=TRUE to str_subset instead

D[,NEW:= lapply(D[,C1],function(x) str_subset(as.character(D2$COLUMN1), x, ignore_case=TRUE))]

I get the following error:

Error in str_subset(as.character(), x, ignore_case = TRUE) : unused argument (ignore_case = TRUE)

How can I make str_subset ignore case here?

Data:

D<-data.table(C1=c("a","b","c","d","e","A","B","C","a","b"), C2=c(1,2,3,4,5,6,7,8,9,10))


D2<-data.table(COLUMN1=c("a"), COLUMN2=c("b"), COLUMN3=c(1:10))

Answers (1)

Wiktor Stribiżew

Reputation: 627536

The first error tells you that ignore.case() is deprecated and can no longer be used as a function; stringr asks you to wrap the pattern in a modifier function with ignore_case=TRUE instead. The second error occurs because str_subset itself has no ignore_case argument.

Use an inline case-insensitive modifier (?i):

D[,NEW:= lapply(D[,C1],function(x)str_subset(as.character(D2$COLUMN1), paste0("(?i)",x)))]
                                                                       ^^^^^^^^^^^^^^^^

The inline case-insensitive modifier (?i) does the same job as ignore.case / ignore_case: it makes matching case-insensitive. See more details on inline modifiers at regular-expressions.info. The modifier affects the part of the pattern that follows it, so placing it at the start of the pattern makes the whole pattern case-insensitive.
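To illustrate how the position of (?i) changes what is matched, here is a minimal sketch using str_detect on a made-up vector (the vector x and the variable names are only for illustration and are not part of the question's data):

```r
library(stringr)

# Hypothetical sample strings, not taken from the question
x <- c("abc", "ABC", "aBC", "Abc")

# Modifier at the start: the whole pattern ignores case
whole <- str_detect(x, "(?i)abc")

# Modifier after the first character: "a" must match exactly,
# only "bc" is matched case-insensitively
tail_only <- str_detect(x, "a(?i)bc")

print(whole)
print(tail_only)
```

With the modifier at the start, every string in x should match; with the modifier after "a", only the strings that begin with a lowercase "a" should.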

Alternatively, you may pass TRUE to the regex function:

D[,NEW:= lapply(D[,C1],function(x)str_subset(as.character(D2$COLUMN1), regex(x, TRUE)))]
                                                                       ^^^^^^^^^^^^^^

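The TRUE is the value of the ignore_case argument (you may also write it as regex(x, ignore_case=TRUE)). See more details on the options you may use in the stri_opts_regex section here. Passing case_insensitive=TRUE does not work. Judging by the error message, regex() already sets case_insensitive from ignore_case when it calls stri_opts_regex, so the argument ends up supplied twice. I got an error: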

Error in stri_opts_regex(case_insensitive = ignore_case, multiline = multiline, :
   formal argument case_insensitive matched by multiple actual arguments

So, I had to replace it with ignore_case.
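As a rough check that the two approaches agree, the following sketch rebuilds the question's data (assuming C1 is meant to have ten values, as the result table suggests) and compares both variants. The column names NEW_inline and NEW_regex and the helper variable targets are made up for this comparison:

```r
library(data.table)
library(stringr)

# Data from the question; C1 is assumed to have ten values
D  <- data.table(C1 = c("a", "b", "c", "d", "e", "A", "B", "C", "a", "b"),
                 C2 = 1:10)
D2 <- data.table(COLUMN1 = "a", COLUMN2 = "b", COLUMN3 = 1:10)

targets <- as.character(D2$COLUMN1)

# Variant 1: inline (?i) modifier prepended to each pattern
D[, NEW_inline := lapply(C1, function(x) str_subset(targets, paste0("(?i)", x)))]

# Variant 2: regex() with ignore_case = TRUE
D[, NEW_regex := lapply(C1, function(x) str_subset(targets, regex(x, ignore_case = TRUE)))]

same <- identical(D$NEW_inline, D$NEW_regex)
n_matches <- lengths(D$NEW_regex)

print(same)
print(n_matches)
```

Both variants treat each value of C1 as a regular expression. If the values could contain regex metacharacters, stringr's fixed(x, ignore_case = TRUE) or coll(x, ignore_case = TRUE) would match them literally instead.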

Result:

> D
    C1 C2          NEW
 1:  a  1 a,a,a,a,a,a,
 2:  b  2             
 3:  c  3             
 4:  d  4             
 5:  e  5             
 6:  A  6 a,a,a,a,a,a,
 7:  B  7             
 8:  C  8             
 9:  a  9 a,a,a,a,a,a,
10:  b 10    
