Steve

Reputation: 333

Find rows with regex

In R, I am trying to do the following:

  1. In the "Item_List" table, search the "Description" column for rows that contain (a) the word "BTE" and (b) the word "RIC" or "RITC", or both.
  2. Get the index (row numbers) of these rows.
  3. Create a new column "Review1" and enter "Yes" in the rows identified in step 2.

Here is my data file:

https://drive.google.com/file/d/1uUlCf9LaU97HcHc1ehbbJ9yLjSn94n29/view?usp=sharing

library(readxl)
library(stringr)

I have this clumsy code (it works):

match1 <- which(str_detect(Item_List$Description, "BTE"))
match2 <- which(str_detect(Item_List$Description, "(RITC)|(RIC)"))
match_result <- intersect(match1, match2)
Item_List$Review1 <- ""
Item_List$Review1[match_result] <- "Yes"

But my questions are:

A. Can the regex matching be done in one line of code, instead of "match1" and "match2" above?

B. Why does the pipe below not work?

Item_List %>%
  which(str_detect(pattern = "BTE"))

Upvotes: 0

Views: 57

Answers (2)

Ronak Shah

Reputation: 388817

You can combine match1 and match2 into a single pattern that allows either order:

library(dplyr)
library(stringr)

Item_List %>%
  mutate(Review1 = if_else(str_detect(Description,'BTE.*(RITC|RIC)|(RITC|RIC).*BTE'),
                           'Yes', ''))

#  Product     `Product Code` Manufacturer Description                             Review1
#   <chr>       <chr>          <chr>        <chr>                                   <chr>  
# 1 Hearing Aid 11516117       WIDEX        BEYOND 220 FUSION BTE RITC Digital      "Yes"  
# 2 Hearing Aid 11516116       WIDEX        BEYOND 330 FUSION BTE RITC Digital      "Yes"  
# 3 Hearing Aid 11516114       WIDEX        BEYOND 440 FUSION BTE RITC Digital      "Yes"  
# 4 Hearing Aid 11522324       WIDEX        EVOKE 110 BTE 13 D BTE Standard Digital ""     
# 5 Hearing Aid 11912651       WIDEX        EVOKE 110 CIC CIC Digital               ""     
# 6 Hearing Aid 11912682       WIDEX        EVOKE 110 E-IM HS Half Shell Digital    ""     
# 7 Hearing Aid 11912678       WIDEX        EVOKE 110 E-IM ITC Canal Digital        ""     
# 8 Hearing Aid 11912650       WIDEX        EVOKE 110 E-IM ITE Full Shell Digital   ""     
# 9 Hearing Aid 11912674       WIDEX        EVOKE 110 E-IP HS Half Shell Digital    ""     
#10 Hearing Aid 11912670       WIDEX        EVOKE 110 E-IP ITC Canal Digital        ""     
# … with 186 more rows

In base R, this can be written as:

transform(Item_List, Review1 = ifelse(grepl('BTE.*(RITC|RIC)|(RITC|RIC).*BTE', 
                                      Description), 'Yes', ''))
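
The question also asks for the row numbers (step 2). A minimal base R sketch of how the same pattern could return them, using a hypothetical `desc` vector invented for illustration rather than the asker's file:

```r
# Hypothetical sample descriptions (not from the asker's data file)
desc <- c("BEYOND 220 FUSION BTE RITC Digital",
          "EVOKE 110 BTE 13 D BTE Standard Digital",
          "RIC 312 BTE Digital",
          "EVOKE 110 CIC CIC Digital")

# Either order: BTE before RITC/RIC, or RITC/RIC before BTE
pattern <- "BTE.*(RITC|RIC)|(RITC|RIC).*BTE"

hits <- grepl(pattern, desc)   # logical vector, one element per row
rows <- which(hits)            # integer row numbers of the matches

# Build the "Yes"/"" column from the same logical vector
review1 <- ifelse(hits, "Yes", "")
```

Note that the pattern matches substrings, so "RIC" inside a longer word would also count; wrapping the terms in `\\b` word boundaries is one option if whole words are required.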

For the second question,

Item_List %>% which(str_detect(pattern = "BTE"))

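This does not work because the pipe passes the whole data frame (Item_List) as the first argument to which(), so which() receives a data frame rather than a logical vector, and str_detect() is never given the Description column to search. You could instead do: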

Item_List %>%
  pull(Description) %>%
  str_detect(pattern = "BTE") %>%
  which()
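
As a rough illustration of the point about what the pipe passes along, here is a small sketch on a made-up data frame (`toy` and its contents are assumptions for the example). It shows the pull-style chain next to a brace form, where magrittr does not insert the left-hand side as the first argument and `.` refers to the data frame:

```r
library(magrittr)  # provides %>%
library(stringr)

# Hypothetical toy table standing in for Item_List
toy <- data.frame(
  Description = c("BEYOND 220 FUSION BTE RITC Digital",
                  "EVOKE 110 BTE 13 D BTE Standard Digital",
                  "RIC 312 BTE Digital",
                  "EVOKE 110 CIC CIC Digital"),
  stringsAsFactors = FALSE
)

# Pipe the column (a character vector), not the data frame
idx_chain <- toy$Description %>%
  str_detect(pattern = "BTE") %>%
  which()

# Braces stop the pipe from inserting `toy` as the first argument of which();
# inside the braces, `.` is the data frame
idx_brace <- toy %>% { which(str_detect(.$Description, "BTE")) }
```

Both forms should give the same row numbers; which one reads better is largely a matter of taste.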

Upvotes: 1

David J. Bosak

Reputation: 1624

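Yes, this can all be done in one step with the grepl() function and lookaheads, which require perl = TRUE. Note that this stores TRUE/FALSE in Review1 rather than "Yes":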


Item_List$Review1 <- grepl("(?=.*BTE)(?=.*RIC|.*RITC)", Item_List$Description, perl = TRUE)
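
A small sketch of how the lookahead pattern behaves, and how its logical result could be turned into the "Yes" column and the row numbers from the question. The `desc` vector is hypothetical sample data, not the asker's file:

```r
# Hypothetical sample descriptions (not from the asker's data file)
desc <- c("BEYOND 220 FUSION BTE RITC Digital",
          "EVOKE 110 BTE 13 D BTE Standard Digital",
          "RIC 312 BTE Digital",
          "EVOKE 110 CIC CIC Digital")

# Two lookaheads anchored at the same position:
# (?=.*BTE) requires BTE somewhere in the string,
# (?=.*RIC|.*RITC) requires RIC or RITC somewhere in the string
pattern <- "(?=.*BTE)(?=.*RIC|.*RITC)"

flag <- grepl(pattern, desc, perl = TRUE)  # lookaheads need the PCRE engine

review1 <- ifelse(flag, "Yes", "")  # "Yes" for matches, "" otherwise
rows <- which(flag)                 # row numbers of the matches
```

Because each lookahead scans the whole string independently, the order in which the terms appear does not matter.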

Upvotes: 1
