Extract string between prefix and suffix

Question

I have these columns:

                 text.NANA text.22 text.32
1    Female RNDM_MXN95.tif      No      NA
12     Male RNDM_QOS38.tif      No      NA
13  Female  RNDM_WQW90.tif      No      NA
14    Male  RNDM_BKD94.tif      No      NA
15    Male  RNDM_LGD67.tif      No      NA
16   Female RNDM_AFP45.tif      No      NA
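To make the answers below easy to try, here is a minimal sketch that rebuilds the sample data frame from the printout. The values are copied from the question. The exact number of spaces inside text.NANA is an assumption read off the printed output, and the real data has more rows.

```r
# Rebuild the sample shown above (assumption: spacing inferred from the printout).
dfrm <- data.frame(
  text.NANA = c("Female RNDM_MXN95.tif", "Male RNDM_QOS38.tif",
                "Female  RNDM_WQW90.tif", "Male  RNDM_BKD94.tif",
                "Male  RNDM_LGD67.tif", "Female RNDM_AFP45.tif"),
  text.22 = "No",   # recycled to every row
  text.32 = NA,     # recycled to every row
  row.names = c("1", "12", "13", "14", "15", "16"),
  stringsAsFactors = FALSE
)
```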

I want to create a column that contains only the barcode, which starts with RNDM_ and ends with .tif, but without the .tif. The tricky part is getting rid of the gender information that is in the same column. There is a random number of spaces between the gender information and RNDM_:

                 text.NANA text.22 text.32    BARCODE
1    Female RNDM_MXN95.tif      No      NA RNDM_MXN95
12     Male RNDM_QOS38.tif      No      NA RNDM_QOS38
13  Female  RNDM_WQW90.tif      No      NA RNDM_WQW90
14    Male  RNDM_BKD94.tif      No      NA RNDM_BKD94
15    Male  RNDM_LGD67.tif      No      NA RNDM_LGD67
16   Female RNDM_AFP45.tif      No      NA RNDM_AFP45

I made a very poor attempt at this, but it didn't work:

dfrm$BARCODE <- regexpr("RNDM_", dfrm$text.NANA)
# [1] 8 6 9 7 7 8 9 9 8 8 9 9 6 6 7 8 9 8
# attr(,"match.length")
# [1] 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
# attr(,"useBytes")
# [1] TRUE
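The attempt above fails because regexpr() only reports where each match starts and how long it is; it never returns the matched text. A minimal base R sketch of the missing step pairs it with regmatches(). The pattern "RNDM_[^.]*" (RNDM_ followed by everything up to the first dot) is an assumption that barcodes contain no dots.

```r
x <- c("Female RNDM_MXN95.tif", "Male  RNDM_QOS38.tif")

# Match positions and lengths only, like in the question.
m <- regexpr("RNDM_[^.]*", x)

# regmatches() turns those positions into the matched substrings.
barcodes <- regmatches(x, m)
barcodes
# [1] "RNDM_MXN95" "RNDM_QOS38"
```

Note that regmatches() silently drops elements that did not match, so its result can be shorter than the input column.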

Please help. Thanks!

Konrad Rudolph · Accepted Answer

So you just want to remove the file extension? Use file_path_sans_ext from the tools package (it ships with R):

dfrm$BARCODE = tools::file_path_sans_ext(dfrm$text.NANA)
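On its own, file_path_sans_ext() keeps the gender prefix, because it only strips the extension. The sketch below shows that, then drops everything up to the last whitespace character with sub(). This second step is an addition, not part of the answer, and assumes the barcode itself never contains whitespace.

```r
x <- c("Female RNDM_MXN95.tif", "Male  RNDM_QOS38.tif")

# Strips only the trailing ".tif"; the gender is still there.
no_ext <- tools::file_path_sans_ext(x)
no_ext
# [1] "Female RNDM_MXN95" "Male  RNDM_QOS38"

# Greedy ".*" runs to the last whitespace character, so any number of spaces works.
barcodes <- sub(".*[[:space:]]", "", no_ext)
barcodes
# [1] "RNDM_MXN95" "RNDM_QOS38"
```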

If there’s more stuff in front (like the gender here), you can use the following regular expression to extract just the barcode:

# str_match() returns a matrix: column 1 is the whole match, column 2 the first capture group.
dfrm$BARCODE = stringr::str_match(dfrm$text.NANA, '(RNDM_.*)\\.tif')[, 2]

Note that I’m using the {stringr} package here because the base R functions for extracting regex matches (regexpr or regexec combined with regmatches) are clumsy to use.
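If adding {stringr} is not an option, a base R sketch of the same capture-group extraction uses regexec() with regmatches(). This is a swapped-in alternative, not the answer's method; it is meant to mirror str_match(...)[, 2], including NA for rows without a match.

```r
x <- c("Female RNDM_MXN95.tif", "Male  RNDM_QOS38.tif", "no barcode here")

# regexec() records the full match plus each capture group;
# regmatches() returns one character vector per input element.
parts <- regmatches(x, regexec("(RNDM_.*)\\.tif", x))

# Element 2 is the capture group. Non-matches give character(0),
# and indexing character(0) at position 2 yields NA.
barcodes <- vapply(parts, function(p) p[2], character(1))
barcodes
# [1] "RNDM_MXN95" "RNDM_QOS38" NA
```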

I strongly recommend against using strsplit here because it’s underspecified: from reading the code, it’s not at all clear what its purpose is. Write code that is self-explanatory, not code that requires explanation in a comment.
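One way to follow that advice is to give the extraction a name that states its intent. The helper below, extract_barcode(), is hypothetical and not from the answer. It assumes barcodes are RNDM_ followed by letters and digits, and it returns NA for values that don't match instead of passing them through unchanged, which is what sub() would otherwise do.

```r
# Hypothetical helper: the name documents what the call site is doing.
extract_barcode <- function(x) {
  pattern <- "^.*(RNDM_[[:alnum:]]+)\\.tif$"
  barcode <- sub(pattern, "\\1", x)
  # sub() returns non-matching input unchanged, so mark those as NA.
  barcode[!grepl(pattern, x)] <- NA
  barcode
}

extract_barcode(c("Female RNDM_MXN95.tif", "Male  RNDM_QOS38.tif", "other.tif"))
# [1] "RNDM_MXN95" "RNDM_QOS38" NA
```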
