maximusdooku

Reputation: 5512

How can I extract a string rowwise using regex?

I have a column filename in a dataframe that looks like this:

/testData/THQ/TAIRATE.20030314.190000.tif
/testData/THQ/TAIRATE.20030314.200000.tif
/testData/THQ/TAIRATE.20030314.210000.tif
/testData/THQ/TAIRATE.20030314.220000.tif

I want to extract the timestamp from each value and store it in another column, but I am not familiar with regex. So far I have this:

tdat %>%
  dplyr::rowwise() %>% 
  dplyr::mutate(timestamp = str_extract(as.character(filename), "[^//TAIRATE]+$")) %>% 
  glimpse()

Result

.20030314.190000.tif
.20030314.200000.tif
.20030314.210000.tif
.20030314.220000.tif

Expected result

20030314190000
20030314200000
20030314210000
20030314220000

Question: How can I write the correct regex, or is there a better way?

Upvotes: 0

Views: 211

Answers (2)

Chris Ruehlemann

Reputation: 21400

Certainly less elegant than @akrun's solution, but this one works too for a single string:

paste0(unlist(str_extract_all(filename, "[0-9]+")), collapse = "")

Data:

filename <- "/testData/THQ/TAIRATE.20030314.190000.tif"
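Note that collapse = "" joins every match into one string, so on a whole column the paste0 call would merge all rows together. Below is a minimal base R sketch of a per-element variant, assuming digits occur only in the timestamp part of each path. Here regmatches and gregexpr stand in for str_extract_all, and the variable names are illustrative:

```r
# Illustrative input, taken from the question.
filename <- c("/testData/THQ/TAIRATE.20030314.190000.tif",
              "/testData/THQ/TAIRATE.20030314.200000.tif")

# gregexpr() locates every run of digits; regmatches() returns one
# character vector of matches per input element.
digit_runs <- regmatches(filename, gregexpr("[0-9]+", filename))

# Collapse each element's matches separately so rows stay distinct.
timestamp <- vapply(digit_runs, paste0, character(1), collapse = "")
timestamp
```

Collapsing inside vapply keeps one output value per input row, which is what a new data frame column needs.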

Upvotes: 1

Ronak Shah

Reputation: 388982

str_extract and similar functions are vectorized, so you don't need rowwise().

In this case, you can do this in base R using sub.

sub('.*TAIRATE\\.(\\d+)\\.(\\d+).*', '\\1\\2', df$filename)
#[1] "20030314190000" "20030314200000" "20030314210000" "20030314220000"
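One thing to watch: sub returns its input unchanged when the pattern does not match, so an unexpected filename would end up in the timestamp column as-is. A hedged sketch of guarding against that, assuming a data frame df with a filename column as in the question (the third row and the NA handling are illustrative additions, not part of the original answer):

```r
# Illustrative data frame; the last row is a hypothetical non-matching name.
df <- data.frame(
  filename = c("/testData/THQ/TAIRATE.20030314.190000.tif",
               "/testData/THQ/TAIRATE.20030314.200000.tif",
               "/testData/THQ/README.txt"),
  stringsAsFactors = FALSE
)

pattern <- ".*TAIRATE\\.(\\d+)\\.(\\d+).*"

# sub() works on the whole column at once; no row-wise step is needed.
# Rows that do not match the pattern are set to NA explicitly,
# because sub() would otherwise return them unchanged.
df$timestamp <- ifelse(grepl(pattern, df$filename),
                       sub(pattern, "\\1\\2", df$filename),
                       NA_character_)
df$timestamp
```

Whether NA is the right fallback depends on the data; the point is only that a non-match is silent unless it is checked.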

Upvotes: 1
