Simon

Reputation: 325

How to extract character strings outside of square brackets using R?

How do I extract the part of a character string that lies outside the square brackets?

My example data:

test <- structure(list(Site = c("DavidsonSimpson", "DavidsonSimpson"), 
               Measurement = c("Depth From Measuring Point [Manual Water Level]", 
                               "HB Datum minus Depth From MP [Manual Water Level]")), 
               row.names = c(NA,-2L), class = "data.frame")

Extracting the string inside the brackets works:

library(dplyr)
library(stringr)

test1 <- test %>%
  mutate(Source = str_extract(Measurement, "(?<=\\[)[^]]+"))

But how do I extract the string outside the brackets?

Upvotes: 2

Views: 793

Answers (3)

moodymudskipper

Reputation: 47300

You can use {unglue}:

library(unglue)

unglue_unnest(test, Measurement, "{Source} [{}]", remove = FALSE)
#>              Site                                       Measurement
#> 1 DavidsonSimpson   Depth From Measuring Point [Manual Water Level]
#> 2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level]
#>                         Source
#> 1   Depth From Measuring Point
#> 2 HB Datum minus Depth From MP

If you'd rather keep both parts:

unglue_unnest(test, Measurement, "{Source1} [{Source2}]", remove = FALSE)
#>              Site                                       Measurement
#> 1 DavidsonSimpson   Depth From Measuring Point [Manual Water Level]
#> 2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level]
#>                        Source1            Source2
#> 1   Depth From Measuring Point Manual Water Level
#> 2 HB Datum minus Depth From MP Manual Water Level

Upvotes: 1

Ronak Shah

Reputation: 388817

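You can pass a variant of the regex you used in str_extract to str_remove, which deletes the brackets together with the text inside them. Note that the space before the bracket is left behind, as the trailing whitespace in the output shows.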

library(dplyr)
library(stringr)

test %>% 
  mutate(Source = str_remove(Measurement, "\\[[^]]+\\]"))

#             Site                                       Measurement                        Source
#1 DavidsonSimpson   Depth From Measuring Point [Manual Water Level]   Depth From Measuring Point 
#2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level] HB Datum minus Depth From MP 

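In base R you can use sub, here with \\s so the space before the bracket is removed as well:

test$Source <- sub('\\s\\[.*\\]', '', test$Measurement)
# For this case, dropping everything from the bracket onwards works as well:
# test$Source <- sub('\\s\\[.*', '', test$Measurement)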
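The greedy .* in the sub call runs from the first [ to the last ], so it would also swallow any text sitting between two bracketed parts. A minimal base R sketch for that situation, assuming the bracketed part can appear anywhere and more than once; the vector x is made-up illustration data, not from the question:

```r
# Hypothetical example data: bracket at the end, in the middle, and twice
x <- c("Depth From Measuring Point [Manual Water Level]",
       "HB Datum [Manual] minus Depth",
       "A [b] C [d]")

# Remove every bracketed chunk together with any whitespace just before it.
# [^]]* cannot run past the first "]", so each chunk is matched separately.
outside <- gsub("\\s*\\[[^]]*\\]", "", x)

# Trim whitespace that might remain at either end
outside <- trimws(outside)
outside
```

gsub is used instead of sub so that every bracketed chunk is removed, not just the first one.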

Upvotes: 0

akrun

Reputation: 886978

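We can use str_extract with a negated character class to keep everything before the first [ (the trailing space is retained):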

test %>%
   dplyr::mutate(Source = str_extract(Measurement, '[^\\[]+'))
#             Site                                       Measurement                        Source
#1 DavidsonSimpson   Depth From Measuring Point [Manual Water Level]   Depth From Measuring Point 
#2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level] HB Datum minus Depth From MP 
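If the trailing space is unwanted, the extracted value can be wrapped in str_trim. A small sketch on the two Measurement values from the question; the vector name m is only for illustration:

```r
library(stringr)

# The two Measurement values from the question
m <- c("Depth From Measuring Point [Manual Water Level]",
       "HB Datum minus Depth From MP [Manual Water Level]")

# Take everything before the first "[", then drop the trailing space
src <- str_trim(str_extract(m, "[^\\[]+"))
src
```

This only returns the text before the first bracket, so anything that follows a bracketed part is dropped.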

Upvotes: 2
