Reputation: 325
How do you extract a character string for the text contained outside square brackets?
My example data:
test <- structure(list(Site = c("DavidsonSimpson", "DavidsonSimpson"),
Measurement = c("Depth From Measuring Point [Manual Water Level]",
"HB Datum minus Depth From MP [Manual Water Level]")),
row.names = c(NA,-2L), class = "data.frame")
Extracting the string inside the brackets works:
library(dplyr)
library(stringr)
test1 <- test %>%
  mutate(Source = str_extract(Measurement, "(?<=\\[)[^]]+")) # text between [ and ]
But how do I extract the string outside the brackets?
Upvotes: 2
Views: 793
Reputation: 47300
You can use {unglue}:
library(unglue)
unglue_unnest(test, Measurement, "{Source} [{}]", remove = FALSE)
#> Site Measurement
#> 1 DavidsonSimpson Depth From Measuring Point [Manual Water Level]
#> 2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level]
#> Source
#> 1 Depth From Measuring Point
#> 2 HB Datum minus Depth From MP
If you'd rather keep both:
unglue_unnest(test, Measurement, "{Source1} [{Source2}]", remove = FALSE)
#> Site Measurement
#> 1 DavidsonSimpson Depth From Measuring Point [Manual Water Level]
#> 2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level]
#> Source1 Source2
#> 1 Depth From Measuring Point Manual Water Level
#> 2 HB Datum minus Depth From MP Manual Water Level
Upvotes: 1
Reputation: 388817
You can adapt the regex you used in str_extract and pass it to str_remove, which removes the bracketed text together with the brackets themselves.
library(dplyr)
library(stringr)
test %>%
  mutate(Source = str_remove(Measurement, "\\s*\\[[^]]+\\]")) # \\s* also drops the space before "["
# Site Measurement Source
#1 DavidsonSimpson Depth From Measuring Point [Manual Water Level] Depth From Measuring Point
#2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level] HB Datum minus Depth From MP
In base R you can use sub:
test$Source <- sub('\\s\\[.*\\]', '', test$Measurement)
# For this data, removing everything from the opening bracket onward works as well
#test$Source <- sub('\\s\\[.*', '', test$Measurement)
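If you want both pieces in base R, here is a minimal sketch under a few assumptions: the data frame is rebuilt so the block runs on its own, each value contains exactly one bracketed part, and the column name Method is hypothetical (it is not in the question).

```r
# Rebuild the example data so this block is self-contained.
test <- data.frame(
  Site = c("DavidsonSimpson", "DavidsonSimpson"),
  Measurement = c("Depth From Measuring Point [Manual Water Level]",
                  "HB Datum minus Depth From MP [Manual Water Level]")
)

# Text outside the brackets: remove any whitespace before "[" plus the bracketed part.
test$Source <- sub("\\s*\\[[^]]*\\]", "", test$Measurement)

# Text inside the brackets: keep only the capture group between "[" and "]".
# "Method" is a hypothetical column name chosen for this sketch.
test$Method <- sub(".*\\[([^]]*)\\].*", "\\1", test$Measurement)

print(test[, c("Source", "Method")])
```

Using [^]]* instead of .* keeps the match from running past the first closing bracket, which matters if a value ever contains more than one bracketed part.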
Upvotes: 0
Reputation: 886978
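We can use str_extract with a negated character class to capture everything before the first [, and str_trim to drop the space left in front of the bracket:
test %>%
  dplyr::mutate(Source = str_trim(str_extract(Measurement, '[^\\[]+')))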
# Site Measurement Source
#1 DavidsonSimpson Depth From Measuring Point [Manual Water Level] Depth From Measuring Point
#2 DavidsonSimpson HB Datum minus Depth From MP [Manual Water Level] HB Datum minus Depth From MP
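One thing the printed data frame does not show: a negated-class match on its own stops at the first [, so the raw match still ends with the space that precedes the bracket. A minimal base R sketch for checking and trimming that, assuming a single example string x (a hypothetical name) and using [^[]+ as the base R spelling of the same idea:

```r
x <- "Depth From Measuring Point [Manual Water Level]"

# regexpr() finds the first run of characters that are not "[";
# regmatches() returns that matched text.
before <- regmatches(x, regexpr("[^[]+", x))

# The raw match keeps the space that sits in front of "[".
has_trailing_space <- endsWith(before, " ")

# trimws() removes leading and trailing whitespace.
trimmed <- trimws(before)

print(has_trailing_space)
print(trimmed)
```

This matters mainly if Source is later used for joins or comparisons, where an invisible trailing space can cause mismatches.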
Upvotes: 2