I have a string:
string <- "newdatat.scat == \"RDS16\" ~ \"Asthma\","
and I want to extract separately:
RDS16
Asthma
What I've tried so far is:
extract <- str_extract(string,'~."(.+)')
but I am only able to get:
~ \"Asthma\",
If you have an answer, can you also kindly explain the regex behind it? I'm having a hard time converting string patterns to regex.
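The attempt returns more than wanted for two reasons. First, str_extract() by default returns the whole match, not the capture group. Second, the greedy (.+) runs on to the end of the string. A minimal base R sketch shows both effects. It uses regexpr(), regexec() and regmatches() in place of stringr (an assumption made so it runs without packages):

```r
string <- "newdatat.scat == \"RDS16\" ~ \"Asthma\","

# Whole match, like str_extract(): "~", then any one character (the space),
# then a literal quote, then a greedy ".+" that runs to the end of the string
whole <- regmatches(string, regexpr('~."(.+)', string))

# regexec() also reports the capture group; element 2 is the text inside (.+)
groups <- regmatches(string, regexec('~."(.+)', string))[[1]]
captured <- groups[2]  # still ends with the closing quote and comma: .+ is greedy

# Restricting the group to word characters makes it stop at the closing quote
fixed <- regmatches(string, regexec('~ "(\\w+)"', string))[[1]][2]
print(fixed)
```

The same idea, capturing only \\w+ between literal quotes, works for the value on the left of the ~.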
You can capture the two values in two separate columns.
In stringr, use str_match:
string <- "newdatat.scat == \"RDS16\" ~ \"Asthma\","
stringr::str_match(string, '"(\\w+)" ~ "(\\w+)"')[, -1, drop = FALSE]
# [,1] [,2]
#[1,] "RDS16" "Asthma"
Or, in base R, use strcapture:
strcapture('"(\\w+)" ~ "(\\w+)"', string,
proto = list(col1 = character(), col2 = character()))
# col1 col2
#1 RDS16 Asthma
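The question also asks how the regex works, so here is the same pattern broken down piece by piece. This is a base R sketch using regexec() and regmatches(). The variable names are illustrative only:

```r
string <- "newdatat.scat == \"RDS16\" ~ \"Asthma\","

# The pattern, piece by piece (in R source the backslash must be doubled):
#   "        a literal opening quote
#   (\\w+)   capture group 1: one or more word characters (letters, digits, _)
#   " ~ "    literal: closing quote, space, tilde, space, opening quote
#   (\\w+)   capture group 2
#   "        a literal closing quote
pattern <- '"(\\w+)" ~ "(\\w+)"'

# regexec() gives the full match followed by one entry per capture group;
# dropping the first element keeps only the two groups
m <- regmatches(string, regexec(pattern, string))[[1]]
values <- m[-1]
print(values)
```

Because \\w cannot match a quote, each group stops at the closing " without needing a lazy quantifier.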
Base R solutions:
# Solution 1:
# Extract strings (still quoted):
# dirtyStrings => list of strings
dirtyStrings <- regmatches(string, gregexpr('".*?"', string))
# Remove the double quotes from every match and flatten the
# list into one character vector (this also works when `string`
# holds several strings): result => character vector
result <- gsub('"', '', unlist(dirtyStrings))
# Solution 2:
# Same as above, but less generic; assumes every string
# follows the same pattern: result => character vector
result <- unlist(
  lapply(
    # drop everything up to "==", then split the rest on "~"
    strsplit(gsub(".*==", "", string), "~"),
    # delete all non-word characters (spaces, quotes, commas)
    function(x) gsub("\\W+", "", x)
  )
)
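To make Solution 2 easier to follow, this sketch runs its three steps one at a time and keeps each intermediate value. The step names are hypothetical. The == anchor is written without backslash escapes, since = has no special meaning in a regex:

```r
string <- "newdatat.scat == \"RDS16\" ~ \"Asthma\","

# Step 1: remove everything up to and including "=="
# (.* is greedy, so this would remove through the *last* "==" in the string)
step1 <- gsub(".*==", "", string)

# Step 2: split on "~"; strsplit() returns a list with one vector per input
step2 <- strsplit(step1, "~")[[1]]

# Step 3: \\W+ matches runs of non-word characters (spaces, quotes, commas),
# and gsub() deletes every such run
step3 <- gsub("\\W+", "", step2)
print(step3)
```

This also shows why the approach is less generic. It relies on exactly one "==" before the values and one "~" between them.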
If you just need to extract the sections surrounded by double quotes, you can use something like the following. The pattern ".*?" first matches a ", then .*? matches as few characters as possible, before finally matching another ". This gives you the strings including the double quotes; you then just have to remove the quotes to clean up.
Note that str_extract_all is used to return all matches, and that it returns a list of character vectors, so we need to index into the list before removing the double quotes.
library(stringr)
string <- "newdatat.scat == \"RDS16\" ~ \"Asthma\","
str_extract_all(string, '".*?"') %>%
`[[`(1) %>%
str_remove_all('"')
#> [1] "RDS16" "Asthma"
Created on 2021-06-21 by the reprex package (v1.0.0)
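The difference between the lazy ".*?" and a greedy ".*" is easiest to see side by side. This base R sketch uses gregexpr() and regmatches() in place of str_extract_all(), an assumption made so it runs without stringr:

```r
string <- "newdatat.scat == \"RDS16\" ~ \"Asthma\","

# Greedy: .* takes as much as possible, so a single match spans from the
# first quote to the last one
greedy <- regmatches(string, gregexpr('".*"', string))[[1]]

# Lazy: .*? stops at the first closing quote, giving one match per quoted value
lazy <- regmatches(string, gregexpr('".*?"', string))[[1]]

# Strip the quotes, as the stringr pipeline does with str_remove_all()
values <- gsub('"', "", lazy)
print(values)
```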