Reputation: 29
I have a string, basically it's a SQL statement. I want to extract some part of it. Here is the code
SELECT
DTE as "Date",
CURRENT_DATE AS "Day",
concat( BCCO, BCBCH ) AS "client/batch",
BCSTAT as "Batch Status",
CASE
WHEN EXC = 'MCR' THEN CNT
ELSE 0
END AS "MCR-NPR",
CASE
WHEN EXC = 'NRC' THEN CNT
ELSE 0
END AS "NRC-NPR",
CASE
WHEN EXC = 'OFD' THEN CNT
ELSE 0
END AS "OFD-NPR",
CASE
WHEN EXC = 'TDB' THEN CNT
ELSE 0
END AS "TDB-NPR",
CASE
WHEN EXC = 'TDC' THEN CNT
ELSE 0
END AS "TDC-NPR",
CASE
WHEN EXC = 'UDC' THEN CNT
ELSE 0
END AS "UDC-NPR",
CASE
WHEN EXC = 'BIN' THEN CNT
ELSE 0
END AS "BIN-WRN",
CASE
WHEN EXC = 'DSP' THEN CNT
ELSE 0
END AS "DSP-WRN",
I want to extract every element between END AS and the quote. A vector like ("MCR-NPR",...,"DSP-WRN") will be the desire output.
I know I may need to use regular expression, but I couldn't extract every one of them.
Any idea will be appreciated.
Best,
Upvotes: 1
Views: 90
Reputation: 270248
1) grep/read.table grep
out lines with END AS
and use read.table
with a sep
of double quote to read those. The second column will be the desired data. No regular expressions or packages are used.
read.table(text = grep("END AS", s, value = TRUE, fixed = TRUE),
sep = '"', as.is = TRUE)[[2]]
## [1] "MCR-NPR" "NRC-NPR" "OFD-NPR" "TDB-NPR" "TDC-NPR" "UDC-NPR" "BIN-WRN"
## [8] "DSP-WRN"
1a) This is similar to (1) but uses sub
with a regular expression instead of read.table
:
sub('.*END AS "(.+)".*', "\\1", grep("END AS", s, value = TRUE))
## [1] "MCR-NPR" "NRC-NPR" "OFD-NPR" "TDB-NPR" "TDC-NPR" "UDC-NPR" "BIN-WRN"
## [8] "DSP-WRN"
2) strapply Another approach is the following. It makes use of the fact that the desired strings follow END AS and are surrounded with double quotes It has the shortest code of the ones shown here.
library(gsubfn)
unlist(strapplyc(s, 'END AS "(.+)"'))
## [1] "MCR-NPR" "NRC-NPR" "OFD-NPR" "TDB-NPR" "TDC-NPR" "UDC-NPR" "BIN-WRN"
## [8] "DSP-WRN"
3) strcapture Another base R approach using the same pattern as in (2) is:
na.omit(strcapture('END AS "(.+)"', s, list(value = character(0))))
giving:
value
9 MCR-NPR
13 NRC-NPR
17 OFD-NPR
21 TDB-NPR
25 TDC-NPR
29 UDC-NPR
33 BIN-WRN
37 DSP-WRN
The input s
in reproducible form:
s <-
c("SELECT ", " DTE as \"Date\",", " CURRENT_DATE AS \"Day\",",
" concat( BCCO, BCBCH ) AS \"client/batch\",", " BCSTAT as \"Batch Status\",",
" CASE ", " WHEN EXC = 'MCR' THEN CNT ", " ELSE 0 ", " END AS \"MCR-NPR\",",
" CASE ", " WHEN EXC = 'NRC' THEN CNT ", " ELSE 0 ", " END AS \"NRC-NPR\",",
" CASE ", " WHEN EXC = 'OFD' THEN CNT ", " ELSE 0 ", " END AS \"OFD-NPR\",",
" CASE ", " WHEN EXC = 'TDB' THEN CNT ", " ELSE 0 ", " END AS \"TDB-NPR\",",
" CASE ", " WHEN EXC = 'TDC' THEN CNT ", " ELSE 0 ", " END AS \"TDC-NPR\",",
" CASE ", " WHEN EXC = 'UDC' THEN CNT ", " ELSE 0 ", " END AS \"UDC-NPR\",",
" CASE ", " WHEN EXC = 'BIN' THEN CNT ", " ELSE 0 ", " END AS \"BIN-WRN\",",
" CASE ", " WHEN EXC = 'DSP' THEN CNT ", " ELSE 0 ", " END AS \"DSP-WRN\"")
Upvotes: 2