Reputation: 1412
I am writing a bash script to detect certain classes of strings in a SQL query (all upper-case, all lower-case, all numeric, etc.). Before doing the classification, I want to extract all quoted strings, but I am having trouble writing a regex that properly extracts them from the query string. For example, take this query from the TPC-H benchmark:
select
o_year,
sum(case
when nation = 'JAPAN' then volume
else 0
end) / sum(volume) as mkt_share
from
(
select
extract(year from o_orderdate) as o_year,
l_extendedprice * (1 - l_discount) as volume,
n2.n_name as nation
from
part,
supplier,
lineitem,
orders,
customer,
nation n1,
nation n2,
region
where
p_partkey = l_partkey
and s_suppkey = l_suppkey
and l_orderkey = o_orderkey
and o_custkey = c_custkey
and c_nationkey = n1.n_nationkey
and n1.n_regionkey = r_regionkey
and r_name = 'ASIA'
and s_nationkey = n2.n_nationkey
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'MEDIUM BRUSHED BRASS'
) as all_nations
group by
o_year
order by
o_year;
It's a complex query, but that is beside the point. I need to extract all of the single-quoted strings from this file and print each one on its own line, i.e.:
'JAPAN'
'ASIA'
'1995-01-01'
'1996-12-31'
'MEDIUM BRUSHED BRASS'
Right now (not being very familiar with regex), all I have is:
printf '%s\n' $SQL_FILE_VARIABLE | grep -E "'*'"
But this doesn't support strings with spaces, and it doesn't work when multiple strings are on the same line of the file. Ideally this should work inside my bash script, so the solution should preferably use grep, sed, or perl. I have done some googling and found solutions to similar problems, but I have not been able to get them to work for this case.
Any ideas on how I can achieve this? Thanks.
Upvotes: 0
Views: 115
Reputation: 3156
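Why not try /'(.*?)'/g
This means: between the quotes, match as little as possible and extract it. The quantifier needs to be the lazy .*? here; a greedy (.*)? would run from the first quote on a line to the last one. Lazy quantifiers require a Perl-style engine, such as perl itself or grep -P where available.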
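To see how a greedy .* behaves between quotes, here is a small sketch using one line of the query from the question. It assumes a grep that supports the -o option (GNU and BSD grep do); the variable names line and greedy are only for illustration.

```shell
# One line of the query that contains two quoted strings.
line="and o_orderdate between date '1995-01-01' and date '1996-12-31'"

# Greedy .* takes the longest match: from the first quote to the last quote.
greedy=$(printf '%s\n' "$line" | grep -oE "'.*'")
printf '%s\n' "$greedy"
# prints: '1995-01-01' and date '1996-12-31'
```

The two dates come back as a single match, which is why a pattern that cannot cross a quote character (or a lazy quantifier) is needed when several strings share a line.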
Upvotes: 0
Reputation: 27577
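You want something like this:
printf '%s\n' "$SQL_FILE_VARIABLE" | grep -oE "'[^']*'"
The -o option makes grep print only the matched text, one match per line, rather than the whole matching line. Quoting the variable keeps the shell from splitting the query on whitespace, which is what broke strings containing spaces.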
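A minimal sketch of the character-class approach, run against a few lines of the query from the question. It assumes a grep with the -o option (GNU and BSD grep have it), which prints only the matched text; the variable names sql and quoted are hypothetical.

```shell
# A fragment of the query held in a variable, as in the question.
sql="when nation = 'JAPAN' then volume
and o_orderdate between date '1995-01-01' and date '1996-12-31'
and p_type = 'MEDIUM BRUSHED BRASS'"

# '[^']*' matches an opening quote, any run of non-quote characters,
# and the closing quote, so a match can never span two strings.
quoted=$(printf '%s\n' "$sql" | grep -oE "'[^']*'")
printf '%s\n' "$quoted"
# prints:
# 'JAPAN'
# '1995-01-01'
# '1996-12-31'
# 'MEDIUM BRUSHED BRASS'
```

If the query lives in a file rather than a variable, the same pattern can be given the file name as grep's last argument. Note that this pattern does not account for a doubled quote ('') used as an escape inside a SQL literal.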
Upvotes: 2