Reputation: 4738
I have a field that has the SDK name and version and it is not standardized so that I can easily extract just the name and the version. These are contrived values but represent the possible values I am working with:
As you can see, the use of "/" is not consistent and I need a way to parse the alpha from the numeric consistently so that I end up with two columns like this:
JavaScript 2.3.4
JavaScript 4.3.1
Android 4.6.5
Android 3.2.1
Swift 4.5.3
Swift 3.1.1.5
I have searched for different ways to do this but nothing I have found seems to do what I required.
Ultimately, I need to put this into a Postgres SELECT statement like this:
I think the DISTINCT keyword is unneeded and confusing. It was originally part of a COUNT/GROUP BY query, but for simplicity sake, I just want to list all rows with the three columns: sdk, sdk_name and sdk_version separately. From there I will use the best parsing formula from the answers to do as I require.
SELECT sdk, [parse sdk name formula] as "sdk_name", [parse sdk version formula] as "sdk_version"
Furthermore, I don't have a fixed list of SDKs to provide in the query so I am not sure the with/as/values
strategy works for me but something I was not aware of and looks useful. I suppose the with/values can just be another SELECT query, though.
Upvotes: 4
Views: 2582
Reputation: 121919
Use the regex function substring()
:
with my_data(sdk) as (
values
('JavaScript/2.3.4'),
('JavaScript/4.3.1'),
('Android4.6.5'),
('Android3.2.1'),
('Swift4.5.3'),
('Swift/3.1.1.5')
)
select
substring(sdk from '[^\d/]*') as sdk_name,
substring(sdk from '\d.*') as sdk_version
from my_data
sdk_name | sdk_version
------------+-------------
JavaScript | 2.3.4
JavaScript | 4.3.1
Android | 4.6.5
Android | 3.2.1
Swift | 4.5.3
Swift | 3.1.1.5
(6 rows)
Update.
You can place your select
query in the with
part (instead of values
):
with my_data(sdk) as (
<select sdk from ...>
)
select
substring(sdk from '[^\d/]*') as sdk_name,
substring(sdk from '\d.*') as sdk_version
from my_data
or in the from
clause:
select
substring(sdk from '[^\d/]*') as sdk_name,
substring(sdk from '\d.*') as sdk_version
from (
<select sdk from ...>
) my_data
Upvotes: 6
Reputation: 32384
Regular expression parsing is quite compute-intensive, so rather than using two function calls (as in the other answers), you should combine them into a single call and then extract the desired values from the result:
WITH d(sdk) AS (
VALUES
('JavaScript/2.3.4'),
('JavaScript/4.3.1'),
('Android4.6.5'),
('Android3.2.1'),
('Swift4.5.3'),
('Swift/3.1.1.5'),
('C#/23.1') )
SELECT unq.sdk, re.match[1] AS sdk_name, re.match[2] AS sdk_version
FROM (SELECT DISTINCT sdk FROM d) unq,
regexp_match(unq.sdk, '([^0-9/]*)/*([0-9.]*)') re (match);
The regular expression does the following:
([^0-9/]*)
Capture everything up to the first digit or forward slash. Note that this will also match SDK names that have characters other than A-Za-z
./*
Jump over a forward slash, if present([0-9.]*)
Capture any following digits or dots. If you are confident that only digits and dots will follow then you can also do (*)
.Note also that I put the DISTINCT
clause in a separate subquery. It is not very efficient to first process every row to then throw out any duplicates. Instead, get rid of the duplicates first.
Pre PG-10 versions
The function regexp_match()
was introduced in version 10. If you have an older version then you can use regexp_matches()
without the g
flag for the same result (PG8.3+).
Upvotes: 1
Reputation: 1623
You can use translate for this:
SELECT sdk, translate(sdk,'0123456789/.','') AS sdk_name,
translate(lower(sdk),'abcdefghijklmnopqrstuvwxyz/','') AS sdk_version
FROM table1;
Working fiddle
EDIT (by Gordon):
This is a good idea. I find it simpler to use regexp_replace()
:
select regexp_replace(sdk, '[0-9/.]', '', 'g') as sdk_name,
regexp_replace(sdk, '[a-zA-Z/]', '', 'g') as sdk_version
Upvotes: 3