Craig Conover
Craig Conover

Reputation: 4738

How to get the position of the first occurrence of a digit in a postgres select

I have a field that has the SDK name and version and it is not standardized so that I can easily extract just the name and the version. These are contrived values but represent the possible values I am working with:

As you can see, the use of "/" is not consistent and I need a way to parse the alpha from the numeric consistently so that I end up with two columns like this:

JavaScript     2.3.4
JavaScript     4.3.1
Android        4.6.5
Android        3.2.1
Swift          4.5.3
Swift          3.1.1.5

I have searched for different ways to do this but nothing I have found seems to do what I required.

Ultimately, I need to put this into a Postgres SELECT statement like this:

EDIT

I think the DISTINCT keyword is unneeded and confusing. It was originally part of a COUNT/GROUP BY query, but for simplicity sake, I just want to list all rows with the three columns: sdk, sdk_name and sdk_version separately. From there I will use the best parsing formula from the answers to do as I require.

SELECT sdk, [parse sdk name formula] as "sdk_name", [parse sdk version formula] as "sdk_version"

Furthermore, I don't have a fixed list of SDKs to provide in the query so I am not sure the with/as/values strategy works for me but something I was not aware of and looks useful. I suppose the with/values can just be another SELECT query, though.

Upvotes: 4

Views: 2582

Answers (3)

klin
klin

Reputation: 121919

Use the regex function substring():

with my_data(sdk) as (
values
    ('JavaScript/2.3.4'),
    ('JavaScript/4.3.1'),
    ('Android4.6.5'),
    ('Android3.2.1'),
    ('Swift4.5.3'),
    ('Swift/3.1.1.5')
)

select 
    substring(sdk from '[^\d/]*') as sdk_name, 
    substring(sdk from '\d.*') as sdk_version
from my_data 

  sdk_name  | sdk_version 
------------+-------------
 JavaScript | 2.3.4
 JavaScript | 4.3.1
 Android    | 4.6.5
 Android    | 3.2.1
 Swift      | 4.5.3
 Swift      | 3.1.1.5
(6 rows)

Update.

You can place your select query in the with part (instead of values):

with my_data(sdk) as (
    <select sdk from ...>
)
select 
    substring(sdk from '[^\d/]*') as sdk_name, 
    substring(sdk from '\d.*') as sdk_version
from my_data 

or in the from clause:

select 
    substring(sdk from '[^\d/]*') as sdk_name, 
    substring(sdk from '\d.*') as sdk_version
from (
    <select sdk from ...>
) my_data

Upvotes: 6

Patrick
Patrick

Reputation: 32384

Regular expression parsing is quite compute-intensive, so rather than using two function calls (as in the other answers), you should combine them into a single call and then extract the desired values from the result:

WITH d(sdk) AS (
  VALUES
    ('JavaScript/2.3.4'),
    ('JavaScript/4.3.1'),
    ('Android4.6.5'),
    ('Android3.2.1'),
    ('Swift4.5.3'),
    ('Swift/3.1.1.5'),
    ('C#/23.1') )
SELECT unq.sdk, re.match[1] AS sdk_name, re.match[2] AS sdk_version
FROM (SELECT DISTINCT sdk FROM d) unq,
     regexp_match(unq.sdk, '([^0-9/]*)/*([0-9.]*)') re (match);

The regular expression does the following:

  1. ([^0-9/]*) Capture everything up to the first digit or forward slash. Note that this will also match SDK names that have characters other than A-Za-z.
  2. /* Jump over a forward slash, if present
  3. ([0-9.]*) Capture any following digits or dots. If you are confident that only digits and dots will follow then you can also do (*).

Note also that I put the DISTINCT clause in a separate subquery. It is not very efficient to first process every row to then throw out any duplicates. Instead, get rid of the duplicates first.

Pre PG-10 versions

The function regexp_match() was introduced in version 10. If you have an older version then you can use regexp_matches() without the g flag for the same result (PG8.3+).

Upvotes: 1

Sookie Singh
Sookie Singh

Reputation: 1623

You can use translate for this:

SELECT sdk, translate(sdk,'0123456789/.','') AS sdk_name, 
       translate(lower(sdk),'abcdefghijklmnopqrstuvwxyz/','') AS sdk_version 
FROM   table1;

Working fiddle

EDIT (by Gordon):

This is a good idea. I find it simpler to use regexp_replace():

select regexp_replace(sdk, '[0-9/.]', '', 'g') as sdk_name,
       regexp_replace(sdk, '[a-zA-Z/]', '', 'g') as sdk_version

Upvotes: 3

Related Questions