jenny
jenny

Reputation: 185

Posix regex in Postgresql to extract from quoted text?

I'm trying to extract a number of strings from column values in PostgreSQL.

So basically I have text in columns in a table that look something like:

blah blah blah.... <something="AValueIWant">....dfdf.gd d.fg d.fd... <something="AnotherValueIWant">.

I want to be able to run a query to extract "AValueIWant" and "AnotherValueIWant" (without the quotes) in a form like this:

regex_matches
-------------
AValueIWant
AnotherValueIWant

Let's call the column "body" and the table "tablebody".

So far I have something like:

select regexp_matches(body, <some kind of pattern that doesn't work>, 'g') from tablebody

I don't feel there is any value posting up what I've tried since they all don't work! I've looked at the docos and tried things like 'something(.+)\"' but that returned the whole body text. Other patterns I've found online like '/"([^"]*/"' return nothing at all. Does anyone else have a better grasp of regex that knows the answer, or another way to do this :)? I'm pretty newbie.

Thanks!

Upvotes: 0

Views: 1459

Answers (2)

Clodoaldo Neto
Clodoaldo Neto

Reputation: 125544

select a[1]
from (
    select regexp_matches(body, '="(.+?)"', 'g') a
    from tablebody
) s

But I would rather use a specialized HTML parser unless it is a very quick and dirt job you want.

Upvotes: 2

zakinster
zakinster

Reputation: 10698

You can try this :

<something="([^"]+)">

Or more generic :

<[A-Za-z]+="([^"]+)">

Or even more generic, if you don't care about the name of the attribute:

="([^"]+)"

Or...well you got the idea.

Upvotes: 0

Related Questions