Reputation: 185
I'm trying to extract a number of strings from column values in PostgreSQL.
So basically I have text in columns in a table that look something like:
blah blah blah.... <something="AValueIWant">....dfdf.gd d.fg d.fd... <something="AnotherValueIWant">.
I want to be able to run a query to extract "AValueIWant" and "AnotherValueIWant" (without the quotes) in a form like this:
regex_matches
-------------
AValueIWant
AnotherValueIWant
Let's call the column "body" and the table "tablebody".
So far I have something like:
select regexp_matches(body, <some kind of pattern that doesn't work>, 'g') from tablebody
I don't feel there is any value posting up what I've tried since they all don't work! I've looked at the docos and tried things like 'something(.+)\"' but that returned the whole body text. Other patterns I've found online like '/"([^"]*/"' return nothing at all. Does anyone else have a better grasp of regex that knows the answer, or another way to do this :)? I'm pretty newbie.
Thanks!
Upvotes: 0
Views: 1459
Reputation: 125544
select a[1]
from (
select regexp_matches(body, '="(.+?)"', 'g') a
from tablebody
) s
But I would rather use a specialized HTML parser unless it is a very quick and dirt job you want.
Upvotes: 2
Reputation: 10698
You can try this :
<something="([^"]+)">
Or more generic :
<[A-Za-z]+="([^"]+)">
Or even more generic, if you don't care about the name of the attribute:
="([^"]+)"
Or...well you got the idea.
Upvotes: 0