Reputation: 6846
Mockup code for my problem:
SELECT Id FROM Tags WHERE TagName IN '<osx><keyboard><security><screen-lock>'
I am trying to get tags used in 2011 from apple.stackexchange data. (this query)
As you can see, tags in tag changes are stored as plain text in the Text
field.
<tag1><tag2><tag3>
<osx><keyboard><security><screen-lock>
How can I create a unique list of the tags, to look them up in the Tags
table, instead of this hardcoded version:
SELECT * FROM Tags
WHERE TagName = 'osx'
OR TagName = 'keyboard'
OR TagName = 'security'
Here is a interactive example.
Stackexchange uses T-SQL, my local copy is running under postgresql using Postgres app version 9.4.5.0.
Upvotes: 1
Views: 173
Reputation: 17147
I've simplified the data to the relevant column only and called it tags
to present the example.
Sample data
create table posthistory(tags text);
insert into posthistory values
('<lion><backup><time-machine>'),
('<spotlight><alfred><photo-booth>'),
('<lion><pdf><preview>'),
('<pdf>'),
('<asd>');
Query to get unique list of tags
SELECT DISTINCT
unnest(
regexp_split_to_array(
trim('><' from tags), '><'
)
)
FROM
posthistory
First we're removing all occurences of leading and trailing >
and <
signs from each row, then using regexp_split_to_array()
function to get values into arrays, and then unnest()
to expand an array to a set of rows. Finally DISTINCT
eliminates duplicate values.
Presenting SQLFiddle to preview how it works.
Upvotes: 1
Reputation: 656401
Assuming this table definition:
CREATE TABLE posthistory(post_id int PRIMARY KEY, tags text);
Depending on what you want exactly:
To convert the string to an array, trim leading and trailing '<>', then treat '><' as separator:
SELECT *, string_to_array(trim(tags, '><'), '><') AS tag_arr
FROM posthistory;
To get list of unique tags for whole table (I guess you want this):
SELECT DISTINCT tag
FROM posthistory, unnest(string_to_array(trim(tags, '><'), '><')) tag;
The implicit LATERAL
join requires Postgres 9.3 or later.
This should be substantially faster than using regular expressions. If you want to try regexp, use regexp_split_to_table()
instead of regexp_split_to_array()
followed by unnest()
like suggested in another answer:
SELECT DISTINCT tag
FROM posthistory, regexp_split_to_table(trim(tags, '><'), '><') tag;
Also with implicit LATERAL
join. Related:
To search for particular tags:
SELECT *
FROM posthistory
WHERE tags LIKE '%<security>%'
AND tags LIKE '%<osx>%';
Applied to your search in T-SQL in our data explorer:
SELECT TOP 100
PostId, UserId, Text AS Tags FROM PostHistory
WHERE year(CreationDate) = 2011
AND PostHistoryTypeId IN (3 -- initial tags
, 6 -- edit tags
, 9) -- rollback tags
AND Text LIKE ('%<' + ##TagName:String?postgresql## + '>%');
(T-SQL syntax uses the non-standard +
instead of ||
.)
https://data.stackexchange.com/apple/query/edit/417055
Upvotes: 1