Get each <tag> in String - stackexchange database

Question

Mockup code for my problem:

SELECT Id FROM Tags WHERE TagName IN '<osx><keyboard><security>'

The problem in detail

I am trying to get the tags used in 2011 from the apple.stackexchange data (this query).

As you can see, tags in tag changes are stored as plain text in the Text field.

How can I create a unique list of the tags, to look them up in the Tags table, instead of this hardcoded version:

SELECT * FROM Tags
  WHERE TagName = 'osx' 
     OR TagName = 'keyboard' 
     OR TagName = 'security'

Here is an interactive example.

Stack Exchange uses T-SQL; my local copy runs on PostgreSQL, using Postgres.app version 9.4.5.0.

Erwin Brandstetter · Accepted Answer

Assuming this table definition:

CREATE TABLE posthistory(post_id int PRIMARY KEY, tags text);

Depending on what you want exactly:

To convert the string to an array, trim leading and trailing '<>', then treat '><' as separator:

SELECT *, string_to_array(trim(tags, '><'), '><') AS tag_arr
FROM   posthistory;
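To see what the expression does on a single value, here is a minimal sketch with a string literal standing in for the tags column. The sample tags are taken from the question; nothing else is assumed.

```sql
-- trim() strips every leading and trailing '<' or '>' character,
-- leaving 'osx><keyboard><security'; string_to_array() then splits on '><'.
SELECT string_to_array(trim('<osx><keyboard><security>', '><'), '><') AS tag_arr;
-- tag_arr: {osx,keyboard,security}
```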

To get a list of unique tags for the whole table (I guess this is what you want):

SELECT DISTINCT tag
FROM   posthistory, unnest(string_to_array(trim(tags, '><'), '><')) tag;

The implicit LATERAL join requires Postgres 9.3 or later.
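To look the unique tags up in the Tags table, as the question asks, the same expression can feed a join. This is only a sketch: the lookup table tags(id, tagname) and its column names are assumptions about the local copy, not something given in the question. The LATERAL join is spelled out explicitly here; it behaves the same as the comma form.

```sql
-- Assumed lookup table: tags(id int, tagname text)
SELECT t.id, t.tagname
FROM   tags t
JOIN  (
   SELECT DISTINCT tag
   FROM   posthistory ph
   CROSS  JOIN LATERAL unnest(string_to_array(trim(ph.tags, '><'), '><')) AS tag
   ) used ON used.tag = t.tagname;  -- one row per tag that occurs in posthistory
```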

This should be substantially faster than using regular expressions. If you want to try a regexp anyway, use regexp_split_to_table() instead of regexp_split_to_array() followed by unnest(), as suggested in another answer:

SELECT DISTINCT tag
FROM   posthistory, regexp_split_to_table(trim(tags, '><'), '><') tag;

This also uses an implicit LATERAL join.

To search for particular tags:

SELECT *
FROM   posthistory
WHERE  tags LIKE '%<osx>%'
AND    tags LIKE '%<security>%';

SQL Fiddle.
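Keeping the angle brackets inside the LIKE pattern matters: a bare '%osx%' would also match any longer tag that merely contains "osx". If you would rather compare whole tags, one alternative sketch (not part of the tested queries above) is to test the converted array with the containment operator @>. The tag names are just examples.

```sql
-- Rows whose tag list contains both tags, compared as whole array elements
SELECT *
FROM   posthistory
WHERE  string_to_array(trim(tags, '><'), '><') @> ARRAY['osx', 'security'];
```

Like the LIKE version with a leading wildcard, this cannot use a plain btree index on tags.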

Applied to your search in T-SQL in our data explorer:

SELECT TOP 100
       PostId, UserId, Text AS Tags
FROM   PostHistory
WHERE  year(CreationDate) = 2011
AND    PostHistoryTypeId IN (3  -- initial tags
                           , 6  -- edit tags
                           , 9) -- rollback tags
AND    Text LIKE ('%<' + ##TagName:String?postgresql## + '>%');

(T-SQL uses the non-standard + for string concatenation instead of ||.)
https://data.stackexchange.com/apple/query/edit/417055
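For the local Postgres copy, a rough equivalent of the T-SQL query might look like the following. It assumes the import kept the Data Explorer column names, folded to lower case as unquoted identifiers; adjust to whatever your import actually produced. The tag 'postgresql' is hardcoded in place of the Data Explorer parameter.

```sql
-- Assumed columns: postid, userid, text, creationdate, posthistorytypeid
SELECT postid, userid, text AS tags
FROM   posthistory
WHERE  creationdate >= '2011-01-01'
AND    creationdate <  '2012-01-01'              -- range instead of year(...) = 2011
AND    posthistorytypeid IN (3  -- initial tags
                           , 6  -- edit tags
                           , 9) -- rollback tags
AND    text LIKE ('%<' || 'postgresql' || '>%')  -- || instead of +
LIMIT  100;                                      -- LIMIT instead of TOP
```

The direct translation of the year test would be extract(year FROM creationdate) = 2011; the range form is used here because it can make use of an index on creationdate.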
