Reputation: 35
I have a column in a table that contains data about any updates related to changes about a company in the below format -
#=============#==============#================#
| Company ID | updated_at | updates |
#=============#==============#================#
| 101 | 2020-11-01 | name: |
| | | -ABC |
| | | -XYZ |
| | | url: |
| | | -www.abc.com |
| | | -www.xyz.com |
+-------------+--------------+----------------+
| 109 | 2020-10-20 | rating: |
| | | -4.5 |
| | | -4.0 |
+-------------+--------------+----------------+
As you can see above, the column updates
contains strings that include newlines and describe one or multiple updates. In the above example this means that for company ID 101, the name changed from ABC to XYZ and the url changed from www.abc.com to www.xyz.com. For company ID 109, only the rating changed from 4.5 to 4.0.
However I would like to divide the updates column into 3 columns - one should contain what was changed (url, name etc.), second should have the old value and the 3rd column should have the new value. Something like this -
#============#============#==============#================#
| Company ID | Field | Old Value | New Value |
#============#============#==============#================#
| 101 | name | ABC | XYZ |
+------------+------------+--------------+----------------+
| 101 | url | www.abc.com | www.xyz.com |
+------------+------------+--------------+----------------+
| 109 | rating | 4.5 | 4.0 |
+------------+------------+--------------+----------------+
I am doing this in Postgres and know how to extract substrings based on characters but this looks a bit complicated to me since I need to extract multiple substrings from the same column for each row. Any help would be appreciated. Thanks!
Upvotes: 1
Views: 747
Reputation: 5311
At first, you can use regexp_split_into_table
and a regex with a positive lookahead to get a version of your table in which each of the rows contains exactly one update:
select companyID,
updated_at,
regexp_split_to_table(updates, '\n(?=\y.+:)') as updates
from old;
This will split the column updates
at any newline (\n
) that is followed by a single word and a colon (\y.+:
).
#=============#==============#================#
| companyID | updated_at | updates |
#=============#==============#================#
| 101 | 2020-11-01 | name: |
| | | -ABC |
| | | -XYZ |
+-------------+--------------+----------------+
| 101 | 2020-11-01 | url: |
| | | -www.abc.com |
| | | -www.xyz.com |
+-------------+--------------+----------------+
| 109 | 2020-10-20 | rating: |
| | | -4.5 |
| | | -4.0 |
+-------------+--------------+----------------+
From this, you can more easily build your desired table. To do this, you can use e.g. split_part
to split the update string into the three parts you want.
Putting this together with the first part gets you the full query:
select companyID,
updated_at,
split_part(updates, E':', 1) as field,
split_part(updates, E'\n-', 2) as old_value,
split_part(updates, E'\n-', 3) as new_value
from (select companyID,
updated_at,
regexp_split_to_table(updates, '\n(?=\y.+:)') as updates
from old
)
;
Here is a db<>fiddle example.
More details / additional info:
Upvotes: 2