Luobster
Luobster

Reputation: 37

Postgresql: How to trim string after last delimiter, only when there are multiple delimiters

I have these entries:

id  |  fooddescription
--------------------
1   |  'Mollusks, oyster, eastern (blue point), wild, raw'
2   |  'Mollusks, oyster, eastern (blue point), wild, boiled or steamed'
3   |  'Vegetable oil, olive'
4   |  'Vegetable oil, almond'
5   |  'Pumpkin, boiled, drained, with salt'
6   |  'Pumpkin leaves, boiled, drained, with salt'

I want to treat the first two entries as one because they only have different preparation method, and distinct the others. The words in the string are ordered from general to specific, and the last part (when it has many descriptions and ,) is usually the preparation method that doesn't need distinction.

Desired result:

id  |  fooddescription
--------------------
1   |  'Mollusks, oyster, eastern (blue point), wild, '
3   |  'Vegetable oil, olive'
4   |  'Vegetable oil, almond'
5   |  'Pumpkin, boiled, drained, '
6   |  'Pumpkin leaves, boiled, drained, '

First I thought I could trim the string to remove the part after the last comma. So according to this MySQL answer, I made a postgres script:

SELECT reverse(
            substring(reverse(fooddescription),
                      position(',' in reverse(fooddescription)))) as trimmed, count(*)
FROM food_name
GROUP BY trimmed HAVING COUNT(*)>0 

I'll get this result:

'Mollusks, oyster, eastern (blue point), wild,'
'Vegetable oil,'
'Pumpkin, boiled, drained,'
'Pumpkin leaves, boiled, drained,'

"Vegetable oil," is not desirable and I couldn't keep the id.

So my question is:

  1. How to judge the number of delimiters , and only trim the last part if there are multiple delimiters?
  2. Also, is it possible to keep one id for each group after GROUP BY?

Upvotes: 1

Views: 1517

Answers (2)

Haleemur Ali
Haleemur Ali

Reputation: 28313

instead of position based substring, you could split the text to array & count the number of elements.

here's a complete example:

WITH food_name (fooddescription) AS (
VALUES
  ('Mollusks, oyster, eastern (blue point), wild, raw'),
  ('Mollusks, oyster, eastern (blue point), wild, boiled or steamed'),
  ('Vegetable oil, olive'),
  ('Vegetable oil, almond'),
  ('Pumpkin, boiled, drained, with salt'), 
  ('Pumpkin leaves, boiled, drained, with salt')
)
SELECT ARRAY_TO_STRING(trimmed.trimmed, ', ')
FROM food_name
, LATERAL (SELECT STRING_TO_ARRAY(fooddescription, ', ') parts) parts
, LATERAL (SELECT CASE WHEN array_length(parts, 1) <= 2 THEN parts ELSE parts[1:array_length(parts, 1)-1] END trimmed) trimmed

This returns the following resultset:

                trimmed
Mollusks, oyster, eastern (blue point), wild
Mollusks, oyster, eastern (blue point), wild
Vegetable oil, olive
Vegetable oil, almond
Pumpkin, boiled, drained
Pumpkin leaves, boiled, drained

Upvotes: 2

user11044402
user11044402

Reputation:

Replace everything after and including the last comma using regexp_replace:

select regexp_replace(
  'Mollusks, oyster, eastern (blue point), wild, raw',
  ',[^,]*$', ''
);
select regexp_replace(
  'Mollusks, oyster, eastern (blue point), wild, boiled or steamed',
  ',[^,]*$', ''
);

Output for both:

+----------------------------------------------+
| regexp_replace                               |
|----------------------------------------------|
| Mollusks, oyster, eastern (blue point), wild |
+----------------------------------------------+

Upvotes: 0

Related Questions