Reputation: 37
I have these entries:
id | fooddescription
--------------------
1 | 'Mollusks, oyster, eastern (blue point), wild, raw'
2 | 'Mollusks, oyster, eastern (blue point), wild, boiled or steamed'
3 | 'Vegetable oil, olive'
4 | 'Vegetable oil, almond'
5 | 'Pumpkin, boiled, drained, with salt'
6 | 'Pumpkin leaves, boiled, drained, with salt'
I want to treat the first two entries as one because they only have different preparation method, and distinct the others. The words in the string are ordered from general to specific, and the last part (when it has many descriptions and ,
) is usually the preparation method that doesn't need distinction.
Desired result:
id | fooddescription
--------------------
1 | 'Mollusks, oyster, eastern (blue point), wild, '
3 | 'Vegetable oil, olive'
4 | 'Vegetable oil, almond'
5 | 'Pumpkin, boiled, drained, '
6 | 'Pumpkin leaves, boiled, drained, '
First I thought I could trim the string to remove the part after the last comma. So according to this MySQL answer, I made a postgres script:
SELECT reverse(
substring(reverse(fooddescription),
position(',' in reverse(fooddescription)))) as trimmed, count(*)
FROM food_name
GROUP BY trimmed HAVING COUNT(*)>0
I'll get this result:
'Mollusks, oyster, eastern (blue point), wild,'
'Vegetable oil,'
'Pumpkin, boiled, drained,'
'Pumpkin leaves, boiled, drained,'
"Vegetable oil," is not desirable and I couldn't keep the id
.
So my question is:
,
and only trim the last part if there are multiple delimiters?id
for each group after GROUP BY
?Upvotes: 1
Views: 1517
Reputation: 28313
instead of position based substring, you could split the text to array & count the number of elements.
here's a complete example:
WITH food_name (fooddescription) AS (
VALUES
('Mollusks, oyster, eastern (blue point), wild, raw'),
('Mollusks, oyster, eastern (blue point), wild, boiled or steamed'),
('Vegetable oil, olive'),
('Vegetable oil, almond'),
('Pumpkin, boiled, drained, with salt'),
('Pumpkin leaves, boiled, drained, with salt')
)
SELECT ARRAY_TO_STRING(trimmed.trimmed, ', ')
FROM food_name
, LATERAL (SELECT STRING_TO_ARRAY(fooddescription, ', ') parts) parts
, LATERAL (SELECT CASE WHEN array_length(parts, 1) <= 2 THEN parts ELSE parts[1:array_length(parts, 1)-1] END trimmed) trimmed
This returns the following resultset:
trimmed
Mollusks, oyster, eastern (blue point), wild
Mollusks, oyster, eastern (blue point), wild
Vegetable oil, olive
Vegetable oil, almond
Pumpkin, boiled, drained
Pumpkin leaves, boiled, drained
Upvotes: 2
Reputation:
Replace everything after and including the last comma using regexp_replace
:
select regexp_replace(
'Mollusks, oyster, eastern (blue point), wild, raw',
',[^,]*$', ''
);
select regexp_replace(
'Mollusks, oyster, eastern (blue point), wild, boiled or steamed',
',[^,]*$', ''
);
Output for both:
+----------------------------------------------+
| regexp_replace |
|----------------------------------------------|
| Mollusks, oyster, eastern (blue point), wild |
+----------------------------------------------+
Upvotes: 0