How to remove duplicate values on google data studio

I have a dimension (a column from google sheets) called products with the following values:

product = [apple , apple_old_2019, pineapple , pineapple_old_2020, pineapple_old_2017 ...]

I need to regex then and remove the pattern old_**** and then aggregate the values by name.

In Google Sheets I would replace the values and then use the Unique formula, but in Google Data Studio there isn't such function.

I created a custom field called Product_pre with this formula:

REGEXP_EXTRACT(Product , '^(.+?)(_old_[0-9]{2}-[0-9]{4})' )

Then I created another custom field with the following formula:

CASE
    WHEN Product_pre_process is null THEN Product
    ELSE Product_pre_process 
END

The problem is that the result has duplicated values:

product_processed = [apple , apple, pineapple , pineapple, pineapple ...]

How could I fix that?

Upvotes: 2

Views: 13868

Answers (1)

Nimantha
Nimantha

Reputation: 6472

1) Extract the First Word
The REGEXP_EXTRACT function below does the trick (extracting all the characters from the beginning of each string till the first instance of _):

REGEXP_EXTRACT(Product , "^([^_]*)")

2) Consolidation
If the chart type is a Table, then removing the rest of the dimensions and leaving just the newly created dimension will result in the metric values automatically aggregating based on the two values in the dimension (apple and pineapple).

Google Data Studio Report as well as a GIF to visualise the above:

Upvotes: 3

Related Questions