Reputation: 301
Problem:
I have two tables in Snowflake: a dimension
table with account information and a fact
table where the account numbers are embedded as part of a string in a PARAM_VALUE column of type string. These strings can have various formats and delimiters, or no separators at all, which makes it challenging to extract account numbers for joining.
I want to extract and flatten the account numbers from the PARAM_VALUE column in the fact table, and then join them with the dimension table to display corresponding account details. However, I don’t want to rely on any specific separator or formatting in PARAM_VALUE.
The dimension table contains account details:
Dimension Table (account_dim)
Acc_no |
---|
8G4F2H1J9K |
4R6T8E3A1L |
9W5E2C7R4T |
1Q8N6F4G3H |
6M9R5E2C1A |
3K9P7R4G6T |
5E1C8R6T2H |
2T4R8G6F1N |
7H3G5E2C9R |
1L6T8R4E3A |
9R5E2C7K3G |
4F1N6H8R2T |
8G6T4R3E1C |
2H5E1C9R7K |
6T4R3G6F1N |
1C8R6T2H4G |
3R5E2C9K7H |
5T4R6G1F2E |
9K3G5E2C1R |
7R4G6T2H1L |
Apex Industries |
Nova Tech Solutions |
Global Finance Corp |
Eastern Logistics |
Western Digital Inc |
Cybernetic Systems |
Green Earth Energy |
Atlantic Healthcare |
Pacific Software |
Northern Lights Media |
Southern Comfort Foods |
Mountain View Ventures |
Valley Forge Investments |
Oceanic Exploration |
Sunset Properties |
Cityview Financial |
Riverside Manufacturing |
Skyline Advertising |
Brookside Realty |
Lakeside Construction |
The fact table has billions of rows and looks like this:
Fact Table (fact_table)
SQL_ID | OBJ_NAME | Param_Value |
---|---|---|
sql_id890890830492hjkhfdjkd899058390245 | customer_table ('TN190423','TN80989073') ,8G4F2H1J9K &niveiui | |
sql_id234567890123456789012345678901 | orders_table 788686786767%9W5E2C7R4T | jhjhjgjhg |
sql_id987654321098765432109876543210 | products_table ('5T4R6G1F2E','4R6T8E3A1L') AC9879879 | |
sql_id456789012345678901234567890123 | shipping_table (' Apex Industries | |
sql_id111111111111111111111111111111 | inventory_table ' 3K9P7R4G6T,4F1N6H8R2T' TN0008989 | |
sql_id222222222222222222222222222222 | shipping_table ('9W5E2C7R4T','1Q8N6F4G3H') Eastern Logistics, Riverside Manufacturing | jkjkhkj |
sql_id333333333333333333333333333333 | returns_table '6M9R5E2C1A,3K9P7R4G6T' ' 4R6T8E3A1L,' | |
sql_id444444444444444444444444444444 | customers_table ('5E1C8R6T2H','2T4R8G6F1N') [ 6M9R5E2C1A , 2H5E1C9R7K ] | |
sql_id555555555555555555555555555555 | orders_table '7H3G5E2C9R,1L6T8R4E3A' | |
sql_id666666666666666666666666666666 | products_table ('3K9P7R4G6T','7897') | |
sql_id777777777777777777777777777777 | inventory_table '9R5E2C7K3G,4F1N6H8R2T' &&&8G6T4R3E1C AC8098908,AC465465 | |
sql_id678901234567890123456789012345 | customers_table Cityview Financial;; Oceanic Exploration | |
sql_id789012345678901234567890123456 | orders_table TN00098F1NBrookside Realty' | |
sql_id890123456789012345678901234567 | products_table ('6uyiyiuuitug',' 6T4R3G6F1N ') Apex Industries |
Expected Output: I need to extract patterns from PARAM_VALUE column in the fact_table having loolup at dim table account_dim and flatten them across fact table rowsand producing a new table like this:
SQL_ID | OBJ_NAME | searched pattern |
---|---|---|
sql_id890890830492hjkhfdjkd899058390245 | customer_table | 8G4F2H1J9K |
sql_id234567890123456789012345678901 | orders_table | 9W5E2C7R4T |
sql_id987654321098765432109876543210 | products_table | 5T4R6G1F2E |
sql_id987654321098765432109876543210 | products_table | 4R6T8E3A1L |
sql_id456789012345678901234567890123 | shipping_table | Apex Industries |
sql_id456789012345678901234567890123 | shipping_table | 9K3G5E2C1R |
sql_id111111111111111111111111111111 | inventory_table | 4F1N6H8R2T |
sql_id111111111111111111111111111111 | inventory_table | 3K9P7R4G6T |
sql_id222222222222222222222222222222 | shipping_table | 9W5E2C7R4T |
sql_id222222222222222222222222222222 | shipping_table | 1Q8N6F4G3H |
sql_id222222222222222222222222222222 | shipping_table | Eastern Logistics |
sql_id222222222222222222222222222222 | shipping_table | Riverside Manufacturing |
sql_id333333333333333333333333333333 | returns_table | 6M9R5E2C1A |
sql_id333333333333333333333333333333 | returns_table | 3K9P7R4G6T |
sql_id333333333333333333333333333333 | returns_table | 4R6T8E3A1L |
sql_id444444444444444444444444444444 | customers_table | 5E1C8R6T2H |
sql_id444444444444444444444444444444 | customers_table | 2T4R8G6F1N |
sql_id444444444444444444444444444444 | customers_table | 6M9R5E2C1A |
sql_id444444444444444444444444444444 | customers_table | 2H5E1C9R7K |
sql_id555555555555555555555555555555 | orders_table | 7H3G5E2C9R |
sql_id555555555555555555555555555555 | orders_table | 1L6T8R4E3A |
sql_id666666666666666666666666666666 | products_table | 3K9P7R4G6T |
sql_id777777777777777777777777777777 | inventory_table | 9R5E2C7K3G |
sql_id777777777777777777777777777777 | inventory_table | 4F1N6H8R2T |
sql_id777777777777777777777777777777 | inventory_table | 8G6T4R3E1C |
sql_id678901234567890123456789012345 | customers_table | Cityview Financial |
sql_id678901234567890123456789012345 | customers_table | Oceanic Exploration |
sql_id789012345678901234567890123456 | orders_table | Brookside Realty |
sql_id890123456789012345678901234567 | products_table | Apex Industries |
sql_id890123456789012345678901234567 | products_table | 6T4R3G6F1N |
Account can be of varying length and can have both account number and account name and can adapt any format. So there should not be split done as it can be seperated or not seperated and can use any seperator in between. SO focus is on searach from dim table.
I tried using the following REGEXP_LIKE query to match and extract the account numbers. However, this query does not return any results. The irregular formatting of the param_value column seems to be causing issues, and REGEXP_LIKE is not finding matches reliably.
WITH flattened AS (
SELECT
f.sql_id,
f.obj_name,
a.Acc_No,
a.Acc_name
FROM
fact_table f
JOIN
account_dim a ON REGEXP_LIKE(UPPER(f.param_value), UPPER(a.Acc_No))
)
SELECT
sql_id,
obj_name,
Acc_No,
Acc_name
FROM
flattened;
Upvotes: 0
Views: 27