sais
sais

Reputation: 21

How can I find the mismatch between two similar PROD vs SIT datasets on a summary level and dive deeper into the row for inconsistencies? Alteryx?

I would like to automate this processes for multiple datasets. I am not sure if there exists a tool like Alteryx or any software where I can do the following.

Here is the data that exists in both SIT and PROD:

ID Block Amt A Amt B Amt C ....
1 X 1000 400 1000 ....
2 Y 2000 500 4000 ....
3 Z 3000 500 3000 ....
2 Y 2000 3000 2000 ....
3 Z 3000 4500 3000 ....
Block Counts Sum Amt A Sum Amt B
X 1000 40000 10000
Y 2000 45000 20000
Z 3000 50000 30000
Block Counts Sum Amt A Sum Amt B
X 1000 40000 15000
Y 2000 47000 20000
Z 3000 50000 30000

FYI - There are multiple columns like block that could be used on aggregation. One could choose any column, get an aggregation to find inconsistencies and then explore into that particular block.

Upvotes: 0

Views: 31

Answers (1)

Xue
Xue

Reputation: 55

It's a typical full join, then filter process, you can do it in SQL, Python dataframe or Alteryx, pretty much anything

`
WITH
sum_prod AS
(
 SELECT
  SUM(AMT_A) SUM_AMT_A,
  COUNT(1),..
 FROM prod
 GROUP BY ...
)
,
sum_SIT AS
(
 SELECT
  SUM(AMT_A) SUM_AMT_A,
  COUNT(1),..
 FROM sit
 GROUP BY ...
)

SELECT 
*
FROM sum_prod
FULL JOIN
sum_sit
ON
sum_prod.block = sum_sit.block
WHERE
 sum_prod.Sum_Amt_A <> sum_sit.Sum_Amt_A
 OR sum_prod.Sum_Amt_A IS NULL
 OR sum_sit.Sum_Amt_A IS NULL  -- always take care of one side missing
 --add however_your_condition_to_tell_diff

`

Same principle in Alteryx, just a join and a selection filter will do

Upvotes: 1

Related Questions