Reputation: 73
Context
Im having a table with Customer information. I want to find out the repeat customers in the table based on information like:
Now to compare one customer with the rest of the records in the same table, I need to:
Question
Is there a way to make the Table_Input step read or output one record at a time but it should read the next record automatically after the processing of the previous record is complete? This process should continue till all the records in the table are checked/ processed.
Also, would like to know if we can Iterate the same procedure instead of reading one record at a time from Table_Input?
Upvotes: 1
Views: 726
Reputation: 3968
Is there a way to make the Table_Input step read or output one record at a time but it should read the next record automatically after the processing of the previous record is complete?
Yes it is possible to change the buffer rows in between the steps. You can change the Nr of Rows in rowset to 1. But it is not recommended to change this property unless you run low on memory. This might make the tool behave abnormally.
Now as per the comments shared, i see there are two questions:
1. You need to check the count of duplicate entries:
You can achieve this result either using a Group By step or using the Unique step as answered by astro11. You can get the count of names easily and if the count is greater than 1, you can consider it as duplicate.
2. Checking on the two data rows:
You want to validate two names (for e.g.) like "John S" and "John Smith". Both are names should ideally be considered as a single name, hence a duplicate.
First of all this is a data quality issue and no tool will consider these rows as same. What you can do is to use a step called "Fuzzy match". This step based on the algorithms you choose will try to give you the measure of the closest match of Names. But for achieving this you need to have a seperate MASTER table with all the possible names. You can use "Jaro Winkler" algo to get the closest match.
Hope this helps :)
Upvotes: 1
Reputation: 742
To make your Table Input read and write row by row, doesn't see like the best solution and I don't think it would achieve what you want (e.g. keeping a track of previous records).
You could try using the Unique rows step, that can redirect a duplicate row (using the key you want) to another flow where it will be treated differently (or delete it if you don't want it). From what I can see you'll want to have multiple Unique rows to check each one of the columns.
Upvotes: 2