Reputation: 65
I have read data from a csv file containing many duplicate email addresses into a temp table. The format is essentially id, emailtype-description, email.
Here is an example of some data:
id emailtype-description email
1 E-Mail [email protected]
1 preferred E-mail [email protected]
2 2nd E-mail [email protected]
2 preferred-Email [email protected]
2 family E-Mail [email protected]
cInputFile = SUBSTITUTE(cDataDirectory, "Emails").
INPUT STREAM csv FROM VALUE(cInputFile).
IMPORT STREAM csv DELIMITER "," ^ NO-ERROR.
REPEAT TRANSACTION:
CREATE ttEmail.
IMPORT STREAM csv DELIMITER ","
ttEmail.uniqueid
ttEmail.emailTypeDescription
ttEmail.emailAddr
.
END.
INPUT STREAM csv CLOSE.
I want to dedupe the rows, but I don't want to do this randomly. I want to make sure that certain types take priority over others. For instance, some are marked with the type "preferred E-mail" and those should always remain if they exist, additional types take precedent over others, so "E-mail" will take precedent over "2nd-Email" or "family E-Mail".
I'd like to do in Progress code the equivalent of a custom sort of emailtype-description, then a de-dupe. That way I could define the sort order and then dedupe to retain the emails and the types by priority.
Is there a way to do this to my table in Progress? I want to sort first by uniqueid, then by emailtype-description, but I want a custom sort, not an alphabetical sort. What is the best approach?
Upvotes: 0
Views: 81
Reputation: 14020
When you say that you want a custom sort, not alphabetical do you mean that you want to sort by the emailtype in a non-alphabetical way? If so then I think that you would need to translate the email type into a field that sorts the way that you wish. Something along these lines:
/* first add a field to your ttEmail called emailTypeSortOrder */
define variable emailTypeSortOrderList as character no-undo.
emailTypeSortOrderList = "preferred E-mail,E-mail,2nd-Email,family E-mail".
cInputFile = SUBSTITUTE(cDataDirectory, "Emails").
INPUT STREAM csv FROM VALUE(cInputFile).
IMPORT STREAM csv DELIMITER "," ^ NO-ERROR.
REPEAT TRANSACTION:
CREATE ttEmail.
IMPORT STREAM csv DELIMITER ","
ttEmail.uniqueid
ttEmail.emailTypeDescription
ttEmail.emailAddr
.
/* classify the email type sort order
*/
ttEmail.emailTypeSortOrder = lookup( emailTypeDescription, emailTypeSortOrderList ).
if ttEmail.emailTypeSortOrder <= 0 then emailTypeSortOrder = 9999999.
END.
INPUT STREAM csv CLOSE.
And now you can sort and de-duplicate using the newly ordered field:
for each ttEmail break by ttEmail.emailAddr by ttEmail.emailTypeSortOrder:
if first-of( ttEmail.emailAddr ) then
next. /* always keep the first one */
else
delete ttEmail. /* remove unwanted duplicates... */
end.
Upvotes: 3