Felice
Felice

Reputation: 65

Custom sorting and deduping

I have read data from a csv file containing many duplicate email addresses into a temp table. The format is essentially id, emailtype-description, email.

Here is an example of some data:

id    emailtype-description  email
1     E-Mail                 [email protected]
1     preferred E-mail       [email protected]
2     2nd E-mail             [email protected]
2     preferred-Email        [email protected]
2     family E-Mail          [email protected]
    cInputFile = SUBSTITUTE(cDataDirectory, "Emails").
    
    INPUT STREAM csv FROM VALUE(cInputFile).
    IMPORT STREAM csv DELIMITER "," ^ NO-ERROR.
    
    REPEAT TRANSACTION:
        CREATE ttEmail.
        IMPORT STREAM csv DELIMITER ","
            ttEmail.uniqueid   
            ttEmail.emailTypeDescription 
            ttEmail.emailAddr
            .
    END.
    
    INPUT STREAM csv CLOSE.

I want to dedupe the rows, but I don't want to do this randomly. I want to make sure that certain types take priority over others. For instance, some are marked with the type "preferred E-mail" and those should always remain if they exist, additional types take precedent over others, so "E-mail" will take precedent over "2nd-Email" or "family E-Mail".

I'd like to do in Progress code the equivalent of a custom sort of emailtype-description, then a de-dupe. That way I could define the sort order and then dedupe to retain the emails and the types by priority.

Is there a way to do this to my table in Progress? I want to sort first by uniqueid, then by emailtype-description, but I want a custom sort, not an alphabetical sort. What is the best approach?

Upvotes: 0

Views: 81

Answers (1)

Tom Bascom
Tom Bascom

Reputation: 14020

When you say that you want a custom sort, not alphabetical do you mean that you want to sort by the emailtype in a non-alphabetical way? If so then I think that you would need to translate the email type into a field that sorts the way that you wish. Something along these lines:

    /* first add a field to your ttEmail called emailTypeSortOrder */
    
    define variable emailTypeSortOrderList as character no-undo.
    
    emailTypeSortOrderList = "preferred E-mail,E-mail,2nd-Email,family E-mail".
    
    cInputFile = SUBSTITUTE(cDataDirectory, "Emails").
        
        INPUT STREAM csv FROM VALUE(cInputFile).
        IMPORT STREAM csv DELIMITER "," ^ NO-ERROR.
        
        REPEAT TRANSACTION:
            CREATE ttEmail.
            IMPORT STREAM csv DELIMITER ","
                ttEmail.uniqueid   
                ttEmail.emailTypeDescription 
                ttEmail.emailAddr
                .

            /* classify the email type sort order
             */

            ttEmail.emailTypeSortOrder = lookup( emailTypeDescription, emailTypeSortOrderList ).
            if ttEmail.emailTypeSortOrder <= 0 then emailTypeSortOrder = 9999999.

        END.
        
        INPUT STREAM csv CLOSE.

And now you can sort and de-duplicate using the newly ordered field:

for each ttEmail break by ttEmail.emailAddr by ttEmail.emailTypeSortOrder:
  if first-of( ttEmail.emailAddr ) then
    next. /* always keep the first one */
   else
    delete ttEmail.  /* remove unwanted duplicates... */
end.

Upvotes: 3

Related Questions