user8834780
user8834780

Reputation: 1670

Remove duplicates based on only 1 column

My data is in the following format:

rep_id  user_id  other non-duplicated data
1       1        ...
1       2        ...
2       3        ...
3       4        ...
3       5        ...

I am trying to achieve a column for deduped_rep with 0/1 such that only first rep id across the associated users has a 1 and rest have 0.

Expected result:

rep_id  user_id  deduped_rep
1       1        1
1       2        0
2       3        1
3       4        1
3       5        0

For reference, in Excel, I would use the following formula: IF(SUMPRODUCT(($A$2:$A2=A2)*($A$2:$A2=A2))>1,0,1)

I know there is the FIXED() LoD calculation http://kb.tableau.com/articles/howto/removing-duplicate-data-with-lod-calculations, but I only see use cases of it deduplicating based on another column. However, mine are distinct.

Upvotes: 1

Views: 1943

Answers (2)

Alex Blakemore
Alex Blakemore

Reputation: 11896

Define a field first_reg_date_per_rep_id as

{ fixed rep_id : min(registration_date) }

The define a field is_first_reg_date? as

registration_date = first_reg_date_per_rep_id

You can use that last Boolean field to distinguish the first record for each rep_id from later ones

Upvotes: 2

Ahmad Dwi Santoso
Ahmad Dwi Santoso

Reputation: 81

try this query

select 
    rep_id,
    user_id,
    row_number() over(partition by rep_id order by rep_id,user_id) deduped_rep
from 
    table

Upvotes: 0

Related Questions