shockwave
shockwave

Reputation: 3272

Python - Lookup a value in the dataframe for the same ID

I have the following dataframe as shown below

ID    TYPE    POLICY_NUMBER       DISB_AMT
738   20      FLDINC MSH39990     1
738   21      MSH39990            3848
750   20      INF395737           1
750   21      INF395737 FLDINCL   2350
892   20      SJK389743           3904
892   21      MSH284989           1     

I'm trying to group by the ID and extract the policy number and search in the other TYPE eg:(TYPE =20 or 21) if the policy number is same in both TYPE for an ID then check if DISB_AMT>1 in the two rows. If true then do not append this to the dataframe.

eg: ID 738 has the same policy number MSH39990 in both rows. I wrote a script to extract only numbers so that it is easier to compare. ID 738 has the same policy number. Now we check if the DISB_AMT > 1. In the first row it is not >1. In the second row we have 3848>1. Do not include this ID in the result. For ID 892 since the POLICY NUMBER is not same in both TYPE we check only if DISB_AMT>1 for TYPE 21. Since it is not >1 we add this row to the results dataframe.

How do I compare it with the other type and check if the policy number is the same and build the rest of the logic?

Expected Output

ID    TYPE    POLICY_NUMBER       DISB_AMT
892   21      MSH284989           1

Code

data = [{"ID":738,"TYPE":20,"POLICY_NUMBER":"FLDINC MSH39990","DISB_AMT":1},
        {"ID":738,"TYPE":21,"POLICY_NUMBER":"MSH39990","DISB_AMT":3848},
        {"ID":750,"TYPE":20,"POLICY_NUMBER":"INF395737","DISB_AMT":1},
        {"ID":750,"TYPE":21,"POLICY_NUMBER":"INF395737 FLDINCL","DISB_AMT":2350},
        {"ID":892,"TYPE":20,"POLICY_NUMBER":"SJK389743","DISB_AMT":3904},
        {"ID":892,"TYPE":21,"POLICY_NUMBER":"MSH284989","DISB_AMT":1}
        ]

df=pd.DataFrame(data)

df['CLEANED_POL_NBR']=df.POLICY_NUMBER.str.extract('(\d+)')

Upvotes: 0

Views: 201

Answers (1)

Quang Hoang
Quang Hoang

Reputation: 150745

IIUC:

df[~df.duplicated(['ID','CLEANED_POL_NBR'], keep=False) & df['DISB_AMT'].eq(1)]

Output:

   DISB_AMT   ID POLICY_NUMBER  TYPE CLEANED_POL_NBR
5         1  892     MSH284989    21          284989

Upvotes: 1

Related Questions