Chandan Kumar Yadav
Chandan Kumar Yadav

Reputation: 51

Fetch data from second dataframe and append it with first dataframe if conditions are matched

I have two dataframe named emp_crm and verient. If below conditions are matched append emp_crm dataframe values to verient dataframe.

Problem statement:
I want to fetch records from emp_crm and append to verient if emp_crm[ACD] matches Verient[PbxId] AND Date of AudioStart equals to Date of OPEN_DT AND Time of OPEN_DT is less than Time of AudioStart AND greater than Time of AudioEnd
[condition: AudioStartTime < Ticket OpenTime < Audio EndTime]
then for matching emp_crm[ACD] with Verient[PbxId] append the emp_crm's [all columns] to Verient[all columns].

Pointed to be noted:

No records should get deleted from verient and append only for matching records and leave other records blank.

My Approach -

  1. If emp_crm[ACD] == Verient[PbxId] AND verient[AudioStartDate] == emp_crm[TicketOpenDate] AND StartTime_in_Seconds < OpenTime_in_Seconds < EndTime_in_Seconds then
    1.1. for matching emp_crm[ACD] == Verient[PbxId],
    1.1.1. Append the emp_crm's TICKET_NO, Tier_1 to Verient(all columns)
    1.1.2. Drop unnecessary columns to keep verient_crm to simple.
DataFrame: emp_crm

ACD     NID     OPEN_DT             OPEN_ID TICKET_NO       TIER_1  OpenDate    OpenTime    OpenTime_in_Seconds 
6557    n199    2021-05-19 20:13:00 n199    INC000047814    Voice   2021-05-19  20:13:00    72780
7768    n141    2021-04-30 19:39:00 n141    INC000047860    Data    2021-04-30  19:39:00    70740
7768    n141    2021-04-30 15:56:30 n141    INC000047853    Data    2021-04-30  15:56:30    57390
6570    n116    2021-05-19 16:00:27 n116    INC000047848    Ring    2021-05-24  16:10:00    58200
6570    n116    2021-05-19 15:55:02 n116    INC000047784    Voice   2021-05-19  15:55:02    57302


DataFrame: verient

AudioEnd            AudioStart              PbxId   AudioStartDate  StartTime AudioEndDate  EndTime     StartTime_in_Seconds    EndTime_in_Seconds 
2021-04-30 16:07:24 2021-04-30 15:35:27     6576    2021-04-30      15:35:27  2021-04-30    16:07:24    56127                   58044
2021-04-30 15:51:12 2021-04-30 15:46:30     7768    2021-04-30      15:46:30  2021-04-30    15:51:12    56790                   57072
2021-04-30 16:16:06 2021-04-30 15:48:22     7768    2021-04-30      15:48:22  2021-04-30    16:16:06    56902                   58566
2021-05-19 16:00:47 2021-05-19 15:50:02     6570    2021-05-19      15:50:02  2021-05-19    16:00:47    57002                   57647
2021-05-19 16:02:20 2021-05-19 15:53:26     6570    2021-05-19      15:53:26  2021-05-19    16:02:20    57206                   57740


Final DataFrame: verient_crm

AudioEnd                AudioStart              PbxId   ACD     NID     OPEN_DT                 OPEN_ID     TICKET_NO       TIER_1
2021-04-30 16:07:24     2021-04-30 15:35:27     6576    
2021-04-30 15:51:12     2021-04-30 15:46:30     7768    
2021-04-30 16:16:06     2021-04-30 15:48:22     7768    7768    n141    2021-04-30 15:56:30     n141        INC000047853    Data
2021-05-19 16:00:47     2021-05-19 15:50:02     6570    6570    n116    2021-05-19 16:00:27     n116        INC000047848    Ring
2021-05-19 16:02:20     2021-05-19 15:53:26     6570    6570    n116    2021-05-19 15:55:02     n116        INC000047784    Voice   


Final Result after dropping unnecessary columns:
------------------------------------------------
verient_crm(AudioEnd,AudioStart,PbxId,ACD, NID,OPEN_DT, OPEN_ID, TICKET_NO, TIER_1)


My Code and approach:


import pandas as pd
import numpy as np

matchedDict = {}
for record in emp_crm.iterrows():  
    for vr in verient.iterrows():   
        if(record == vr):   ## verient.pbxLoginID == emp_CRM.ACD
            if(vr.AudioStart_in_TimeStamp <= record.OPEN_DT_in_TimeStamp and record.OPEN_DT_in_TimeStamp <= vr.AudioEnd_in_TimeStamp):
                matchedDict[vr.index] = record  


# Append emp_crm(ACD, NID,OPEN_DT, OPEN_ID, TICKET_NO, TIER_1) with Verient for every matched ACD == PbxID from emp_crm to Verient. 

for item in matchedDict:
    record = item.value
    verient.iloc[item.key,col_index_for_ACD] = record['ACD']  ## Don't know how to fetch col_index_for_tier1
    verient.iloc[item.key,col_index_for_NID] = record['NID']
    verient.iloc[item.key,col_index_for_OPEN_DT] = record['OPEN_DT']
    verient.iloc[item.key,col_index_for_OPEN_ID] = record['OPEN_ID']
    verient.iloc[item.key,col_index_for_TICKET_NO] = record['TICKET_NO']
    verient.iloc[item.key,col_index_for_TIER_1] = record['TIER_1']

I am getting the below error and not able to solve this.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-74-bf0a6bb075ee> in <module>
      4 for record in emp_crm.iterrows():
      5     for vr in verient.iterrows():
----> 6         if(record['ACD'] == vr['pbxLoginID']):   ## verient.pbxLoginID == emp_CRM.ACD or not
      7             if(vr.AudioStart_in_TimeStamp <= record.OPEN_DT_in_TimeStamp and record.OPEN_DT_in_TimeStamp <= vr.AudioEnd_in_TimeStamp):
      8                 matchedDict[vr.index] = record

TypeError: tuple indices must be integers or slices, not str

Pointed to be noted:

  1. No records should get deleted from verient and append only for macthing records and leave other records blank.

Upvotes: 0

Views: 94

Answers (1)

Shradha
Shradha

Reputation: 2452

Instead of iterating over the dataframes (which is a very costly operation), you can merge the dataframes and then filter the resultant dataframe as follows-

# Left merge
verient = pd.merge(verient, emp_crm, left_on = ['PbxId'], right_on = ['ACD'], how='left')

# Filter
verient= verient.loc[(verient.AudioStart_in_TimeStamp < verient.OPEN_DT_in_TimeStamp) & (verient.OPEN_DT_in_TimeStamp < verient.AudioEnd_in_TimeStamp)]

You can then drop unnecessary columns if needed.

Edit - To keep original rows from verient dataframe, you can assign NULL values to emp_crm columns wherever the conditions don't match. Instead of filter, do this-

verient.loc[~((verient.AudioStart_in_TimeStamp < verient.OPEN_DT_in_TimeStamp) & (verient.OPEN_DT_in_TimeStamp < verient.AudioEnd_in_TimeStamp)), ['ACD', 'NID', 'OPEN_DT', 'OPEN_ID', 'TICKET_NO', 'TIER_1']] = np.nan

Upvotes: 1

Related Questions