madsthaks
madsthaks

Reputation: 2181

Not sure how to merge data based on conditions

df1:

tID  sID  dID  date1 date2
1234 4321 5432 7/12  8/13
7890 5688 4567 8/21  9/30

df2:

sID  dID  date3 
4321 5432 7/20
5688 4567 9/15 

I'd like to assign append a tID to df2 if:

df3:

tID  sID  dID  date3 
1234 4321 5432 7/20
7890 5688 4567 9/15 

My first thought was to loop through both dataframes and use if statements but that really doesn't seem like an efficient approach.

Any help would be much appreciated.

Upvotes: 2

Views: 70

Answers (2)

Dennis Golomazov
Dennis Golomazov

Reputation: 17359

df3 = pd.merge(df1, df2, on=['sID', 'dID'])
print df3[df3.date3.between(df3.date1, df3.date2)].drop(['date1', 'date2'], axis=1)

    tID   sID   dID date3
0  1234  4321  5432  7/20
1  7890  5688  4567  9/15

Thanks @RafaelC for the between idea.

Upvotes: 2

rafaelc
rafaelc

Reputation: 59284

IIUC, first set them to the same index (as str, if the IDs might contain ids such as 8/21 before the edit. If all ids are ints, no need to do the astype steps)

df['sID'] = df.sID.astype(str)
df['dID'] = df.dID.astype(str)
df = df.set_index(['sID', 'dID'])


df2['sID'] = df2.sID.astype(str)
df2['dID'] = df2.dID.astype(str)
df2 = df2.set_index(['sID', 'dID'])

Then filter and assign using loc

m = df2.index.isin(df.index)
sub = df.loc[df2[m].date3.index]
s = df2[m].date3.between(sub.date1, sub.date2)
df2.loc[:, 'tID'] = df.loc[s[s].index, 'tID']

Outputs

                date3   tID
sID dID     
4321    5432    7/20    1234
5688    4567    9/15    7890

Upvotes: 2

Related Questions