user3222101
user3222101

Reputation: 1330

How to join 2 dataframe based on some logic

I have a dataframe with the below biweekly data

date            value
15-06-2012      20
30-06-2012      30

And I need to join with another dataframe that has below data:

   date            cost
    2-05-2011       5
    3-04-2012       80
    2-06-2012       10
    3-06-2012       10
    4-06-2012       30
    5-06-2012       20
    10-06-2012      10
    15-06-2012      10
    18-06-2012      30
    20-06-2012      20
    21-06-2012      30
    22-06-2012      30
    29-06-2012      20
    29-10-2012      30

I need to join 2 dataframes in such a way that from another dataframe, i get average cost between 1-15 th june 2012 to fill 15-06-2012 cost and similarly for 30-06-2012 cost, I get avg value between 16-06-2012 to 30-06-2012 and get the below results

 date            value cost
15-06-2012      20     15  which is (10+10+30+20+10+10)/6
30-06-2012      30     26  which is (30+20+30+30+20)/5

Upvotes: 3

Views: 157

Answers (2)

BENY
BENY

Reputation: 323226

Change to datetime of your columns date , then we using merge_asof

#df.date=pd.to_datetime(df.date,dayfirst=True)
#df1.date=pd.to_datetime(df1.date,dayfirst=True)
df['keepkey']=df.date
mergedf=pd.merge_asof(df1,df,on='date',direction ='forward')
mergedf.groupby('keepkey',as_index=False).mean()
Out[373]: 
     keepkey  cost  value
0 2012-06-15    15     20
1 2012-06-30    26     30

Update :

df['keepkey']=df.date
df['key']=df.date.dt.strftime('%Y-%m')
df1['key']=df1.date.dt.strftime('%Y-%m')
mergedf=pd.merge_asof(df1,df,on='date',by='key',direction ='forward')
mergedf.groupby('keepkey',as_index=False).mean()
Out[417]: 
     keepkey  cost  key  value
0 2012-06-15    15    6   20.0
1 2012-06-30    26    6   30.0

Upvotes: 6

cs95
cs95

Reputation: 402363

This would need a merge followed by a groupby:

m = df.merge(df2, on='date', how='outer')
m['date'] = pd.to_datetime(m.date, dayfirst=True)
m = m.sort_values('date')

(m.groupby(m['value'].notnull().shift().fillna(False).cumsum(), 
           as_index=False)
  .agg({'date' : 'last', 'cost' : 'mean', 'value' : 'last'}))

        date  cost  value
0 2012-06-15  15.0   20.0
1 2012-06-30  26.0   30.0

Upvotes: 3

Related Questions