Reputation: 491
I have 2 data frames with dateranges
>>>>df1 = pd.DataFrame({'ID': ['a', 'a', 'a', 'a', 'b', 'b', 'b'],
'from_dt':[pd.to_datetime('2004-01-01 00:00:00'),
pd.to_datetime('2005-01-01 00:00:00'),
pd.to_datetime('2007-01-01 00:00:00'),
pd.to_datetime('2011-01-01 00:00:00'),
pd.to_datetime('2004-01-01 00:00:00'),
pd.to_datetime('2012-01-01 00:00:00'),
pd.to_datetime('2019-01-01 00:00:00'),
],
'thru_dt':[pd.to_datetime('2004-12-31 23:59:59'),
pd.to_datetime('2006-12-31 23:59:59'),
pd.to_datetime('2010-12-31 23:59:59'),
pd.to_datetime('2075-12-31 23:59:59'),
pd.to_datetime('2011-12-31 23:59:59'),
pd.to_datetime('2018-12-31 23:59:59'),
pd.to_datetime('2075-12-31 23:59:59'),
],
'val':[1,2,3,4,5,6,7]})
>>>>df1
ID from_dt thru_dt val
0 a 2004-01-01 2004-12-31 23:59:59 1
1 a 2005-01-01 2006-12-31 23:59:59 2
2 a 2007-01-01 2010-12-31 23:59:59 3
3 a 2011-01-01 2075-12-31 23:59:59 4
4 b 2004-01-01 2011-12-31 23:59:59 5
5 b 2012-01-01 2018-12-31 23:59:59 6
6 b 2019-01-01 2075-12-31 23:59:59 7
The first one contains a map of ID to Value
>>>>df2 = pd.DataFrame({'ID':['a', 'a', 'b'], 'ID2':['A1', 'A2', 'B1'],
'from_dt':[pd.to_datetime('2003-01-01 00:00:00'),
pd.to_datetime('2010-01-01 00:00:00'),
pd.to_datetime('2005-01-01 00:00:00'),
],
'thru_dt':[pd.to_datetime('2009-12-31 23:59:59'),
pd.to_datetime('2075-12-31 23:59:59'),
pd.to_datetime('2075-12-31 23:59:59'),
]
})
>>>>df2
ID ID2 from_dt thru_dt
0 a A1 2003-01-01 2009-12-31 23:59:59
1 a A2 2010-01-01 2075-12-31 23:59:59
2 b B1 2005-01-01 2075-12-31 23:59:59
The second dataframe has one ID mapped to the other.
I'd like to join them with some inner join logic so that I have a flat file dataframe which has the following output
>>>>df3 = some_function(df1, df2, end_date=pd.to_datetime('2020-12-31'))
>>>>df3
ID2 Date val
0 A1 2004-01-01 1
1 A1 2004-01-02 1
.
.
x A1 2004-12-31 1
x A1 2005-01-01 2
x A1 2005-01-02 2
.
.
x A1 2009-12-31 3
x A2 2010-01-01 3
.
.
x A2 2020-12-31 4
x B1 2005-01-01 5
.
.
x B1 2020-12-31 7
I'm sure I can loop over and do this in some inefficient manner. I'd love to learn about any existing tools and lib that can handle such tasks.
Thanks!
Upvotes: 1
Views: 284
Reputation: 863741
I think you can use list comprehension for flatten date ranges and then filter by end datetime:
s1 = pd.concat([pd.Series(r.Index,pd.date_range(r.from_dt, r.thru_dt))
for r in df1.itertuples()])
df11 = df1.drop(['from_dt','thru_dt'], 1).join(pd.DataFrame({'Date':s1.index}, index=s1))
df11 = df11[df11['Date'].lt('2020-12-31')]
print (df11)
s2 = pd.concat([pd.Series(r.Index,pd.date_range(r.from_dt, r.thru_dt))
for r in df2.itertuples()])
df22 = df2.drop(['from_dt','thru_dt'], 1).join(pd.DataFrame({'Date':s2.index}, index=s2))
df22 = df22[df22['Date'].lt('2020-12-31')]
print (df22)
Upvotes: 2