Reputation: 241
id start end
1 2001 2005
2 2004 2007
Desired output:
id date
1 2001
1 2002
1 2003
1 2004
1 2005
2 2004
2 2005
2 2006
2 2007
My attempts:
df = pd.concat([pd.DataFrame({'start': pd.date_range(row.start, row.end, freq='AS'),
                              'id': row.id}, columns=['start', 'id'])
                for i, row in df.iterrows()], ignore_index=True)
df1 = (pd.concat([pd.Series(r.id, pd.date_range(r.start, r.end, freq='AS'))
                  for r in df.itertuples()])
       .reset_index())
My DataFrame has at least 300,000 rows, so these solutions are not efficient enough. Is there a more efficient solution?
Note: start and end can be in annual, monthly, daily, ... formats. I have given an annual example.
Upvotes: 0
Views: 56
Reputation: 323266
Maybe we can use stack with groupby and range:
df.set_index('id').stack().groupby(level=0).apply(lambda x : pd.Series(list(range(x.iloc[0],x.iloc[1]+1)))).reset_index()
Out[746]:
id level_1 0
0 1 0 2001
1 1 1 2002
2 1 2 2003
3 1 3 2004
4 1 4 2005
5 2 0 2004
6 2 1 2005
7 2 2 2006
8 2 3 2007
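Since the question is about speed on 300,000+ rows, here is a rough sketch of a fully vectorized alternative that avoids the per-group apply. It assumes start and end are integer years, as in the example, and that end >= start on every row. The function name expand_ranges and the output column name date are just illustrative choices, not something from the question.

```python
import numpy as np
import pandas as pd


def expand_ranges(df):
    """Expand each (id, start, end) row into one row per integer in [start, end]."""
    # Number of values each row expands to (inclusive range).
    lengths = (df["end"] - df["start"] + 1).to_numpy()

    # Repeat each id and start once per value in that row's range.
    ids = np.repeat(df["id"].to_numpy(), lengths)
    starts = np.repeat(df["start"].to_numpy(), lengths)

    # Position within each row's range: 0, 1, ..., length - 1.
    first_pos = np.repeat(lengths.cumsum() - lengths, lengths)
    offsets = np.arange(lengths.sum()) - first_pos

    return pd.DataFrame({"id": ids, "date": starts + offsets})


df = pd.DataFrame({"id": [1, 2], "start": [2001, 2004], "end": [2005, 2007]})
out = expand_ranges(df)
print(out)
```

For monthly or daily data the same idea should carry over if start and end are first converted to integer offsets, but I have not tried that here.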
Upvotes: 1