Reputation: 223
I have a data frame which looks as following
id start_date end_date
3001 1-1-2000 5-1-2000
3849 5-1-2001 8-1-2001
8927 6-1-2006 9-1-2006
what i want is a new data frame that as index has id and as column date column which is a date incremented by month from start_date to end_date.
index date
3001 1/1/2000
3001 2/1/2000
3001 3/1/2000
3001 4/1/2000
3001 5/1/2000
3849 5/1/2001
3849 6/1/2001
3849 7/1/2001
3849 7/1/2001
8927 6/1/2006
8927 7/1/2006
8927 8/1/2006
8927 9/1/2006
Upvotes: 0
Views: 2695
Reputation: 2459
recreate your dataframe,
In [39]: df = pd.DataFrame( {"id":[ 3001,3849, 8927] , "start_date": ['1-1-2000','1-5-2001','1-6-2006'], "end_date":['1-5-2000','1-8-2001','1-9-2006']})
set index
In [40]: df = df.set_index('id')
iterate through rows.
In [41]: newdf = pd.DataFrame()
In [42]: for id, row in df.iterrows():
newdf = pd.concat( [newdf, pd.DataFrame( {"id":id, "date": pd.date_range( start=row.start_date, end=row.end_date, freq='D')}) ], ignore_index=True)
print id
....:
3001
3849
8927
In [43]: newdf = newdf.set_index('id')
In [44]: newdf
Out[44]:
date
id
3001 2000-01-01
3001 2000-01-02
3001 2000-01-03
3001 2000-01-04
3001 2000-01-05
3849 2001-01-05
3849 2001-01-06
3849 2001-01-07
3849 2001-01-08
8927 2006-01-06
8927 2006-01-07
8927 2006-01-08
8927 2006-01-09
and done.
it is unclear to me your date format, is it day-first? or month-first? you can check it out here: Specifying date format when converting with pandas.to_datetime
sure, edit for additional answer:
In [32]: b = newdf.reset_index().groupby( 'id').date.transform(
lambda ii : ii.max())
In [33]: b
Out[33]:
0 2000-01-05
1 2000-01-05
2 2000-01-05
3 2000-01-05
4 2000-01-05
5 2001-01-08
6 2001-01-08
7 2001-01-08
8 2001-01-08
9 2006-01-09
10 2006-01-09
11 2006-01-09
12 2006-01-09
Name: date, dtype: datetime64[ns]
In [37]: newdf['new_col'] = (newdf.date == b).astype(int)
In [38]: newdf
Out[38]:
date new_col
id
3001 2000-01-01 0
3001 2000-01-02 0
3001 2000-01-03 0
3001 2000-01-04 0
3001 2000-01-05 1
3849 2001-01-05 0
3849 2001-01-06 0
3849 2001-01-07 0
3849 2001-01-08 1
8927 2006-01-06 0
8927 2006-01-07 0
8927 2006-01-08 0
8927 2006-01-09 1
somehow I cannot just do:
newdf['new_col'] = newdf.reset_index().groupby('id').date.transform( lambda ii: ii == ii.max())
....don't know why .
Upvotes: 1