user3816493
user3816493

Reputation: 223

Generating new time series index from existing data set in Pandas

I have a data frame which looks as following

id      start_date  end_date 
3001    1-1-2000    5-1-2000
3849    5-1-2001    8-1-2001
8927    6-1-2006    9-1-2006

what i want is a new data frame that as index has id and as column date column which is a date incremented by month from start_date to end_date.

index   date
3001    1/1/2000
3001    2/1/2000
3001    3/1/2000
3001    4/1/2000
3001    5/1/2000
3849    5/1/2001
3849    6/1/2001
3849    7/1/2001
3849    7/1/2001
8927    6/1/2006
8927    7/1/2006
8927    8/1/2006
8927    9/1/2006

Upvotes: 0

Views: 2695

Answers (1)

fast tooth
fast tooth

Reputation: 2459

recreate your dataframe,

In [39]: df = pd.DataFrame( {"id":[ 3001,3849, 8927] , "start_date": ['1-1-2000','1-5-2001','1-6-2006'], "end_date":['1-5-2000','1-8-2001','1-9-2006']})

set index

In [40]: df = df.set_index('id')

iterate through rows.

In [41]: newdf = pd.DataFrame()

In [42]: for id, row in df.iterrows(): 
    newdf = pd.concat( [newdf, pd.DataFrame( {"id":id, "date": pd.date_range( start=row.start_date, end=row.end_date, freq='D')}) ], ignore_index=True)
    print id
   ....:     
3001
3849
8927

In [43]: newdf = newdf.set_index('id')

In [44]: newdf
Out[44]: 
           date
id             
3001 2000-01-01
3001 2000-01-02
3001 2000-01-03
3001 2000-01-04
3001 2000-01-05
3849 2001-01-05
3849 2001-01-06
3849 2001-01-07
3849 2001-01-08
8927 2006-01-06
8927 2006-01-07
8927 2006-01-08
8927 2006-01-09

and done.

it is unclear to me your date format, is it day-first? or month-first? you can check it out here: Specifying date format when converting with pandas.to_datetime


sure, edit for additional answer:

In [32]: b = newdf.reset_index().groupby( 'id').date.transform( 
lambda ii : ii.max())

In [33]: b
Out[33]: 
0    2000-01-05
1    2000-01-05
2    2000-01-05
3    2000-01-05
4    2000-01-05
5    2001-01-08
6    2001-01-08
7    2001-01-08
8    2001-01-08
9    2006-01-09
10   2006-01-09
11   2006-01-09
12   2006-01-09
Name: date, dtype: datetime64[ns]

In [37]: newdf['new_col'] = (newdf.date == b).astype(int) 
In [38]: newdf
Out[38]: 
           date  new_col
id                      
3001 2000-01-01        0
3001 2000-01-02        0
3001 2000-01-03        0
3001 2000-01-04        0
3001 2000-01-05        1
3849 2001-01-05        0
3849 2001-01-06        0
3849 2001-01-07        0
3849 2001-01-08        1
8927 2006-01-06        0
8927 2006-01-07        0
8927 2006-01-08        0
8927 2006-01-09        1

somehow I cannot just do:

newdf['new_col'] = newdf.reset_index().groupby('id').date.transform( lambda ii: ii == ii.max())

....don't know why .

Upvotes: 1

Related Questions