smb
smb

Reputation: 673

Python: Transpose data to create a record for each month between two dates

I have a dataset like the below:

Category    Date 1     Date 2 
a         2017-01-01  2017-08-01

What I want to be able to do is to transpose this data so I have a record for each month between the two dates e.g.

Category Date
a        2017-01-01
a        2017-02-01
a        2017-03-01
.....
a        2017-08-01

I need to do this in python and my data is already in a pandas dataframe. Dates are in datestamps format YYYY-MM-DD

Upvotes: 4

Views: 1135

Answers (2)

piRSquared
piRSquared

Reputation: 294348

Use a comprehension with pd.date_range

pd.DataFrame(
    [[c, d] for c, d1, d2 in df.itertuples(index=False)
     for d in pd.date_range(d1, d2, freq='MS')],
    columns=['Category', 'Date']
)

  Category       Date
0        a 2017-01-01
1        a 2017-02-01
2        a 2017-03-01
3        a 2017-04-01
4        a 2017-05-01
5        a 2017-06-01
6        a 2017-07-01
7        a 2017-08-01

If you have more than 3 columns and you want just Category and Date

pd.DataFrame(
    [[c, d] for c, d1, d2, *_ in df.itertuples(index=False)
     for d in pd.date_range(d1, d2, freq='MS')],
    columns=['Category', 'Date']
)

The *_ unpacks the rest of the tuple.


Otherwise, we could trap the entire tuple and just grab the bits we need.

pd.DataFrame(
    [[t[0], d] for t in df.itertuples(index=False)
     for d in pd.date_range(t[1], t[2], freq='MS')],
    columns=['Category', 'Date']
)

Upvotes: 6

BENY
BENY

Reputation: 323306

I Am not sure about the datetime format it is m-d-y or d-m-y ...also you can replace the resample('D') to resample('MS') get the month start.

df.melt('Category').set_index('value').resample('D').first().ffill().drop('variable',1)
Out[31]: 
           Category
value              
2017-01-01        a
2017-01-02        a
2017-01-03        a
2017-01-04        a
2017-01-05        a
2017-01-06        a
2017-01-07        a
2017-01-08        a

Change to MS

df.melt('Category').set_index('value').resample('MS').first().ffill().drop('variable',1)
Out[40]: 
           Category
value              
2017-01-01        a
2017-02-01        a
2017-03-01        a
2017-04-01        a
2017-05-01        a
2017-06-01        a
2017-07-01        a
2017-08-01        a

Upvotes: 3

Related Questions