Graham Streich
Graham Streich

Reputation: 924

adding dates to a pandas data frame

I currently have a df in pandas with a variable called 'Dates' that records the data an complaint was filed.

data = pd.read_csv("filename.csv") Dates Initially Received 07-MAR-08 08-APR-08 19-MAY-08

As you can see there are missing dates between when complaints are filed, also multiple complaints may have been filed on the same day. Is there a way to fill in the missing days while keeping complaints that were filed on the same day the same?

I tried creating a new df with datetime and merging the dataframes together,

days = pd.date_range(start='01-JAN-2008', end='31-DEC-2017')
df = pd.DataFrame(data=days)
df.index = range(3653)
dates = pd.merge(days, data['Dates'], how='inner')

but I get the following error:

ValueError: can not merge DataFrame with instance of type <class 
'pandas.tseries.index.DatetimeIndex'>

Here are the first four rows of data

data

Upvotes: 0

Views: 5700

Answers (2)

Yuca
Yuca

Reputation: 6091

You were close, there's an issue with your input

First do:

df = pd.read_csv('filename.csv', skiprows = 1)

Then

days = pd.date_range(start='01-JAN-2008', end='31-DEC-2017')
df_clean = df.reset_index()
df_clean['idx dates'] = pd.to_datetime(df_clean['Initially Received'])
df2 = pd.DataFrame(data=days, index = range(3653), columns=['full dates'])
dates = pd.merge(df2, df_clean, left_on='full dates', right_on = 'idx dates', how='left')

Upvotes: 2

jschnurr
jschnurr

Reputation: 1191

Create your date range, and use merge to outer join it to the original dataframe, preserving duplicates.

import pandas as pd
from io import StringIO

TESTDATA = StringIO(
"""Dates;fruit
05-APR-08;apple
08-APR-08;banana
08-APR-08;pear
11-APR-08;grapefruit
""")

df = pd.read_csv(TESTDATA, sep=';', parse_dates=['Dates'])

dates = pd.date_range(start='04-APR-2008', end='12-APR-2008').to_frame()
pd.merge(
    df, dates, left_on='Dates', right_on=0,
    how='outer').sort_values(by=['Dates']).drop(columns=0)

#   Dates       fruit
#   2008-04-04  NaN
#   2008-04-05  apple
#   2008-04-06  NaN
#   2008-04-07  NaN
#   2008-04-08  banana
#   2008-04-08  pear
#   2008-04-09  NaN
#   2008-04-10  NaN
#   2008-04-11  grapefruit
#   2008-04-12  NaN

Upvotes: 0

Related Questions