Chi

Reputation: 197

Find and fill in missing dates in a pandas DataFrame (Python)

I have a DataFrame as shown below:

df = 
index         column1  column2  column3  column4
2014-5-21         2      3.4      4.3         3
2014-5-22        34        5        2       666
...
2014-12-31        9      4.3      4.3         1

and I would like to create a function like the one below:

def fullyear(df)

When I pass in a DataFrame whose dates do not cover the whole year, I would like it to return a new DataFrame like this:

index         column1  column2  column3  column4

2014-1-1        NaN      NaN     NaN        NaN
2014-1-2        NaN      NaN     NaN        NaN
...
2014-5-21         2      3.4      4.3         3
2014-5-22        34        5        2       666
...
2014-12-31        9      4.3      4.3         1

The missing dates would be filled in automatically, and the columns for those dates would be filled with NaN.

The dates present in the DataFrame are random, so I don't yet have a good idea of how to solve this. Does anyone have an idea? Thanks in advance!

Upvotes: 1

Views: 37

Answers (1)

jezrael

Reputation: 862711

Use reindex with a date_range that covers the full year:

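import pandas as pd

idx = pd.date_range('2014-01-01', '2014-12-31')  # every day of 2014
df.index = pd.to_datetime(df.index)  # convert string labels to datetimes
df = df.reindex(idx)  # dates not in df become all-NaN rows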
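
If you also want to see which dates are missing before filling them in, one option is to compare the full range against the existing index. This is only a sketch: the three-row `sample` frame and the name `missing` are made up for illustration, and the dates are taken from the rows shown in the question.

```python
import pandas as pd

# Hypothetical sample with three of the dates shown in the question.
sample = pd.DataFrame(
    {"column1": [2, 34, 9]},
    index=pd.to_datetime(["2014-05-21", "2014-05-22", "2014-12-31"]),
)

idx = pd.date_range("2014-01-01", "2014-12-31")

# Dates that are in the full-year range but not in the DataFrame's index.
missing = idx.difference(sample.index)
print(missing[:3])
print(len(missing))
```

`Index.difference` returns the labels of `idx` that do not appear in `sample.index`, so these are exactly the rows that `reindex` will create as NaN.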

For a more dynamic solution, it is possible to build the range from the minimum and maximum year in the index:

df.index = pd.to_datetime(df.index)
y = df.index.year
idx = pd.date_range('{}-01-01'.format(y.min()), '{}-12-31'.format(y.max()))
df = df.reindex(idx)

print(df.tail())
            column1  column2  column3  column4
2014-12-27      NaN      NaN      NaN      NaN
2014-12-28      NaN      NaN      NaN      NaN
2014-12-29      NaN      NaN      NaN      NaN
2014-12-30      NaN      NaN      NaN      NaN
2014-12-31      9.0      4.3      4.3      1.0

Finally, wrap it in a function:

def fullyear(df):
    df.index = pd.to_datetime(df.index)
    y = df.index.year
    idx = pd.date_range('{}-01-01'.format(y.min()), '{}-12-31'.format(y.max()))
    return df.reindex(idx)

df1 = fullyear(df)
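
Note that assigning to `df.index` inside the function also changes the index of the DataFrame that was passed in. If that side effect is unwanted, a variant that works on a copy might look like the sketch below. The name `fullyear_copy` and the small `sample` frame are made up for illustration, and the sample uses zero-padded date strings.

```python
import pandas as pd

def fullyear_copy(df):
    """Return a new DataFrame covering every day of the years present in df."""
    out = df.copy()  # leave the caller's DataFrame untouched
    out.index = pd.to_datetime(out.index)
    years = out.index.year
    idx = pd.date_range(f"{years.min()}-01-01", f"{years.max()}-12-31")
    return out.reindex(idx)

# Hypothetical sample: string dates, as in the question.
sample = pd.DataFrame(
    {"column1": [2, 34, 9], "column2": [3.4, 5.0, 4.3]},
    index=["2014-05-21", "2014-05-22", "2014-12-31"],
)

full = fullyear_copy(sample)
print(full.head())
print(full.tail())
```

Here `sample` keeps its original string index, while `full` has one row per day of 2014 with NaN wherever no data existed.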

Upvotes: 1
