OAK
OAK

Reputation: 3166

Dates fill in for date range and fillna

I am querying my database to show records from the past week. I am then aggregating the data and transposing it in python and pandas into a DataFrame. In this table I am attempting to show what occurred on each day in the past 7 week, however, on some days no events occur. In these cases, the date is missing altogether. I am looking for an approach to append the dates that are not present (but are part of the date range specified in the query) so that I can then fillna with any value I wish for the other missing columns.

In some trials I have the data set into a pandas Dataframe where the dates are the index and in others the dates are a column. I am preferably looking to have the dates as the top index - so group by name, stack purchase and send_back and dates are the 'columns'.

Here is an example of how the dataframe looks now and what I am looking for:

Dates set in query for - 01.08.2016 - 08.08.2016. The dataframe looks liks so:

       |  dates       | name     | purchase | send_back
   0  01.08.2016    Michael     120          0
   1  02.08.2016    Sarah       100         40
   2  04.08.2016    Sarah       55           0
   3  05.08.2016    Michael     80          20
   4  07.08.2016    Sarah       130          0

After:

     | dates       | name     | purchase | send_back
   0 01.08.2016    Michael      120          0
   1 02.08.2016    Sarah        100          40
   2 03.08.2016    -            0            0
   3 04.08.2016    Sarah        55           0
   4 05.08.2016    Michael      80           20
   5 06.08.2016    -            0            0
   6 07.08.2016    Sarah        130          0
   7 08.08.2016    Sarah        0            35
   8 08.08.2016    Michael      20           0 

Printing the following:

 df.index

gives:

 'Index([   u'dates',u'name',u'purchase',u'send_back'],
      dtype='object')

RangeIndex(start=0, stop=1, step=1)'

I appreciate any guidance.

Upvotes: 0

Views: 2222

Answers (2)

Nickil Maveli
Nickil Maveli

Reputation: 29711

Your index is of object type and you must convert it to datetime format.

# Converting the object date to datetime.date
df['dates'] = df['dates'].apply(lambda x: datetime.strptime(x, "%d.%m.%Y"))

# Setting the index column
df.set_index(['dates'], inplace=True)

# Choosing a date range extending from first date to the last date with a set frequency
new_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='D')
new_index.name = df.index.name

# Setting the new index
df = df.reindex(new_index)

# Making the required modifications
df.ix[:,0], df.ix[:,1:] = df.ix[:,0].fillna('-'), df.ix[:,1:].fillna(0)

print (df)

               name  purchase  send_back
dates                                   
2016-08-01  Michael     120.0        0.0
2016-08-02    Sarah     100.0       40.0
2016-08-03        -       0.0        0.0
2016-08-04    Sarah      55.0        0.0
2016-08-05  Michael      80.0       20.0
2016-08-06        -       0.0        0.0
2016-08-07    Sarah     130.0        0.0

Let's suppose you have data for a single day (as mentioned in the comments section) and you would like to fill the other days of the week with null values:

Data Setup:

df = pd.DataFrame({'dates':['01.08.2016'], 'name':['Michael'], 
                   'purchase':[120], 'send_back':[0]})
print (df)

        dates     name  purchase  send_back
0  01.08.2016  Michael       120          0

Operations:

df['dates'] = df['dates'].apply(lambda x: datetime.strptime(x, "%d.%m.%Y"))
df.set_index(['dates'], inplace=True)

# Setting periods as 7 to account for the end of the week
new_index = pd.date_range(start=df.index[0], periods=7, freq='D')
new_index.name = df.index.name

# Setting the new index
df = df.reindex(new_index)
print (df)

               name  purchase  send_back
dates                                   
2016-08-01  Michael     120.0        0.0
2016-08-02      NaN       NaN        NaN
2016-08-03      NaN       NaN        NaN
2016-08-04      NaN       NaN        NaN
2016-08-05      NaN       NaN        NaN
2016-08-06      NaN       NaN        NaN
2016-08-07      NaN       NaN        NaN

Incase you want to fill the null values with 0's, you could do:

df.fillna(0, inplace=True)
print (df)
               name  purchase  send_back
dates                                   
2016-08-01  Michael     120.0        0.0
2016-08-02        0       0.0        0.0
2016-08-03        0       0.0        0.0
2016-08-04        0       0.0        0.0
2016-08-05        0       0.0        0.0
2016-08-06        0       0.0        0.0
2016-08-07        0       0.0        0.0

Upvotes: 1

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210822

assuming you have the following DF:

In [93]: df
Out[93]:
               name  purchase  send_back
dates
2016-08-01  Michael       120          0
2016-08-02    Sarah       100         40
2016-08-04    Sarah        55          0
2016-08-05  Michael        80         20
2016-08-07    Sarah       130          0

you can resample and replace:

In [94]: df.resample('D').first().replace({'name':{np.nan:'-'}}).fillna(0)
Out[94]:
               name  purchase  send_back
dates
2016-08-01  Michael     120.0        0.0
2016-08-02    Sarah     100.0       40.0
2016-08-03        -       0.0        0.0
2016-08-04    Sarah      55.0        0.0
2016-08-05  Michael      80.0       20.0
2016-08-06        -       0.0        0.0
2016-08-07    Sarah     130.0        0.0

Upvotes: 1

Related Questions