Reputation: 3166
I am querying my database to show records from the past week. I am then aggregating the data and transposing it in python and pandas into a DataFrame. In this table I am attempting to show what occurred on each day in the past 7 week, however, on some days no events occur. In these cases, the date is missing altogether. I am looking for an approach to append the dates that are not present (but are part of the date range specified in the query) so that I can then fillna with any value I wish for the other missing columns.
In some trials I have the data set into a pandas Dataframe where the dates are the index and in others the dates are a column. I am preferably looking to have the dates as the top index - so group by name, stack purchase and send_back and dates are the 'columns'.
Here is an example of how the dataframe looks now and what I am looking for:
Dates set in query for - 01.08.2016 - 08.08.2016. The dataframe looks liks so:
| dates | name | purchase | send_back
0 01.08.2016 Michael 120 0
1 02.08.2016 Sarah 100 40
2 04.08.2016 Sarah 55 0
3 05.08.2016 Michael 80 20
4 07.08.2016 Sarah 130 0
After:
| dates | name | purchase | send_back
0 01.08.2016 Michael 120 0
1 02.08.2016 Sarah 100 40
2 03.08.2016 - 0 0
3 04.08.2016 Sarah 55 0
4 05.08.2016 Michael 80 20
5 06.08.2016 - 0 0
6 07.08.2016 Sarah 130 0
7 08.08.2016 Sarah 0 35
8 08.08.2016 Michael 20 0
Printing the following:
df.index
gives:
'Index([ u'dates',u'name',u'purchase',u'send_back'],
dtype='object')
RangeIndex(start=0, stop=1, step=1)'
I appreciate any guidance.
Upvotes: 0
Views: 2222
Reputation: 29711
Your index is of object
type and you must convert it to datetime
format.
# Converting the object date to datetime.date
df['dates'] = df['dates'].apply(lambda x: datetime.strptime(x, "%d.%m.%Y"))
# Setting the index column
df.set_index(['dates'], inplace=True)
# Choosing a date range extending from first date to the last date with a set frequency
new_index = pd.date_range(start=df.index[0], end=df.index[-1], freq='D')
new_index.name = df.index.name
# Setting the new index
df = df.reindex(new_index)
# Making the required modifications
df.ix[:,0], df.ix[:,1:] = df.ix[:,0].fillna('-'), df.ix[:,1:].fillna(0)
print (df)
name purchase send_back
dates
2016-08-01 Michael 120.0 0.0
2016-08-02 Sarah 100.0 40.0
2016-08-03 - 0.0 0.0
2016-08-04 Sarah 55.0 0.0
2016-08-05 Michael 80.0 20.0
2016-08-06 - 0.0 0.0
2016-08-07 Sarah 130.0 0.0
Let's suppose you have data for a single day (as mentioned in the comments section) and you would like to fill the other days of the week with null values:
Data Setup:
df = pd.DataFrame({'dates':['01.08.2016'], 'name':['Michael'],
'purchase':[120], 'send_back':[0]})
print (df)
dates name purchase send_back
0 01.08.2016 Michael 120 0
Operations:
df['dates'] = df['dates'].apply(lambda x: datetime.strptime(x, "%d.%m.%Y"))
df.set_index(['dates'], inplace=True)
# Setting periods as 7 to account for the end of the week
new_index = pd.date_range(start=df.index[0], periods=7, freq='D')
new_index.name = df.index.name
# Setting the new index
df = df.reindex(new_index)
print (df)
name purchase send_back
dates
2016-08-01 Michael 120.0 0.0
2016-08-02 NaN NaN NaN
2016-08-03 NaN NaN NaN
2016-08-04 NaN NaN NaN
2016-08-05 NaN NaN NaN
2016-08-06 NaN NaN NaN
2016-08-07 NaN NaN NaN
Incase you want to fill the null values with 0's, you could do:
df.fillna(0, inplace=True)
print (df)
name purchase send_back
dates
2016-08-01 Michael 120.0 0.0
2016-08-02 0 0.0 0.0
2016-08-03 0 0.0 0.0
2016-08-04 0 0.0 0.0
2016-08-05 0 0.0 0.0
2016-08-06 0 0.0 0.0
2016-08-07 0 0.0 0.0
Upvotes: 1
Reputation: 210822
assuming you have the following DF:
In [93]: df
Out[93]:
name purchase send_back
dates
2016-08-01 Michael 120 0
2016-08-02 Sarah 100 40
2016-08-04 Sarah 55 0
2016-08-05 Michael 80 20
2016-08-07 Sarah 130 0
you can resample and replace:
In [94]: df.resample('D').first().replace({'name':{np.nan:'-'}}).fillna(0)
Out[94]:
name purchase send_back
dates
2016-08-01 Michael 120.0 0.0
2016-08-02 Sarah 100.0 40.0
2016-08-03 - 0.0 0.0
2016-08-04 Sarah 55.0 0.0
2016-08-05 Michael 80.0 20.0
2016-08-06 - 0.0 0.0
2016-08-07 Sarah 130.0 0.0
Upvotes: 1