Philip O'Brien
Philip O'Brien

Reputation: 4266

Use date field from MongoDB list as DatetimeIndex in Pandas DataFrame

I am reading in data from a MongoDB collection

mongo_url = 'mongodb://localhost:27017/db'
client = pymongo.MongoClient(mongo_url)
db = client.db
collection = db.coll
docs = list(collection.find({}, {"Date": 1, "Cost" : 1, "_id" : 0 }).sort("date", pymongo.ASCENDING))

so I end up with an list of dicts stored in docs, of the form

[{u'Date': u'2008-01-01', u'Cost': 8557.0}, {u'Date': u'2008-01-02', u'Cost': 62307.0},.....]

I can then create a DataFrame from this

frame = DataFrame(docs)

which is of the form

enter image description here

but I want the Date column to be used as a DatetimeIndex. I have been doing this in a pretty hacky way but I know there must be a cleaner way of doing this.

frame = frame.set_index(pd.to_datetime(frame['Date']))

enter image description here

Also, if I examine the index I see that the freq is not present, so I want to try and set the daily frequency when creating the DataFrame

enter image description here


Update

@jezrael replied with an excellent answer so I just wanted to explain here which approach worked for me.

If I try this

frame = DataFrame(docs)
frame.set_index('Date', inplace=True)
frame.index = pd.DatetimeIndex(frame.index, freq='D')

for some reason I get the following error

ValueError: Inferred frequency None from passed dates does notconform to passed frequency D

but the other suggestion works great for me.

idx =  pd.DatetimeIndex([x['Date'] for x in docs], freq='D')
frame = DataFrame(docs, index=idx)
frame = frame.drop('Date', 1)

Upvotes: 2

Views: 684

Answers (1)

jezrael
jezrael

Reputation: 863791

If need create Datetimindex in DataFrame constructor:

docs = [{u'Date': u'2008-01-01', u'Cost': 8557.0},{u'Date': u'2008-01-02', u'Cost': 62307.0}]

idx =  pd.DatetimeIndex([x['Date'] for x in docs], freq='D')
print (idx)
DatetimeIndex(['2008-01-01', '2008-01-02'], dtype='datetime64[ns]', freq='D')

frame = pd.DataFrame(docs, index=idx)
print (frame)
               Cost        Date
2008-01-01   8557.0  2008-01-01
2008-01-02  62307.0  2008-01-02

print (frame.index)
DatetimeIndex(['2008-01-01', '2008-01-02'], dtype='datetime64[ns]', freq='D')

Another solution, if create DatetimeIndex after creating DataFrame:

You can use set_index with DatetimeIndex:

docs = [{u'Date': u'2008-01-01', u'Cost': 8557.0},{u'Date': u'2008-01-02', u'Cost': 62307.0}]
frame = pd.DataFrame(docs)
print (frame)
      Cost        Date
0   8557.0  2008-01-01
1  62307.0  2008-01-02

frame.set_index('Date', inplace=True)
frame.index = pd.DatetimeIndex(frame.index, freq='D')
print (frame)
               Cost
2008-01-01   8557.0
2008-01-02  62307.0

print (frame.index)
DatetimeIndex(['2008-01-01', '2008-01-02'], dtype='datetime64[ns]', freq='D')

If need copy column Date to index:

docs = [{u'Date': u'2008-01-01', u'Cost': 8557.0},{u'Date': u'2008-01-02', u'Cost': 62307.0}]
frame = pd.DataFrame(docs)
print (frame)
      Cost        Date
0   8557.0  2008-01-01
1  62307.0  2008-01-02

frame.set_index(frame.Date, inplace=True)
frame.index = pd.DatetimeIndex(frame.index, freq='D')
print (frame)
               Cost        Date
2008-01-01   8557.0  2008-01-01
2008-01-02  62307.0  2008-01-02
print (frame.index)
DatetimeIndex(['2008-01-01', '2008-01-02'], dtype='datetime64[ns]', freq='D')

Upvotes: 2

Related Questions