Reputation: 4266
I am reading in data from a MongoDB collection
mongo_url = 'mongodb://localhost:27017/db'
client = pymongo.MongoClient(mongo_url)
db = client.db
collection = db.coll
docs = list(collection.find({}, {"Date": 1, "Cost" : 1, "_id" : 0 }).sort("date", pymongo.ASCENDING))
so I end up with an list of dicts stored in docs, of the form
[{u'Date': u'2008-01-01', u'Cost': 8557.0}, {u'Date': u'2008-01-02', u'Cost': 62307.0},.....]
I can then create a DataFrame
from this
frame = DataFrame(docs)
which is of the form
but I want the Date column to be used as a DatetimeIndex
. I have been doing this in a pretty hacky way but I know there must be a cleaner way of doing this.
frame = frame.set_index(pd.to_datetime(frame['Date']))
Also, if I examine the index I see that the freq
is not present, so I want to try and set the daily frequency when creating the DataFrame
@jezrael replied with an excellent answer so I just wanted to explain here which approach worked for me.
If I try this
frame = DataFrame(docs)
frame.set_index('Date', inplace=True)
frame.index = pd.DatetimeIndex(frame.index, freq='D')
for some reason I get the following error
ValueError: Inferred frequency None from passed dates does notconform to passed frequency D
but the other suggestion works great for me.
idx = pd.DatetimeIndex([x['Date'] for x in docs], freq='D')
frame = DataFrame(docs, index=idx)
frame = frame.drop('Date', 1)
Upvotes: 2
Views: 684
Reputation: 863791
If need create Datetimindex
in DataFrame
constructor:
docs = [{u'Date': u'2008-01-01', u'Cost': 8557.0},{u'Date': u'2008-01-02', u'Cost': 62307.0}]
idx = pd.DatetimeIndex([x['Date'] for x in docs], freq='D')
print (idx)
DatetimeIndex(['2008-01-01', '2008-01-02'], dtype='datetime64[ns]', freq='D')
frame = pd.DataFrame(docs, index=idx)
print (frame)
Cost Date
2008-01-01 8557.0 2008-01-01
2008-01-02 62307.0 2008-01-02
print (frame.index)
DatetimeIndex(['2008-01-01', '2008-01-02'], dtype='datetime64[ns]', freq='D')
Another solution, if create DatetimeIndex
after creating DataFrame
:
You can use set_index
with DatetimeIndex
:
docs = [{u'Date': u'2008-01-01', u'Cost': 8557.0},{u'Date': u'2008-01-02', u'Cost': 62307.0}]
frame = pd.DataFrame(docs)
print (frame)
Cost Date
0 8557.0 2008-01-01
1 62307.0 2008-01-02
frame.set_index('Date', inplace=True)
frame.index = pd.DatetimeIndex(frame.index, freq='D')
print (frame)
Cost
2008-01-01 8557.0
2008-01-02 62307.0
print (frame.index)
DatetimeIndex(['2008-01-01', '2008-01-02'], dtype='datetime64[ns]', freq='D')
If need copy column Date
to index
:
docs = [{u'Date': u'2008-01-01', u'Cost': 8557.0},{u'Date': u'2008-01-02', u'Cost': 62307.0}]
frame = pd.DataFrame(docs)
print (frame)
Cost Date
0 8557.0 2008-01-01
1 62307.0 2008-01-02
frame.set_index(frame.Date, inplace=True)
frame.index = pd.DatetimeIndex(frame.index, freq='D')
print (frame)
Cost Date
2008-01-01 8557.0 2008-01-01
2008-01-02 62307.0 2008-01-02
print (frame.index)
DatetimeIndex(['2008-01-01', '2008-01-02'], dtype='datetime64[ns]', freq='D')
Upvotes: 2