Reputation: 21
How do you save a pandas DataFrame to MongoDB using the recommended time-based bucketing? The data in this case has a datetime index and integer columns. I figured out how to create a single document for each timestamp, but can't figure out how to arrange the DataFrame, or loop through it, to save a minute's worth of data in one document.
import pandas as pd
from pymongo import MongoClient
from tqdm import tqdm

client = MongoClient('localhost', 27017)
db = client.testing

data_df = pd.read_pickle('fake_data.pkl')

# One document per timestamp: upsert each row keyed on its datetime index.
for row in tqdm(data_df.itertuples()):
    query = {'Timestamp': row.Index}
    data = {'$set': {
        'Timestamp': row.Index,
        'A': row.A,
        'B': row.B,
        'C': row.C,
        'D': row.D,
    }}
    db.single_doc_collection.update_one(query, data, upsert=True)
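For reference, the per-minute bucket document I'm trying to end up with would look roughly like this (the values are made up and the field names are just placeholders):

{
    'Timestamp': datetime(2019, 1, 1, 12, 30),  # truncated to the minute
    'Data Points': 60,
    'Data': [
        {'A': 1, 'B': 2, 'C': 3, 'D': 4},  # one entry per row in that minute
        ...
    ]
}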
Upvotes: 1
Views: 81
Reputation: 21
Here is my own solution. It works, but it's slow for large datasets. Hoping someone has a better solution.
import pandas as pd
from datetime import timedelta
from pymongo import MongoClient
from tqdm import tqdm

client = MongoClient('localhost', 27017)
db = client.testing
db.time_bucket_collection.drop()

df = pd.read_pickle('fake_data.pkl')

start = df.index.min()
step = timedelta(seconds=60)
end = df.index.max()

while start <= end:
    # Grab one minute's worth of rows and relabel them by second-of-minute.
    df_slice = df[(df.index >= start) & (df.index < start + step)]
    df_slice.index = df_slice.index.strftime('%S')

    # Push each row into this minute's bucket document.
    for row in tqdm(df_slice.itertuples()):
        query = {'Timestamp': start}
        data = {
            'A': row.A,
            'B': row.B,
            'C': row.C,
            'D': row.D,
        }
        doc = {
            '$set': {'Timestamp': start},
            '$push': {'Data': data},
            '$inc': {'Data Points': 1},
        }
        db.time_bucket_collection.update_one(query, doc, upsert=True)

    start += step
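If the collection is rebuilt from scratch anyway (note the drop() above), the per-row upserts can be replaced by building one complete bucket document per minute and inserting them in bulk. A minimal sketch under that assumption, using pandas groupby with pd.Grouper and pymongo's insert_many; it assumes the same 'fake_data.pkl' layout as above:

import pandas as pd
from pymongo import MongoClient

client = MongoClient('localhost', 27017)
db = client.testing
db.time_bucket_collection.drop()

df = pd.read_pickle('fake_data.pkl')

# Build one complete bucket document per minute, then insert in one batch.
docs = []
for minute, group in df.groupby(pd.Grouper(freq='1min')):
    if group.empty:
        continue
    docs.append({
        'Timestamp': minute,
        'Data Points': len(group),
        'Data': group[['A', 'B', 'C', 'D']].to_dict('records'),
    })

db.time_bucket_collection.insert_many(docs)
db.time_bucket_collection.create_index('Timestamp')  # for fast lookups later

This issues one round trip per batch instead of one per row, which is where the loop above spends most of its time.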
Upvotes: 1