tn3wman
tn3wman

Reputation: 21

Quickly save dataframe to mongodb with time based bucketing?

How do you save a pandas dataframe to mongodb using the recommended time based bucketing? The data in this case has an index with datetime, and columns with integers. I figured out how to create single documents for each timestamp, but can't figure out how to arrange the dataframe or loop through the dataframe to save a minutes worth of data it one document.

client = MongoClient('localhost', 27017)
db = client.testing

data_df = pd.read_pickle('fake_data.pkl')

for i, row in tqdm(enumerate(data_df.itertuples(), 1)):

    query = {'Timestamp': getattr(row, 'Index')}
    data = {'$set':
                {'Timestamp': getattr(row, 'Index'),
                 'A': getattr(row, 'A'),
                 'B': getattr(row, 'B'),
                 'C': getattr(row, 'C'),
                 'D': getattr(row, 'B')
                 }
            }

    db.single_doc_collection.update_one(query, data, upsert=True)

Upvotes: 1

Views: 81

Answers (1)

tn3wman
tn3wman

Reputation: 21

Here is my own solution. It works, just slow for large datasets. Hoping someone has a better solution.

client = MongoClient('localhost', 27017)
db = client.testing

db.time_bucket_collection.drop()
df = pd.read_pickle('fake_data.pkl')

start = df.index.min()
step = timedelta(seconds=60)
end = df.index.max()

while start <= end:

    df_slice = df[(df.index >= start) & (df.index < start + step)]
    df_slice.index = df_slice.index.strftime('%S')

    for i, row in tqdm(enumerate(df_slice.itertuples(), 1)):
        query = {'Timestamp': start}
        data = {
            'A': getattr(row, 'A'),
            'B': getattr(row, 'B'),
            'C': getattr(row, 'C'),
            'D': getattr(row, 'B')
        }

        doc = {
                '$set': {'Timestamp': start},
                '$push': {'Data': data},
                '$inc': {'Data Points': 1}
                }

        db.time_bucket_collection.update_one(query, doc, upsert=True)
    start += step

Upvotes: 1

Related Questions