How can I convert a big pandas dataframe to a streaming data frame?

Question

I have a pandas dataframe with more than 50 columns, including timestamps, id, products and price.

I'd like to convert this data frame to a streaming data frame. For example, every 10 seconds I'd like to receive 10 rows (or 1 row), then the next 10 rows (or 1 row), and so on until the data frame ends.

I had a look at the streamz library but couldn't find a suitable function for this.

I'm planning to use the stream for visualisation, aggregations and further analysis.

>>> df.head()

suraj deshmukh · Accepted Answer

Posting this small solution to your question.

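import time

import pandas as pd
import schedule

# With chunksize set, read_csv returns an iterator of DataFrames, not a single DataFrame.
reader = pd.read_csv('file.csv', iterator=True, chunksize=2)

def get_next_row():
    # next() raises StopIteration once the file is exhausted.
    chunk = next(reader)
    print(chunk)
    # do_some_thing_with_row(chunk)

schedule.every(5).seconds.do(get_next_row)

while True:
    try:
        schedule.run_pending()
        time.sleep(1)  # avoid a busy loop between scheduled runs
    except StopIteration:
        print("EOF")
        break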

The code above calls get_next_row every 5 seconds; each call reads the next 2 rows from the file and prints them. Replace the print with your own processing. Once the end of the file is reached, next() raises a StopIteration exception, which ends the loop.

You can adjust the interval and the chunk size to suit your requirements.
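If the data is already loaded as a DataFrame rather than sitting in a CSV file, the same idea can be sketched with a plain generator and no extra library. This is only a minimal illustration: stream_chunks, chunk_size and interval are names made up for this example, and the small demo frame stands in for your real data.

```python
import time

import pandas as pd


def stream_chunks(frame, chunk_size=10, interval=10.0):
    """Yield consecutive slices of `frame`, pausing `interval` seconds between them.

    Hypothetical helper for illustration; not part of pandas or streamz.
    """
    for start in range(0, len(frame), chunk_size):
        if start > 0:
            time.sleep(interval)  # wait before handing out the next chunk
        yield frame.iloc[start:start + chunk_size]


if __name__ == "__main__":
    # Small stand-in for the real 50+ column dataframe.
    demo = pd.DataFrame({"id": range(5), "price": [1.0, 2.0, 3.0, 4.0, 5.0]})
    for chunk in stream_chunks(demo, chunk_size=2, interval=0.1):
        print(chunk)  # replace with plotting or aggregation
```

The loop ends by itself when the frame is exhausted, so there is no StopIteration to catch. Note that the generator blocks while it sleeps; if other work has to run in between, a scheduler as in the code above may be the better fit.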
