vrd.gn

Reputation: 198

How can I convert a big pandas dataframe to a streaming dataframe?

I have a pandas dataframe with more than 50 columns, including timestamps, id, products and price.

I'd like to convert this dataframe to a streaming dataframe. For example, every 10 seconds I'd like to receive the next 10 rows (or the next single row), then the following 10 rows, and so on until the dataframe ends.

I had a look at the streamz library but couldn't find a proper function for this.

This way, I am planning to apply some visualisation and do some aggregations or further analysis on the stream.

>>> df.head()

Upvotes: 0

Views: 1975

Answers (2)

suraj deshmukh

Reputation: 188

Posting this small solution to your question.

import pandas as pd
import schedule
import time

# Read the CSV lazily: each call to next() returns the next chunk of 2 rows.
df = pd.read_csv('file.csv', iterator=True, chunksize=2)

def get_next_row():
    row = next(df)   # raises StopIteration once the file is exhausted
    print(row)
    # do_some_thing_with_row(row)

# Run get_next_row every 5 seconds.
schedule.every(5).seconds.do(get_next_row)

while True:
    try:
        schedule.run_pending()
        time.sleep(1)   # avoid busy-waiting between scheduled runs
    except StopIteration:
        print("EOF")
        break

The above code calls the get_next_row function, reading 2 rows every 5 seconds and printing them. Instead of printing, you can plug in your own processing. Once it reaches the end of the file, next() raises a StopIteration exception and the loop breaks.

Now you can play around with the interval and chunk size to suit your requirements.
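Since the question mentions streamz: the same chunk-feeding idea can be pushed into a streamz pipeline instead of being printed. A rough sketch (the price column and the 10-row/10-second pacing are only illustrative assumptions):

import time
import pandas as pd
from streamz import Stream

source = Stream()
# Print the mean price of each incoming chunk; swap in your own aggregation or plot update.
source.map(lambda chunk: chunk["price"].mean()).sink(print)

for chunk in pd.read_csv('file.csv', chunksize=10):
    source.emit(chunk)   # push the next 10 rows into the pipeline
    time.sleep(10)       # wait 10 seconds before emitting the next chunk

streamz also has a streamz.dataframe.DataFrame wrapper that is fed by the same emit() calls, if you want pandas-style streaming aggregations rather than raw map/sink.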

Upvotes: 2

Stu

Reputation: 115

Previously I have gotten around a similar problem by using pd.date_range() to create times with the desired interval, then slicing the original dataframe by the times in the range.

For example:

times = pd.date_range(start="13:00", end="15:00", freq="T")
for t in times:
    df_instance = df[df["Time"] < t]
    # do something with df_instance
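If you also want the slices to arrive on a real-time cadence, and to contain only the new rows each time rather than the growing cumulative slice, a small variation is to walk consecutive pairs of timestamps and sleep between them. A sketch, assuming df["Time"] holds datetime values that fall on the same day as the range below:

import time
import pandas as pd

times = pd.date_range(start="13:00", end="15:00", freq="T")
for previous, current in zip(times, times[1:]):
    # only the rows that fall inside the current one-minute window
    window = df[(df["Time"] >= previous) & (df["Time"] < current)]
    # do something with window, e.g. update a plot or an aggregation
    time.sleep(10)   # pause so the windows arrive like a stream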

Upvotes: 1
