I have a pandas DataFrame with more than 50 columns, including timestamp, id, product and price.
I'd like to turn this DataFrame into a stream. For example, every 10 seconds I'd like to receive the next 10 rows (or 1 row), then the next 10 rows (or 1 row), and so on until the end of the DataFrame.
I had a look at the streamz library but couldn't find a suitable function for this.
This way, I plan to apply some visualisation and do functional aggregations or further analysis on the streamed data.
>>> df.head()
Posting this small solution to your question.
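import time

import pandas as pd
import schedule

# With chunksize set, read_csv returns an iterator that yields DataFrames of 2 rows each
reader = pd.read_csv('file.csv', iterator=True, chunksize=2)

def get_next_row():
    # next() raises StopIteration once the whole file has been read
    rows = next(reader)
    print(rows)
    # do_something_with_rows(rows)

# Call get_next_row every 5 seconds
schedule.every(5).seconds.do(get_next_row)

while True:
    try:
        schedule.run_pending()
    except StopIteration:
        print("EOF")
        break
    time.sleep(1)  # avoid a busy loop between scheduled runs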
The code above calls get_next_row every 5 seconds; each call reads the next 2 rows from the file and prints them. Instead of printing, you can add your own processing. Once the end of the file is reached, next() raises StopIteration, which ends the loop.
You can adjust the interval and the chunk size to suit your requirements.
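Since the question's data is already an in-memory DataFrame rather than a CSV file, a plain generator may be enough and avoids the schedule dependency. This is a minimal sketch, not a pandas API: stream_dataframe and its chunk_size/interval parameters are hypothetical names, and the demo data is made up.

```python
import time
from typing import Iterator

import pandas as pd


def stream_dataframe(df: pd.DataFrame, chunk_size: int = 10,
                     interval: float = 10.0) -> Iterator[pd.DataFrame]:
    """Hypothetical helper: yield consecutive row slices of df, pausing between them."""
    for start in range(0, len(df), chunk_size):
        if start > 0:
            # Wait before emitting the next chunk (not before the first one)
            time.sleep(interval)
        yield df.iloc[start:start + chunk_size]


if __name__ == "__main__":
    # Made-up data standing in for the question's DataFrame
    demo = pd.DataFrame({"id": range(25), "price": [1.5] * 25})
    for chunk in stream_dataframe(demo, chunk_size=10, interval=1.0):
        print(len(chunk), "rows, ids", chunk["id"].min(), "to", chunk["id"].max())
```

Because the sleep runs when the consumer asks for the next chunk, the actual gap between chunks is interval plus however long your per-chunk processing takes. Setting chunk_size=1 gives one row at a time.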
I have previously worked around a similar problem by using pd.date_range() to create times at the desired interval, then slicing the original DataFrame by the times in that range.
For example:
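import pandas as pd

# Timestamps one minute apart, from 13:00 to 15:00
times = pd.date_range(start="13:00", end="15:00", freq="min")
for t in times:
    # All rows whose "Time" is earlier than t (assumes a datetime column)
    df_instance = df[df["Time"] < t]
    # Do something with df_instance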
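Note that df[df["Time"] < t] returns all rows up to t, so each slice also contains the rows from earlier steps. If you only want the rows that fall inside each window (for example the 10-second windows from the question), one option is pd.Grouper. This sketch assumes the Time column holds datetime values; the sample data is hypothetical.

```python
import pandas as pd

# Hypothetical sample data: one row every 3 seconds, starting at 13:00
df = pd.DataFrame({
    "Time": pd.date_range("2024-01-01 13:00:00", periods=10, freq="3s"),
    "price": range(10),
})

# Group rows into consecutive, non-overlapping 10-second windows of event time
windows = df.groupby(pd.Grouper(key="Time", freq="10s"))

for window_start, window_df in windows:
    # window_df holds only rows with window_start <= Time < window_start + 10s
    print(window_start, len(window_df), "rows, mean price", window_df["price"].mean())
```

To replay the windows in real time, you could add a time.sleep() call at the end of the loop body.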