Reputation: 6443
I have a csv file with millions of rows. I used to create a dictionary out of the csv file like this:
import csv

with open('us_db.csv', newline='') as f:
    data = csv.reader(f)
    for row in data:
        # Create Dictionary based on a column
        ...
Now, to filter the rows based on some conditions, I use a pandas DataFrame, as it is very fast for these operations. I load the csv into a DataFrame and do some filtering, and then I want to continue with the loop above. I thought of using pandas df.iterrows() or df.itertuples(), but they are really slow.
Is there a way to feed the pandas DataFrame to csv.reader() directly so that I can keep using the code above? If I use csv_rows = df.to_csv(), it gives one long string. Of course, I could write out a csv file and then read it back in, but I want to know if there is a way to skip the extra write and read.
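For reference, the load-and-filter step looks roughly like this (the column name and condition are made up for illustration):

import pandas as pd

df = pd.read_csv('us_db.csv')
filtered = df[df['state'] == 'CA']  # hypothetical filter condition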
Upvotes: 7
Views: 9683
Reputation: 34
Why don't you apply your dictionary-building function to the target column? Something like:
df['column_name'] = df['column_name'].apply(create_dictionary)
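A runnable sketch of that idea, where create_dictionary is a hypothetical stand-in for whatever per-value logic you have:

import pandas as pd

def create_dictionary(value):
    # hypothetical placeholder for your per-value logic
    return {'value': value}

df = pd.DataFrame({'column_name': ['a', 'b']})  # toy data for illustration
df['column_name'] = df['column_name'].apply(create_dictionary)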
Upvotes: 0
Reputation: 8483
You could do something like this:
import numpy as np
import pandas as pd
from io import StringIO
import csv

# random dataframe
df = pd.DataFrame(np.random.randn(3, 4))

buffer = StringIO()  # create an empty in-memory text buffer
df.to_csv(buffer)    # write the dataframe into the buffer as csv
buffer.seek(0)       # rewind to the start of the stream

for row in csv.reader(buffer):
    # do stuff
    ...
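Note that to_csv also writes the index and the header row by default, so the rows your loop sees won't match the original file exactly. If the downstream loop expects rows shaped like the original csv, you can suppress them:

df.to_csv(buffer, index=False)  # omit the index column (add header=False to drop the header row too)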
Upvotes: 14