Reputation: 4692
I am using the following code to read a CSV file into a list of dictionaries.
import csv

file_name = path + '/' + file.filename
with open(file_name, newline='') as csv_file:
    csv_dict = [{k: v for k, v in row.items()}
                for row in csv.DictReader(csv_file)]

for item in csv_dict:
    call_api(item)
This reads the file and calls the function for each row. As the number of rows increases, the number of calls increases with it. It is also not possible to load all the contents into memory and split it from there, because the data is large. So I would like an approach where the file is read using limit and offset, as in SQL queries. But how can this be done in Python? I am not seeing any option to specify the number of rows to read or the number of rows to skip in the csv module documentation. If someone can suggest a better approach, that will also be fine.
Upvotes: 2
Views: 6678
Reputation: 1253
A solution could be to use pandas to read the csv:
import pandas as pd

file_name = 'data.csv'
OFFSET = 10
LIMIT = 24
CHSIZE = 6

header = list('ABC')
reader = pd.read_csv(file_name, sep=',',
                     header=None, names=header,  # Header 'A', 'B', 'C'
                     usecols=[0, 1, 4],  # Select some columns
                     skiprows=lambda idx: idx < OFFSET,  # Skip the first OFFSET lines
                     chunksize=CHSIZE,  # Chunk reading, CHSIZE rows per chunk
                     nrows=LIMIT)  # Read at most LIMIT rows in total

for df_chunk in reader:
    # Each df_chunk is a DataFrame, so
    # an adapted api may be needed to
    # call_api(item)
    for row in df_chunk.itertuples():
        print(row._asdict())
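For instance, assuming call_api accepts one row-dictionary at a time as in the question, each chunk could be converted with DataFrame.to_dict (a minimal sketch, not from the original answer):

for df_chunk in reader:
    # to_dict(orient='records') yields one plain dict per row,
    # the same shape that csv.DictReader produces
    for item in df_chunk.to_dict(orient='records'):
        call_api(item)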
Upvotes: 1
Reputation: 51683
You can call your API directly with just one row in memory:
import csv

with open(file_name, newline='') as csv_file:
    for row in csv.DictReader(csv_file):
        call_api(row)  # call api with row-dictionary, don't persist all rows to memory
You can skip lines by calling next(csv_file) before the for loop:
with open(file_name, newline='') as csv_file:
    for _ in range(10):  # skip the first 10 lines
        next(csv_file)
    for row in csv.DictReader(csv_file):  # DictReader takes its header from the next line
        call_api(row)
You can skip lines in between using continue:
with open(file_name, newline='') as csv_file:
    for i, row in enumerate(csv.DictReader(csv_file)):
        if i % 2 == 0:
            continue  # skip every other row
        call_api(row)
You can simply count parsed rows and break after n rows are done:
n = 0
with open(file_name, newline='') as csv_file:
    for row in csv.DictReader(csv_file):
        if n == 50:
            break
        call_api(row)
        n += 1
And you can combine those approaches to skip 100 rows and take the next 200, keeping only every other one. This mimics limit and offset, plus the modulo trick on the row number; a sketch of the combination follows.
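A minimal sketch of that combination, using itertools.islice to express the offset and limit (islice is an addition of mine, not part of the snippets above):

from itertools import islice

OFFSET = 100  # hypothetical offset: rows to skip
LIMIT = 200   # hypothetical limit: rows to take after the offset

with open(file_name, newline='') as csv_file:
    reader = csv.DictReader(csv_file)
    # islice consumes the reader lazily, so only one row is in memory at a time
    for i, row in enumerate(islice(reader, OFFSET, OFFSET + LIMIT)):
        if i % 2 == 0:
            continue  # keep every other row, as in the modulo example above
        call_api(row)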
Or you can use something that's great with CSV, like pandas:
Upvotes: 3