Reputation: 9638
My scenario: I am reading a csv file. I want to have access to both a dictionary of the fields generated by each line, and the raw, un-parsed line.
The goal is ultimately to do some processing on the fields, use the result to decide which lines I am interested in, and write those lines only into an output file.
An easy solution, which involves reading the file twice, looks something like:
from csv import DictReader

def dict_and_row(filename):
    with open(filename) as f:
        tmp = [row for row in DictReader(f)]
    with open(filename) as f:
        next(f)  # skip header
        for i, line in enumerate(f):
            if len(line.strip()) > 0:
                yield line.strip(), tmp[i]
Any better suggestions?
Edit: to be more specific about the usage scenario. I intended to index the lines by some of the data in the dict, and then use this index to find lines I am interested in. Something like:
d = {}
for raw, parsed in dict_and_row(somefile):
    d[(parsed["SOMEFIELD"], parsed["ANOTHERFIELD"])] = raw
and then later on
for pair in some_other_source_of_pairs:
    if pair in d:
        output.write(d[pair])
Upvotes: 4
Views: 2781
Reputation: 9638
I ended up wrapping the file in an object that saves the last line read, and then handing this object to the DictReader.
class FileWrapper:
    def __init__(self, f):
        self.f = f
        self.last_line = None

    def __iter__(self):
        return self

    def __next__(self):
        self.last_line = next(self.f)
        return self.last_line
This can then be used this way:
f = FileWrapper(file_object)
for row in csv.DictReader(f):
    print(row)  # that's the dict
    print(f.last_line)  # that's the line
Or I can implement dict_and_row:
from csv import DictReader

def dict_and_row(filename):
    with open(filename) as f:
        wrapper = FileWrapper(f)
        reader = DictReader(wrapper)
        for row in reader:
            yield row, wrapper.last_line
The wrapper can also be extended to expose other properties, such as the number of characters read so far.
Not sure that's the best solution but it does have the advantage of retaining access to strings as they were originally read from the file.
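For what it's worth, here is a minimal self-contained sketch of the wrapper idea applied to the indexing scenario from the question. The names CountingFileWrapper, index_raw_lines and chars_read are made up for illustration, and the sample data is invented. One caveat to keep in mind: if a quoted field contains an embedded newline, the csv reader pulls several physical lines for one row, so last_line holds only the final one.

```python
import csv
import io


class CountingFileWrapper:
    """Hypothetical variant of FileWrapper that also counts characters read."""

    def __init__(self, f):
        self.f = f
        self.last_line = None
        self.chars_read = 0

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration from the underlying file propagates and ends iteration.
        self.last_line = next(self.f)
        self.chars_read += len(self.last_line)
        return self.last_line


def index_raw_lines(f, key_fields):
    """Map a tuple of field values to the raw line it was parsed from."""
    wrapper = CountingFileWrapper(f)
    index = {}
    for row in csv.DictReader(wrapper):
        key = tuple(row[name] for name in key_fields)
        index[key] = wrapper.last_line
    return index, wrapper.chars_read


# Invented sample data standing in for a real file object.
sample = io.StringIO('SOMEFIELD,ANOTHERFIELD,VALUE\nx,1,"a, b"\ny,2,c\n')
index, chars_read = index_raw_lines(sample, ["SOMEFIELD", "ANOTHERFIELD"])
print(index[("x", "1")], end="")  # the raw line, quotes and newline intact
```

Since the file is read only once and the raw strings are stored as-is, the lookup loop from the question can write index[pair] straight to the output file.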
Upvotes: 8
Reputation: 3803
This is similar to something that I had to do at one point. I needed to put rows of properly-formatted CSV data into a list, manipulate it, and then save it. I used io.StringIO() to have the csv module write into an in-memory buffer, split that buffer into a list of lines, and passed the lines back. Without your data, I can't be 100% certain, but this should work. Note that, rather than reading the file in twice, I'm reading it in once and then writing the relevant lines back into CSV format.
import csv
from io import StringIO

def dict_and_row(filename):
    field_names = ['a', 'b']  # Your field names here.
    output = StringIO(newline='\n')
    # The csv docs recommend opening files with newline=''.
    with open(filename, 'r', newline='') as f:
        writer = csv.DictWriter(output, fieldnames=field_names)
        reader = csv.DictReader(f)
        writer.writeheader()  # If you want to return the header.
        for line in reader:
            if True:  # Do your processing here...
                writer.writerow(line)
    # Split the buffer back into individual CSV lines.
    data = [line.strip() for line in output.getvalue().splitlines()]
    for line in data:
        yield line
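One thing to be aware of with this approach: a line that has been parsed and written back is not guaranteed to match the original text, because the writer applies its own quoting rules. A small sketch of that effect, using a made-up helper called reserialize and invented sample lines:

```python
import csv
import io


def reserialize(raw_line):
    """Parse one CSV line and write it back out with the csv defaults."""
    row = next(csv.reader([raw_line]))
    out = io.StringIO(newline='')
    csv.writer(out).writerow(row)
    # The default line terminator is '\r\n'; drop it for comparison.
    return out.getvalue().rstrip('\r\n')


raw = '"x","1",c'
rewritten = reserialize(raw)
print(rewritten)         # quoting is normalized by the writer
print(rewritten == raw)  # the two strings can differ
```

If the output has to be byte-for-byte identical to the input lines, keeping the raw strings around is the safer route. If equivalent CSV is good enough, re-serializing is fine.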
Upvotes: 1
Reputation: 12108
You could use pandas, which is an excellent library for this kind of processing:
import pandas as pd
# read the csv file
data = pd.read_csv('data.csv')
# do some calculation on a column and store it in another column
data['column2'] = data['column1'] * 2
# If you decide that you need only a particular set of rows
# that match some condition of yours
data = data[data['column2'] > 100]
# store only particular columns back
cols = ['column1', 'column2', 'column3']
# index=False keeps the DataFrame's row index out of the output file
data[cols].to_csv('data_edited.csv', index=False)
Upvotes: 4