CHM

Reputation: 337

Python: General CSV file parsing and manipulation

The purpose of my Python script is to compare the data present in multiple CSV files, looking for discrepancies. The data are ordered, but the ordering differs between files. The files contain about 70K lines, weighing around 15MB. Nothing fancy or hardcore here. Here's part of the code:

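import csv

allRows = []

def getCSV(fpath):
    # newline="" lets the csv module handle line endings itself (Python 3)
    with open(fpath, newline="") as f:
        csvfile = csv.reader(f)

        for row in csvfile:
            allRows.append(row)

# transpose the collected rows into a list of columns
allCols = list(map(list, zip(*allRows)))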

Upvotes: 3

Views: 1747

Answers (3)

Jon Clements

Reputation: 142136

Are you sure you want to keep all rows around? The code below builds a list of the matching rows only. fname could also come from glob.glob(), os.listdir(), or whatever other source you choose. Note that you mention the 20th column, but row[20] is the 21st column, since indexing starts at 0.

import csv

matching20 = []

for fname in ('file1.csv', 'file2.csv', 'file3.csv'):
    with open(fname) as fin:
        csvin = csv.reader(fin)
        next(csvin) # <--- if you want to skip header row
        for row in csvin:
            if row[20] == 'value':
                matching20.append(row) # or do something with it here

You only want csv.DictReader if you have a header row and want to access your columns by name.
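For the access-by-name case, a minimal sketch might look like this. The column names id and status and the sample data are made up for illustration, and io.StringIO stands in for a real open file:

```python
import csv
import io

# Hypothetical sample data; in practice this would be an open file object.
sample = io.StringIO("id,status\n1,value\n2,other\n3,value\n")

# DictReader consumes the header row and yields one dict per data row.
reader = csv.DictReader(sample)
matching = [row for row in reader if row["status"] == "value"]

print([row["id"] for row in matching])  # prints ['1', '3']
```

Since the header row is consumed by DictReader itself, no explicit next() call is needed here.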

Upvotes: 2

aaronlevin

Reputation: 1443

If I understand the question correctly, you want to include a row if value is in the row, but you don't know which column value is, correct?

If your rows are lists, then this should work:

testlist = [row for row in allRows if 'value' in row]
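One thing to keep in mind: 'value' in row checks whether some cell equals 'value' exactly, not whether a cell contains it as a substring. A small illustration with made-up rows (the partial variant is an assumption about what you might want instead):

```python
# Hypothetical rows, shaped like what csv.reader produces (lists of strings).
allRows = [
    ["a", "value", "c"],
    ["a", "my value", "c"],  # 'value' appears only inside a longer cell
    ["value", "b", "c"],
]

# Keeps rows where at least one cell is exactly 'value'.
testlist = [row for row in allRows if "value" in row]

# Substring variant, in case partial matches should count too.
partial = [row for row in allRows if any("value" in cell for cell in row)]
```

Here testlist holds the first and third rows, while partial holds all three.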

post-edit:

If, as you say, you want a list of rows where value is in a specified column (specified by an integer pos), then:

testlist = []
pos = 20
for row in allRows:
    # keep the row only if the column at index pos holds 'value'
    if len(row) > pos and row[pos] == 'value':
        testlist.append(row)

(I haven't tested this, but let me know if that works.)

Upvotes: 1

Facundo Casco

Reputation: 10585

This should work. You don't need to build another list to get access to the columns.

import csv

def getCSV(fpath):
    with open(fpath) as ifile:
        csvfile = csv.reader(ifile)

        rows = list(csvfile)

    # filter the rows directly by column index; no transposed copy needed
    value_20 = [x for x in rows if x[20] == 'value']
    return value_20
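If you do need a whole column at some point, it can be pulled straight out of the row list on demand. A rough sketch, where the helper name column and the sample data are hypothetical, and rows that are too short are simply skipped:

```python
import csv
import io

def column(rows, index):
    """Return the value at `index` from every row long enough to have one."""
    return [row[index] for row in rows if len(row) > index]

# Hypothetical sample standing in for a real CSV file.
sample = io.StringIO("a,b,c\n1,2,3\n4,5\n7,8,9\n")
rows = list(csv.reader(sample))

# Third column (index 2); the two-field row "4,5" is skipped.
third = column(rows, 2)
```

This avoids holding both the rows and a transposed copy in memory at the same time.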

Upvotes: 2
