Removing columns in Python

Question

I have a comma-separated file with around 50 columns and several rows. I need to remove every column that is always 0 (i.e. all values in that column are zero).

The file is read with the following piece of code:

import csv

with open('data.txt', 'rb') as f:
    reader = csv.reader(f, delimiter=',')
    for row in reader:
        print row

Example data:
0 0.1 0.3 0.4 0
0 0.2 0.5 0.3 0
0 0.7 0.9 0.2 0

How exactly can one remove the columns that are all 0 from this in-memory structure? Ideally this would be done without writing to and re-reading from a temporary CSV file.

Martijn Pieters · Accepted Answer

Read in all rows (mapping all the values to floats), transpose them to columns using zip(*rows), keep only the columns that have at least one non-zero value using any(), then transpose back to rows using zip(*columns):

import csv

with open('data.txt', 'rb') as f:
    rows = list(map(float, row) for row in csv.reader(f, delimiter=','))

# Transpose to columns, drop all-zero columns, transpose back to rows
rows = zip(*[col for col in zip(*rows) if any(col)])

The latter step as a demonstration:

>>> rows = [[0, 0.1, 0.3, 0.4, 0], [0, 0.2, 0.5, 0.3, 0], [0, 0.7, 0.9, 0.2, 0]]
>>> zip(*[col for col in zip(*rows) if any(col)])
[(0.1, 0.3, 0.4), (0.2, 0.5, 0.3), (0.7, 0.9, 0.2)]
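The snippets above are written for Python 2, where zip() returns a list. On Python 3, zip() returns an iterator, so the result needs to be materialized explicitly. Here is a minimal sketch of the same transpose, filter, transpose idea for Python 3. The helper name drop_zero_columns is hypothetical, and the io.StringIO sample is only an assumed stand-in for a comma-separated data.txt:

```python
import csv
import io


def drop_zero_columns(rows):
    """Return rows (as lists) with every all-zero column removed."""
    rows = list(rows)
    if not rows:
        return []
    # Transpose to columns and keep those with at least one non-zero value.
    kept = [col for col in zip(*rows) if any(col)]
    # Transpose back to rows; zip() is lazy on Python 3, so build lists.
    return [list(row) for row in zip(*kept)]


# Hypothetical in-memory stand-in for a comma-separated data.txt.
sample = io.StringIO("0,0.1,0.3,0.4,0\n0,0.2,0.5,0.3,0\n0,0.7,0.9,0.2,0\n")
rows = [[float(value) for value in row] for row in csv.reader(sample)]

result = drop_zero_columns(rows)
print(result)
```

When reading a real file on Python 3, the csv module documentation recommends opening it in text mode with newline='' rather than 'rb'. Note that if every column is zero, this sketch returns an empty list rather than a list of empty rows.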
