Reputation: 2049
I have a comma-separated file with around 50 columns and several rows. I need to remove every column that is always 0 (i.e. all values in that column are zero).
The file is read with the following piece of code:
    import csv

    with open('data.txt', 'rb') as f:
        reader = csv.reader(f, delimiter=',')
        for row in reader:
            print row
0 0.1 0.3 0.4 0
0 0.2 0.5 0.3 0
0 0.7 0.9 0.2 0
How exactly can one remove the all-zero columns from this in-memory structure? Ideally this would be done without writing to, and re-reading from, a temporary CSV file.
Upvotes: 1
Views: 272
Reputation: 1121924
Read in all rows (mapping all the values to floats), transform them to columns using zip(*rows), keep only the columns that have non-zero values using any(), then transform back to rows using zip(*columns):
    import csv

    with open('data.txt', 'rb') as f:
        rows = list(map(float, row) for row in csv.reader(f, delimiter=','))
    # Transpose to columns, keep columns with any non-zero value, transpose back
    rows = zip(*[col for col in zip(*rows) if any(col)])
The latter step as a demonstration:
>>> rows = [[0, 0.1, 0.3, 0.4, 0], [0, 0.2, 0.5, 0.3, 0], [0, 0.7, 0.9, 0.2, 0]]
>>> zip(*[col for col in zip(*rows) if any(col)])
[(0.1, 0.3, 0.4), (0.2, 0.5, 0.3), (0.7, 0.9, 0.2)]
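For anyone on Python 3, where zip() and map() return lazy iterators rather than lists, the same idea might look roughly like the sketch below. The helper name drop_zero_columns is hypothetical and not part of any library, and an in-memory io.StringIO stands in for data.txt, on the assumption that the file holds the question's sample values separated by commas.

```python
import csv
import io


def drop_zero_columns(rows):
    """Return rows with every all-zero column removed (hypothetical helper)."""
    # zip(*rows) transposes rows into columns; any(col) is False only
    # when every value in that column is zero.
    kept = [col for col in zip(*rows) if any(col)]
    # Transpose back to rows; zip() is lazy in Python 3, so build lists.
    return [list(row) for row in zip(*kept)]


# Assumed stand-in for data.txt, using the sample values from the question.
sample = io.StringIO("0,0.1,0.3,0.4,0\n0,0.2,0.5,0.3,0\n0,0.7,0.9,0.2,0\n")
rows = [[float(value) for value in row] for row in csv.reader(sample)]
result = drop_zero_columns(rows)
print(result)  # [[0.1, 0.3, 0.4], [0.2, 0.5, 0.3], [0.7, 0.9, 0.2]]
```

With a real file in Python 3, the csv documentation recommends opening it in text mode with open('data.txt', newline='') instead of 'rb'. Note that if every column is zero, this sketch returns an empty list.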
Upvotes: 1