Reputation: 1161
I realize this question has been asked a million times and there is a lot of documentation on it. However, I am unable to output the results in the correct format.
The code below was adapted from: Replacing empty csv column values with a zero
# Save below script as RepEmptyCells.py
# Add #!/usr/bin/python to script
# Make executable by chmod +x prior to running the script on desired .csv file
# Below code will look through your .csv file and replace empty cells with 0s
# This can be particularly useful for genetic distance matrices
import csv
import sys
reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x) < 1:
            x = row[i] = 0
    # join() only accepts strings, so convert each cell with str()
    print(','.join(str(x) for x in row))
Currently, to get the correct output .csv file [i.e. in correct format] one can run the following command in bash:
#After making the script executable
./RepEmptyCells.py input.csv > output.csv # this produces the correct output
I've tried to use the csv.writer function to produce the correctly formatted output.csv file (similar to ./RepEmptyCells.py input.csv > output.csv) without much luck.
I'd like to learn how to add this last part to the code to automate the process without having to do it in bash.
What I have tried:
f = open('output2.csv', 'w')
import csv
import sys
reader = csv.reader(open(sys.argv[1], "rb"))
for row in reader:
    for i, x in enumerate(row):
        if len(x) < 1:
            x = row[i] = 0
    f.write(','.join(str(x) for x in row))
f.close()
When looking at the raw files from this code and the one before, they look the same.
However, when I open them in either Excel or Numbers, the latter (i.e. output2.csv) shows only a single row of the data.
It's important that both output.csv and output2.csv can be opened in Excel.
Upvotes: 0
Views: 3197
Reputation: 60604
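import csv
import sys

# Python 2: the csv module expects files opened in binary mode
with open(sys.argv[1], 'rb') as f:
    reader = csv.reader(f)
    rows = []
    for row in reader:
        # row is a list of strings, so fill each empty cell individually
        row = [cell or '0' for cell in row]
        rows.append(row)
        print ','.join(row)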
I don't understand why you need the shell and the redirect, though. A csv writer is just:
with open('output.csv', 'wb') as f:
    writer = csv.writer(f)
    writer.writerows(rows)
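The snippets in this answer are Python 2 style (binary file modes, print statement). For Python 3, a minimal sketch of the same read-then-write idea might look like the following. The helper name fill_empty_cells and its fill parameter are made up for illustration and are not part of the original answer.

```python
import csv


def fill_empty_cells(in_path, out_path, fill="0"):
    """Copy a CSV file, replacing empty cells with `fill` (hypothetical helper)."""
    # newline="" lets the csv module handle line endings itself (Python 3)
    with open(in_path, newline="") as fin:
        rows = [[cell or fill for cell in row] for row in csv.reader(fin)]
    with open(out_path, "w", newline="") as fout:
        # writerows() terminates every row, so Excel sees one line per row
        csv.writer(fout).writerows(rows)
    return rows
```

It could be called as fill_empty_cells(sys.argv[1], 'output.csv'), which removes the need for the shell redirect.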
Upvotes: 0
Reputation: 8481
A humble proposal:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import csv
import sys

# Use a with statement so both files are closed properly.
# newline='' is the right option for the csv module in Python 3.x.
with open(sys.argv[1], 'r', newline='') as fin, open(sys.argv[2], 'w', newline='') as fout:
    reader = csv.reader(fin)
    # You may need to redefine the dialect for some versions of Excel that
    # split cells on semicolons (for _Comma_ Separated Values, yes...)
    writer = csv.writer(fout, dialect="excel")
    for row in reader:
        # Write rows as they are read and let the OS handle the caching.
        # The generator checks every cell in the row: if a cell is empty,
        # the `or` returns "0".
        # Keep everything as strings: int() would fail on non-integer cells,
        # and the writer would convert the result back to str anyway.
        # Excel doesn't make any difference when loading.
        writer.writerow(cell or "0" for cell in row)
Sample in.csv
1,2,3,,4,5,6,
7,,8,,9,,10
Output out.csv
1,2,3,0,4,5,6,0
7,0,8,0,9,0,10
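The code comment about Excel versions that split cells on semicolons can be sketched as follows. This is only an illustration under the assumption that such an Excel setup expects ';' as the separator: it keeps the "excel" dialect and overrides just the delimiter. The helper name fill_and_write is hypothetical, and io.StringIO stands in for real files so the sample above can be tried in memory.

```python
import csv
import io


def fill_and_write(src_text, delimiter=","):
    """Return CSV text with empty cells replaced by "0" (hypothetical helper)."""
    out = io.StringIO(newline="")
    # Keep the "excel" dialect but override the delimiter if requested
    writer = csv.writer(out, dialect="excel", delimiter=delimiter)
    for row in csv.reader(io.StringIO(src_text, newline="")):
        writer.writerow(cell or "0" for cell in row)
    return out.getvalue()


sample = "1,2,3,,4,5,6,\n7,,8,,9,,10\n"
print(fill_and_write(sample), end="")                 # comma-separated
print(fill_and_write(sample, delimiter=";"), end="")  # semicolon-separated
```

The input is still read with the default comma delimiter; only the output separator changes.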
Upvotes: 1
Reputation: 2373
Two options:
1. Just do an f.write('\n') after your current f.write statement.
2. Use csv.writer. You mention it, but it isn't in your code.
writer = csv.writer(f)
...
writer.writerow([int(x) for x in row]) # Note difference in parameter format
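To see why the attempt in the question collapses into a single row, note that str.join does not append a line terminator, so consecutive f.write calls glue the rows together. The sketch below compares that behaviour with the two options, using io.StringIO in place of real files and made-up sample rows; cells are kept as strings here rather than converted with int().

```python
import csv
import io

rows = [["1", "", "3"], ["", "5", ""]]  # made-up sample data

# Without a newline, consecutive writes run together on one line
broken = io.StringIO()
for row in rows:
    broken.write(",".join(cell or "0" for cell in row))

# Option 1: add the newline yourself after each row
manual = io.StringIO()
for row in rows:
    manual.write(",".join(cell or "0" for cell in row) + "\n")

# Option 2: let csv.writer add the line terminator (and any quoting)
auto = io.StringIO(newline="")
writer = csv.writer(auto)
for row in rows:
    writer.writerow([cell or "0" for cell in row])

print(repr(broken.getvalue()))
print(repr(manual.getvalue()))
print(repr(auto.getvalue()))
```

Option 2 is usually the safer choice, since csv.writer also quotes cells that contain commas or quotes.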
Upvotes: 3