Reputation: 21
Hello, I am attempting to clean up a CSV file using Python, but my output is a little off and I can't figure out why.
in_file = open(out, "rb")
fout = "DomainWatchlist.csv"
fin_out_file = open(fout, "wb")
csv_writer2 = csv.writer(fin_out_file, quoting=csv.QUOTE_MINIMAL)
for item in in_file:
    if "[.]" in item:
        csv_writer2.writerow([item.replace("[.]", ".")])
    elif "[dot]" in item:
        csv_writer2.writerow([item.replace("[dot]", ".")])
    else:
        csv_writer2.writerow([item])
in_file.close
fin_out_file.close
The input file contains data that looks like this:
bluecreatureoftheseas.com
12rafvwe[dot]co[dot]cc
12rafvwe[dot]co[dot]cc
404page[dot]co[dot]cc
abalamahala[dot]co[dot]cc
abtarataha[dot]co[dot]cc
adoraath[dot]cz[dot]cc
adoranaya[dot]cz[dot]cc
afnffnjq[dot]co[dot]cc
aftermorningstar[dot]co[dot]cc
I am attempting to fix this data but it comes out looking like this:
"12rafvwe.co.cc
"
"12rafvwe.co.cc
"
"404page.co.cc
"
"abalamahala.co.cc
"
"abtarataha.co.cc
"
"adoraath.cz.cc
"
"adoranaya.cz.cc
"
"afnffnjq.co.cc
"
"aftermorningstar.co.cc
"
"aftrafsudalitf.co.cc
"
"agamafym.cz.cc
"
"agamakus.vv.cc
Why does this create the extra quotes and then add a carriage return?
Upvotes: 2
Views: 302
Reputation: 365717
The reason you're getting a newline is that `for item in in_file:` iterates over each line in `in_file` without stripping the newline. You don't strip the newline anywhere, so it's still there in the single string in the list you pass to `writerow`.
The reason you're getting quotes is that in CSV, strings with special characters—like newlines—have to be either escaped or quoted. There are different "dialect options" you can set to control that, but by default, it tries to use quoting instead of escaping.
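To see the quoting behavior in isolation, here is a minimal sketch, assuming Python 3 and an in-memory buffer instead of the real files. The helper name `render_row` is hypothetical and exists only for this illustration.

```python
import csv
import io

def render_row(value):
    """Return the exact text csv.writer emits for a one-field row."""
    buf = io.StringIO()
    csv.writer(buf, quoting=csv.QUOTE_MINIMAL).writerow([value])
    return buf.getvalue()

# A field that still carries its trailing newline gets wrapped in quotes,
# and the writer then appends its own line terminator after the closing quote.
print(repr(render_row("12rafvwe.co.cc\n")))

# The same field without the newline needs no quoting.
print(repr(render_row("12rafvwe.co.cc")))
```

The first call shows the opening quote, the domain, the embedded newline, and the closing quote on the following line, which matches the shape of the output in the question.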
So, the solution is something like this:
for item in in_file:
    item = item.rstrip()
    # rest of your code
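As a rough end-to-end sketch of that fix, assuming Python 3, the loop can be tried against in-memory streams. The function name `write_domains` and the sample data are made up for the illustration, and the two replacements are chained rather than guarded by `if`/`elif`.

```python
import csv
import io

def write_domains(lines, out_stream):
    """Write one cleaned domain per CSV row, stripping trailing whitespace first."""
    writer = csv.writer(out_stream, quoting=csv.QUOTE_MINIMAL)
    for item in lines:
        item = item.rstrip()  # drop the newline before it reaches writerow
        writer.writerow([item.replace("[.]", ".").replace("[dot]", ".")])

# Hypothetical sample input standing in for the real file.
sample = io.StringIO("12rafvwe[dot]co[dot]cc\n404page[.]co[.]cc\n")
result = io.StringIO()
write_domains(sample, result)
print(result.getvalue())
```

With real files in Python 3, the csv documentation recommends opening the output with `newline=''` so the writer's own line endings are not translated a second time.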
There are some other problems with your code, as well as some ways you're making things more complicated than they need to be.
First, `in_file.close` does not close the file. You're not calling the method, just referring to it as a function object. You need parentheses to call a function in Python.
But an even simpler way to handle closing files is to use a `with` statement.
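The difference between referring to `close` and calling it can be checked with a small sketch. It uses `io.StringIO` as a stand-in for a real file (an assumption made to keep the example self-contained), and the variable names are illustrative only.

```python
import io

stream = io.StringIO("example.com\n")

stream.close                            # looks up the method but never calls it
closed_after_reference = stream.closed  # still open at this point

stream.close()                          # the parentheses actually call it
closed_after_call = stream.closed

# A with statement closes the object automatically when the block exits.
with io.StringIO("example.com\n") as managed:
    first_line = managed.readline()
closed_after_with = managed.closed

print(closed_after_reference, closed_after_call, closed_after_with)
```

Objects returned by `open()` support the same `closed` attribute and `with` protocol, so the pattern carries over to the files in the question.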
You only have a single column, so there is no need to use the `csv` module at all. Just `fin_out_file.write` would work just fine.
You also probably don't want to use binary mode here. If you have a good reason for doing so, that's fine, but if you don't know why you're using it, don't use it.
You don't need to check whether a substring exists before `replace`-ing it. If you call `'abc'.replace('n', 'N')`, it will just harmlessly return `'abc'`. All you're doing is writing twice as much code, and making Python search each string twice in a row.
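A quick sketch of the unguarded version, with a hypothetical helper name `refang` chosen for this example:

```python
def refang(text):
    """Replace both obfuscated-dot spellings; absent substrings are simply left alone."""
    return text.replace("[.]", ".").replace("[dot]", ".")

print(refang("bluecreatureoftheseas.com"))  # no markers, returned unchanged
print(refang("12rafvwe[dot]co[dot]cc"))
print(refang("404page[.]co[dot]cc"))        # mixed markers are both handled
```

Chaining also covers a line that mixes the two spellings, which the `if`/`elif` version would only partly fix.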
Putting this all together, here's the whole thing in three lines:
with open(out) as in_file, open(fout, 'w') as out_file:
    for line in in_file:
        out_file.write(line.replace("[.]", ".").replace("[dot]", "."))
Upvotes: 3
Reputation: 1056
A bit off-topic, but Perl was built for this:
$ perl -i -ple 's/\[dot\]/./g' filename
will do the job, including saving the result back under the original filename. Note that this command only handles the [dot] form, not [.].
Upvotes: 0