user3038303

Reputation: 21

Python: replacing data in a CSV file

Hello, I am trying to clean up a CSV file using Python, but my output is a little off and I can't figure out why.

in_file = open(out, "rb")
fout = "DomainWatchlist.csv"
fin_out_file = open(fout, "wb")
csv_writer2 = csv.writer(fin_out_file, quoting=csv.QUOTE_MINIMAL)
for item in in_file:
    if "[.]" in item:
        csv_writer2.writerow([item.replace("[.]", ".")])
    elif "[dot]" in item:
        csv_writer2.writerow([item.replace("[dot]", ".")])
    else:
        csv_writer2.writerow([item])

in_file.close
fin_out_file.close

The input file contains data that looks like this:

bluecreatureoftheseas.com
12rafvwe[dot]co[dot]cc
12rafvwe[dot]co[dot]cc
404page[dot]co[dot]cc
abalamahala[dot]co[dot]cc
abtarataha[dot]co[dot]cc
adoraath[dot]cz[dot]cc
adoranaya[dot]cz[dot]cc
afnffnjq[dot]co[dot]cc
aftermorningstar[dot]co[dot]cc

I am attempting to fix this data but it comes out looking like this:

"12rafvwe.co.cc
"
"12rafvwe.co.cc
"
"404page.co.cc
"
"abalamahala.co.cc
"
"abtarataha.co.cc
"
"adoraath.cz.cc
"
"adoranaya.cz.cc
"
"afnffnjq.co.cc
"
"aftermorningstar.co.cc
"
"aftrafsudalitf.co.cc
"
"agamafym.cz.cc
"
"agamakus.vv.cc

Why does this create the extra quotes and then add a carriage return?

Upvotes: 2

Views: 302

Answers (2)

abarnert

Reputation: 365717

The reason you're getting a newline is that for item in in_file: iterates over each line in in_file, without stripping the newline. You don't strip the newline anywhere. So it's still there in the single string in the list you pass to writerow.

The reason you're getting quotes is that in CSV, strings containing special characters, such as newlines, have to be either escaped or quoted. There are different "dialect options" you can set to control that, but by default (and with the QUOTE_MINIMAL setting you pass explicitly), the writer quotes such fields instead of escaping them.
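To see the quoting behaviour in isolation, here is a minimal sketch, assuming Python 3 and an in-memory buffer rather than a real file. The helper name write_single_cell is made up for illustration; it is not part of the csv module.

```python
import csv
import io


def write_single_cell(value):
    """Write one single-column row to an in-memory buffer and return the raw CSV text."""
    buffer = io.StringIO()
    csv.writer(buffer, quoting=csv.QUOTE_MINIMAL).writerow([value])
    return buffer.getvalue()


# A line read from a file still ends with "\n", so the writer has to quote the field.
with_newline = write_single_cell("12rafvwe.co.cc\n")

# Once the newline is stripped, nothing in the field needs quoting.
without_newline = write_single_cell("12rafvwe.co.cc")

print(repr(with_newline))
print(repr(without_newline))
```

The first value comes back wrapped in quotes with the newline inside them, which matches the broken output in the question; the second is written as a plain field.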

So, the solution is something like this:

for item in in_file:
    item = item.rstrip()
    # rest of your code

There are some other problems with your code, as well as some ways you're making things more complicated than they need to be.

First, in_file.close does not close the file. You're not calling the function, just referring to it as a function object. You need parentheses to call a function in Python.

But an even simpler way to handle closing files is to use a with statement.
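As a rough illustration of both points, assuming Python 3 and a throwaway temporary file (the path and variable names here are invented for the demo):

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "example.txt")

f = open(path, "w")
f.close                      # only refers to the method; nothing is called
still_open = not f.closed    # the file is still open at this point
f.close()                    # the parentheses actually call it
closed_after_call = f.closed

# A with statement closes the file when the block ends, even if an error occurs.
with open(path, "w") as g:
    g.write("404page.co.cc\n")
closed_by_with = g.closed

print(still_open, closed_after_call, closed_by_with)
```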

You only have a single column, so there is no need to use the csv module at all. Calling fin_out_file.write directly would work just fine.

You also probably don't want to use binary mode here. If you have a good reason for doing so, that's fine, but if you don't know why you're using it, don't use it.

You don't need to check whether a substring exists before replace-ing it. If you call 'abc'.replace('n', 'N'), it will just harmlessly return 'abc'. All you're doing is writing twice as much code, and making Python search each string twice in a row.
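A quick sketch of that, assuming Python 3. The function name refang and the "example[.]com" sample are made up for illustration; the other two strings come from the question's input.

```python
def refang(text):
    """Replace both obfuscated dot forms; strings without them pass through unchanged."""
    return text.replace("[.]", ".").replace("[dot]", ".")


samples = [
    "bluecreatureoftheseas.com",   # nothing to replace
    "404page[dot]co[dot]cc",       # [dot] form
    "example[.]com",               # [.] form (hypothetical sample)
]

cleaned = [refang(s) for s in samples]
print(cleaned)
```

Chaining the two replace calls covers every case the original if/elif/else handled, plus lines that happen to contain both forms.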

Putting this all together, here's the whole thing in three lines:

with open(out) as in_file, open(fout, 'w') as out_file:
    for line in in_file:
        out_file.write(line.replace("[.]", ".").replace("[dot]", "."))
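To try this end to end without touching real data, here is a self-contained sketch, assuming Python 3. It writes a few lines from the question's sample input to a temporary directory first; the temporary paths are assumptions made for the demo, while out and fout reuse the names from the question.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
out = os.path.join(workdir, "input.txt")              # hypothetical input path
fout = os.path.join(workdir, "DomainWatchlist.csv")

# Create a small sample input so the example can run on its own.
with open(out, "w") as sample:
    sample.write(
        "bluecreatureoftheseas.com\n"
        "12rafvwe[dot]co[dot]cc\n"
        "404page[dot]co[dot]cc\n"
    )

# Each line keeps its own trailing newline, so write() needs nothing added.
with open(out) as in_file, open(fout, "w") as out_file:
    for line in in_file:
        out_file.write(line.replace("[.]", ".").replace("[dot]", "."))

with open(fout) as result_file:
    result = result_file.read()

print(result)
```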

Upvotes: 3

vish

Reputation: 1056

A bit off-topic, but Perl was built for this:

$ perl -i -ple 's/\[dot\]/./g' filename

will do the job, including saving the result back over the original file. Note that this pattern only replaces [dot]; it does not touch the [.] form.
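For comparison, a similar in-place edit can be sketched in Python with the standard fileinput module. This assumes Python 3, and the sample file is created in a temporary directory purely so the example can run; it is not the asker's data file.

```python
import fileinput
import os
import tempfile

# Hypothetical sample file, created only so the sketch can run end to end.
path = os.path.join(tempfile.mkdtemp(), "domains.txt")
with open(path, "w") as sample:
    sample.write("adoraath[dot]cz[dot]cc\nadoranaya[dot]cz[dot]cc\n")

# With inplace=True, whatever is printed inside the loop is written back to the file.
with fileinput.input(files=[path], inplace=True) as lines:
    for line in lines:
        print(line.replace("[.]", ".").replace("[dot]", "."), end="")

with open(path) as rewritten_file:
    rewritten = rewritten_file.read()
```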

Upvotes: 0
