Reputation: 25
I have multiple .csv files and I combined them in single .csv file with the use of python programming.
Now I need to automate the process of replacing the content of one column in a .csv file with the use of python. I can also open the .csv file using Notepad and replace the content of the column but the file is very huge and it is taking a long time.
Name ID class Num
"kanika",""University ISD_po.log";" University /projects/asd/new/high/sde"","MBA","12"
"Ambika",""University ISD_po.log";" University /projects/asd/new/high/sde"","MS","13"
In the above, I need to replace the content of ID column. The new content in the ID column should be "input".
This Id column is enclosed with 2 double quotes and has some extra spaces as well. Whereas other columns have only 1 double quote.
Is there any way to do it in python?
To combine multiple .csv files, the code is:
fout=open("out.csv","a")
for line in open("sh1.csv"):
fout.write(line)
for num in range(2,21):
f=open("sh"+str(num)+".csv")
f.next()
for line in f:
fout.write(line)
f.close()
fout.close()
Upvotes: 0
Views: 2634
Reputation: 110811
As other people have indicated, normally one does use the csv
module to read/write a CSv file from Python.
However, if the file you are mentioning is just like you posted, it is not well formed, and python's CSV won't be able to deal with it properly - (bad usage of double quotes on the column you want to change).
Therefore it is worth treating your file as a text file, and make the changes in there:
with open("myfile.csv") as input_file:
with open("output.csv", "wt") as output:
output.write(input_file.readline())
for line in input_file:
parts = line.split('""')
id = parts.split('"')[-1]
output.write(parts[0] + id + parts[2])
Upvotes: 4
Reputation: 27088
You just want to remove ""Uni- and " followed by a space.
change your code to
for line in f:
line=line.replace('""Uni-','').replace('" ','')
fout.write(line)
you get for example
kanika "38447484" MBA
Upvotes: 0
Reputation: 3443
You could use a regular expression to remove it:
In [3]: re.sub(r'""Uni-\s*"([0-9]+)""', r'\1', '""Uni- "38447484""', flags=re.I)
Out[3]: '38447484'
Upvotes: 0