kanika
kanika

Reputation: 25

Replacing the content of a column in python

I have multiple .csv files and I combined them in single .csv file with the use of python programming.

Now I need to automate the process of replacing the content of one column in a .csv file with the use of python. I can also open the .csv file using Notepad and replace the content of the column but the file is very huge and it is taking a long time.

Name                          ID                                                class  Num
"kanika",""University ISD_po.log";" University     /projects/asd/new/high/sde"","MBA","12"
"Ambika",""University ISD_po.log";" University     /projects/asd/new/high/sde"","MS","13"

In the above, I need to replace the content of ID column. The new content in the ID column should be "input".

This Id column is enclosed with 2 double quotes and has some extra spaces as well. Whereas other columns have only 1 double quote.

Is there any way to do it in python?

To combine multiple .csv files, the code is:

fout=open("out.csv","a")
for line in open("sh1.csv"):
    fout.write(line)
for num in range(2,21):
    f=open("sh"+str(num)+".csv")
    f.next()
    for line in f:
        fout.write(line)
    f.close()
fout.close()

Upvotes: 0

Views: 2634

Answers (4)

jsbueno
jsbueno

Reputation: 110811

As other people have indicated, normally one does use the csv module to read/write a CSv file from Python.

However, if the file you are mentioning is just like you posted, it is not well formed, and python's CSV won't be able to deal with it properly - (bad usage of double quotes on the column you want to change).

Therefore it is worth treating your file as a text file, and make the changes in there:

with open("myfile.csv") as input_file:
   with open("output.csv", "wt") as output:
      output.write(input_file.readline())
      for line in input_file:
           parts = line.split('""')
           id = parts.split('"')[-1]
           output.write(parts[0] + id + parts[2])

Upvotes: 4

Johan Lundberg
Johan Lundberg

Reputation: 27088

You just want to remove ""Uni- and " followed by a space.

change your code to

for line in f:
    line=line.replace('""Uni-','').replace('" ','')
    fout.write(line)

you get for example

kanika "38447484" MBA

Upvotes: 0

Gandaro
Gandaro

Reputation: 3443

You could use a regular expression to remove it:

In [3]: re.sub(r'""Uni-\s*"([0-9]+)""', r'\1', '""Uni-  "38447484""', flags=re.I)
Out[3]: '38447484'

Upvotes: 0

MRAB
MRAB

Reputation: 20664

Try Python's csv module to read and write CSV files.

Upvotes: 2

Related Questions