Costa.Gustavo
Costa.Gustavo

Reputation: 859

Formatting csv file with python

I have a csv file with the following structure:

"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE;1;1;2015;PP"

I need him to stay like this:

"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE";"1";"1";"2015";"PP"

I received this .csv file from someone else, so I do not know how the conversion was done. I am trying unsuccessfully with the code below:

input_fd = open("/home/gustavo/Downloads/Redes/Despesas/csvfile.csv", 'r')
output_fd = open('dados_2018_1.csv', 'w')
for line in input_fd.readlines():
    line.replace("\"","")
    output_fd.write(line)
    input_fd.close()
output_fd.close()

Is it possible to make this change or will I have to do the conversion from an xml file to a csv, and make this change during the conversion?

Upvotes: 0

Views: 107

Answers (4)

David Zemens
David Zemens

Reputation: 53663

First: tell the reader to use delimiter=";" and quoting=csv.QUOTE_NONE. This will properly split your second line which is a string literal containing your delimiter, which you desire to be split. We'll tweak that data to remove the quotation marks (otherwise our output will be quoted strings like '"txNomeParlamentar"', etc).

import csv
with open('file.txt') as f:
     reader = csv.reader(f, delimiter=";", quoting=csv.QUOTE_NONE)
     data = [list(map(lambda s: s.replace('"', ''), row)) for row in reader]

Then: we write the file back out, with the delimiter=";", and quoting=csv.QUOTE_ALL to ensure each item is set in quotes

with open('out.txt', 'w', newline='') as o:
     writer = csv.writer(o, delimiter=";", quoting=csv.QUOTE_ALL)
     writer.writerows(data)

Input:

"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE;1;1;2015;PP"

enter image description here

Output:

"txNomeParlamentar";"ideCadastro";"nuCarteiraParlamentar";"nuLegislatura";"sgUF"
"AVANTE";"1";"1";"2015";"PP"

enter image description here

Upvotes: 1

Rakesh
Rakesh

Reputation: 82785

Using csv module.

Ex:

import csv

with open(filename) as csvfile:
    reader = csv.reader(csvfile, delimiter=";")
    headers = next(reader)    #Read Headers
    data = [row.strip('"').split(";") for row in csvfile]    #Format data

with open(filename, "w") as csvfile_out:
    writer = csv.writer(csvfile_out, delimiter=";")
    writer.writerow(headers)   #Write Headers
    writer.writerows(data)     #Write data

Upvotes: 1

martineau
martineau

Reputation: 123541

You could use the csv module to do it if you massage the input data a little first.

import csv


#input_csv = '/home/gustavo/Downloads/Redes/Despesas/csvfile.csv'
input_csv = 'gustavo_input.csv'
output_csv = 'dados_2018_1.csv'

with open(input_csv, 'r', newline='') as input_fd, \
     open(output_csv, 'w', newline='') as output_fd:

    reader = csv.DictReader(input_fd, delimiter=';')
    writer = csv.DictWriter(output_fd, delimiter=';',
                            fieldnames=reader.fieldnames,
                            quoting=csv.QUOTE_ALL)

    first_field = reader.fieldnames[0]
    for row in reader:
        fields = row[first_field].split(';')
        newrow = dict(zip(reader.fieldnames, fields))
        writer.writerow(newrow)

print('done')

Upvotes: -1

AirSquid
AirSquid

Reputation: 11938

A couple things. First, you do NOT have a csv file because in a csv file, the delimiter is a comma by definition. I'm assuming you want the values in your data file to (1) remain separated by semicolons [why not fix it and make it commas?] and (2) you want each value to be in quotation marks.

If so, I think this will work:

# data reader

in_file = 'data.txt'
out_file = 'fixed.txt'
output = open(out_file, 'w')
with open(in_file, 'r') as source:
    for line in source:
        # split by semicolon
        data = line.strip().split(';')             
        # remove all quotes found
        data = [t.replace('"','') for t in data]   
        for item in data[:-1]:
            output.write(''.join(['"', item, '"',';']))
        # write the last item separately, without the trailing ';'
        output.write(''.join(['"', item, '"']))
        output.write('\n')
output.close()

If your target user is python, you should consider replacing the semicolons with commas (correct csv format) and forgoing the quotes. Everything python reads from csv is taken in as string anyhow.

Upvotes: 1

Related Questions