Reputation: 906

Python: Read CSV and write to file using a custom format

I have this .csv file ...

id,first_name,last_name,email,date,opt-in
1,Jimmy,Reyes,[email protected],12/29/2016,FALSE
2,Doris,Wood,[email protected],04/22/2016,
3,Steven,Miller,[email protected],07/31/2016,FALSE
4,Earl,Parker,[email protected],01-08-17,FALSE
5,Barbara,Cruz,[email protected],12/30/2016,FALSE

I want to read the above shown csv file, transform data, and finally write the data in another text file, which should look like this ....

1,<tab>"first_name"="Jimmy","last_name"="Reyes","email"="[email protected]","date"="12/29/2016","opt-in"="FALSE"
2,<tab>"first_name"="Doris","last_name"="Wood","email"="[email protected]","date"="04/22/2016,,"opt-in"="0"

Also, If the opt-in value is empty, its should print "0".

Here is my code so far ....

import csv
import time

# Do the reading
with open('my-scripts/mock.csv', 'r') as f1:
 #next(f1, None)  # skip the headers
 reader = csv.reader(f1)
 new_rows_list = []
 for row in reader:
   if row[5] == '':
      new_row = [row[0],'\t',row[1], row[2], row[3], row[4], '0']
      new_rows_list.append(new_row)
   else:
      new_row = [row[0],'\t',row[1], row[2], row[3], row[4], row[5]]
      new_rows_list.append(new_row)   
 f1.close()   # <---IMPORTANT

# Do the writing
newfilename = 'my-scripts/ftp_745198_'+str(int(time.time()))
with open(newfilename, 'w', newline='') as f2:
 writer = csv.writer(f2, quoting=csv.QUOTE_NONNUMERIC)
 writer.writerows(new_rows_list)
 f2.close()

The above code is generating this output, which is not what I exactly want ... I am unable to figure out how to print column names in each row as shown above in the desired output ...!

"id","  ","first_name","last_name","email","date","opt-in"
"1","   ","Jimmy","Reyes","[email protected]","12/29/2016","FALSE"
"2","   ","Doris","Wood","[email protected]","04/22/2016","0"
"3","   ","Steven","Miller","[email protected]","07/31/2016","FALSE"
"4","   ","Earl","Parker","[email protected]","01-08-17","FALSE"
"5","   ","Barbara","Cruz","[email protected]","12/30/2016","FALSE"

New CSV

id,first_name,last_name,email,date,opt-in,unique_code
1,Jimmy,Reyes,[email protected],12/29/2016,FALSE,ER45DH
2,Doris,Wood,[email protected],04/22/2016,,MU34T3
3,Steven,Miller,[email protected],07/31/2016,FALSE,G34FGH
4,Earl,Parker,[email protected],01-08-17,FALSE,ASY67J
5,Barbara,Cruz,[email protected],12/30/2016,FALSE,NHG67P

New expected output

ER45DH<tab>"id"="1","first_name"="Jimmy","last_name"="Reyes","email"="[email protected]","date"="12/29/2016","opt-in"="FALSE"
MU34T3<tab>"id"="2","first_name"="Doris","last_name"="Wood","email"="[email protected]","date"="04/22/2016,"opt-in"="0"

I will really appreciate any help/ideas/pointers.

Thanks

Upvotes: 1

Answers (3)

Karthikeyan KR

Reputation: 1174

Initially extract the header as a new list.
Then append header with each row elements as a string.
Write it to the file.

Please try this code,

import csv

with open('newfilename.csv', 'w') as f2:
    with open('mycsvfile.csv', mode='r') as infile:
        reader = csv.reader(infile)
        for i,rows in enumerate(reader):
            if i == 0:
               header = rows 
            else:
                if rows[5] == '':
                   rows[5] = 0;
                pat = rows[0]+'\t'+'''"%s=%%s",'''*(len(header)-1)+'\n'
                print pat
                f2.write(pat % tuple(header[1:]) % tuple(rows[1:]))
    f2.close()

Output:

1   "first_name=Jimmy","last_name=Reyes","[email protected]","date=12/29/2016","opt-in=FALSE",
2   "first_name=Doris","last_name=Wood","[email protected]","date=04/22/2016","opt-in=0",
3   "first_name=Steven","last_name=Miller","[email protected]","date=07/31/2016","opt-in=FALSE",
4   "first_name=Earl","last_name=Parker","[email protected]","date=01-08-17","opt-in=FALSE",
5   "first_name=Barbara","last_name=Cruz","[email protected]","date=12/30/2016","opt-in=FALSE",

Please let me know in terms of any queries.

Upvotes: 1

Zealseeker

Reputation: 823

Firstly, save the header into a variable. For example:

for i,row in enumerate(reader):
    if i == 0:
        header = row
    else:
        new_row = [row[0],'\t'] + ['%s=%s' % (header[j],row[j]) for j in range(1,6)]
        ....
...

Secondly, code such as [row[1], row[2], row[3], row[4], row[5]] can be simplified into [row[i] for i in range(1,6)] (generator)

Thridly, format is a good tool: print('"%s"="%s"'% (header[1],row[1])) will output "first_name"="Jimmy"

Use this knowledge and consider how to make it work.

Upvotes: 1

Gang YIN

Reputation: 2567

You could keep the header in a list, then use the the list (like first_name, etc.) to match the elements in the followed lines (like Jimmy, etc.) to generate the output you want ("first_name"="Jimmy").

Upvotes: 1

Python: Read CSV and write to file using a custom format

Answers (3)

Related Questions