user2820987

Reputation: 55

python csv copy column

I have a file containing the following:

first_name,last_name,uid,email,dep_code,dep_name
john,smith,jsmith,[email protected],finance,21230
john,king,jking,[email protected],human resource,31230

I want to copy the column "email" into a new column "email2", and then replace gmail.com with hotmail.com in "email2".

I'm new to Python, so I need help from experts. I tried a few scripts, but if there is a better way to do it, please let me know. The original file contains 60000 rows.

with open('c:\\Python27\\scripts\\colnewfile.csv', 'rb') as fp_in1, open('c:\\Python27\\scripts\\final.csv', 'wb') as fp_out1:
    writer1 = csv.writer(fp_out1, delimiter=",")
    reader1 = csv.reader(fp_in1, delimiter=",")
    domain = "@hotmail.com"
    for row in reader1:
        if row[2:3] == "uid":
            writer1.append("Email2")
        else:
            writer1.writerow(row+[row[2:3]])

Here is the final script. The only problem is that it does not write the complete output file: it only shows 61409 rows, whereas the input file has 61438 rows.

inFile = 'c:\Python27\scripts\in-093013.csv'
outFile = 'c:\Python27\scripts\final.csv'

with open(inFile, 'rb') as fp_in1, open(outFile, 'wb') as fp_out1:
    writer = csv.writer(fp_out1, delimiter=",")
    reader = csv.reader(fp_in1, delimiter=",")
    for col in reader:
        del col[6:]
        writer.writerow(col)
    headers = next(reader)
    writer.writerow(headers + ['email2'])
    for row in reader:
        if len(row) > 3:
            email = email.split('@', 1)[0] + '@hotmail.com'
        writer.writerow(row + [email])
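One possible explanation for a row-count gap like this, which the question does not confirm, is that some fields are quoted and contain line breaks, so the file has more physical lines than CSV records. The sketch below (Python 3, with a hypothetical `summarize` helper and made-up sample data) compares the two counts and tallies how many fields each record has, which also exposes short or over-long rows.

```python
import csv
import io
from collections import Counter


def summarize(text):
    """Return (physical_lines, csv_records, Counter of fields per record)."""
    physical_lines = len(text.splitlines())
    # newline="" leaves line endings untouched so the csv module can handle them
    records = list(csv.reader(io.StringIO(text, newline="")))
    widths = Counter(len(r) for r in records)
    return physical_lines, len(records), widths


# Hypothetical data: the second record has a quoted field spanning two lines.
sample = (
    'uid,email,note\n'
    'jsmith,a@gmail.com,"line one\nline two"\n'
    'jking,b@gmail.com,ok\n'
)
lines, records, widths = summarize(sample)
print(lines, records, dict(widths))
```

If the two counts differ on the real file, the difference comes from multi-line fields rather than from lost rows; if the field tally shows more than one width, some rows are malformed and worth inspecting.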

Upvotes: 2

Views: 9148

Answers (2)

yardstick17

Reputation: 4592

This can be done very cleanly using pandas. Here it goes:

In [1]: import pandas as pd

In [3]: df = pd.read_csv('your_csv_file.csv')

In [4]: def rename_email(row):
   ...:     return row.email.replace('gmail.com', 'hotmail.com')
   ...:

In [5]: df['email2'] = df.apply(rename_email, axis=1)

In [6]: """axis = 1 or ‘columns’: apply function to each row"""

In [7]: df
Out[7]:
  first_name last_name     uid             email        dep_code  dep_name              email2
0       john     smith  jsmith  [email protected]         finance     21230  [email protected]
1       john      king   jking   [email protected]  human resource     31230   [email protected]

In [8]: df.to_csv('new_update_email_file.csv', index=False)  # index=False keeps the row index out of the file

Upvotes: 1

Martijn Pieters

Reputation: 1121366

If you call next() on the reader you get one row at a time; use that to copy over the headers. Copying the email column is easy enough:

import csv

infilename = r'c:\Python27\scripts\colnewfile.csv'
outfilename = r'c:\Python27\scripts\final.csv'

with open(infilename, 'rb') as fp_in, open(outfilename, 'wb') as fp_out:
    reader = csv.reader(fp_in, delimiter=",")
    headers = next(reader)  # read first row

    writer = csv.writer(fp_out, delimiter=",")
    writer.writerow(headers + ['email2'])

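    for row in reader:
        if len(row) > 3:
            # make sure there are at least 4 columns
            email = row[3].split('@', 1)[0] + '@hotmail.com'
        else:
            # short or blank row: leave email2 empty instead of
            # reusing the previous row's value (or raising NameError)
            email = ''
        writer.writerow(row + [email])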

This code splits the email address on the first @ sign, takes the first part of the split and adds @hotmail.com after it:

>>> 'example@gmail.com'.split('@', 1)[0]
'example'
>>> 'example@gmail.com'.split('@', 1)[0] + '@hotmail.com'
'example@hotmail.com'

The above produces:

first_name,last_name,uid,email,dep_code,dep_name,email2
john,smith,jsmith,[email protected],finance,21230,[email protected]
john,king,jking,[email protected],human resource,31230,[email protected]

for your sample input.
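The script above targets Python 2, where csv files are opened in binary mode. As a rough sketch for Python 3, the same idea can be written as a generator. The function name `add_email2`, its parameters, and the sample addresses are illustrative assumptions, not part of the original answer. Unlike the code above, this variant looks the email column up by name and only rewrites addresses whose domain is gmail.com, leaving others unchanged.

```python
import csv
import io


def add_email2(rows, old_domain="gmail.com", new_domain="hotmail.com"):
    """Yield each row with an extra email2 column appended.

    Assumes the first row is a header containing an 'email' column.
    """
    rows = iter(rows)
    header = next(rows)
    email_idx = header.index("email")  # find the column by name
    yield header + ["email2"]
    for row in rows:
        # short or blank rows get an empty email2 value
        email = row[email_idx] if len(row) > email_idx else ""
        local, at, domain = email.rpartition("@")
        if at and domain.lower() == old_domain:
            email = local + "@" + new_domain
        yield row + [email]


# Hypothetical sample data standing in for the real file.
sample = (
    "first_name,last_name,uid,email,dep_code,dep_name\n"
    "john,smith,jsmith,jsmith@gmail.com,finance,21230\n"
    "john,king,jking,jking@example.org,human resource,31230\n"
)

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
writer.writerows(add_email2(csv.reader(io.StringIO(sample))))
print(out.getvalue())
```

With real files in Python 3, open both in text mode with `newline=''`, for example `open(infilename, newline='')` and `open(outfilename, 'w', newline='')`, and pass them to `csv.reader` and `csv.writer` in place of the `StringIO` objects. Because rows are processed one at a time, the size of the input file should not matter.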

Upvotes: 1
