Reputation: 127

Make edits to the original csv file

I have three different columns in my csv file, with their respected values. Column B (Name column) in csv file has the values in all caps. I am trying to convert it into first letter caps but when I run the code it returns all the columns squished together and in quotes.

The Original File:

Company        Name                 Job Title
xxxxxx        JACK NICHOLSON        Manager
yyyyyy        BRAD PITT             Accountant

I am trying to do:

Company        Name                 Job Title
xxxxxx        Jack Nicholson        Manager
yyyyyy        Brad Pitt             Accountant

My code:

import csv


with open('C:\\Users\\Data.csv', 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    data = list(reader)

for item in data:
    if len(item) > 1:
        item[1] = item[1].title()

with open('C:\\Users\\Data.csv', 'wb') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerows(data)

My result after I run the code is: Instead of returning three different columns and the second column adjusted with the title() syntax, it returns all the three columns squished together in just one column with quotes.

ï»¿"Company","Name","Job Title"
xxxxxx,"JACK NICHOLSON","Manager"
yyyyyy,"BRAD PITT","Accountant"

I do not know what is wrong with my snippet. The result has absurd markings in the beginning

Upvotes: 0

Answers (3)

Waylon Walker

Reputation: 563

A slight change to Mohammed's solution using read_fwf to simplify reading the file.

http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_fwf.html

import pandas as pd

df = pd.read_fwf('old_csv_file')
df.Name = df.Name.str.title()
df.to_csv('new_csv_file', index=False, sep='\t')

EDIT: Changed to use a string method over lambda. I prefer to use lambdas as a last result.

Upvotes: 1

raymelfrancisco

Reputation: 871

Take note that data stores each row as list containing one string only.

Having a length of 1, the statement inside this if block won't execute.

if len(item) > 1:
    item[1] = item[1].title()

Aside from that, reading and writing in binary format is unnecessary.

import csv

with open('C:\\Users\\Data.csv', 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    data = list(reader)

for item in data[1:]:                          # excludes headers
    item[0] = item[0].title()                  # will capitalize the Company column too
    item[0] = item[0][0].lower() + item[0][1:] # that's why we need to revert

    print(item) 
    # see that data contains lists having one element only
    # the line above will output to
    #    ['Company        Name                 Job Title']
    #    ['xxxxxx        Jack Nicholson        Manager']
    #    ['yyyyyy        Brad Pitt             Accountant']


with open('C:\\Users\\Data.csv', 'w') as f:
    writer = csv.writer(f, delimiter='\t')
    writer.writerows(data)

Upvotes: 0

Mohammad Yusuf

Reputation: 17074

You can do something like this with pandas:

import pandas as pd

df = pd.read_csv('old_csv_file', sep='\s{3,}')
df.Name = df.Name.apply(lambda x: x.title())
df.to_csv('new_csv_file', index=False, sep='\t')

string.title() converts the string to title case, i.e every first letter of the word in string is capitalized and subsequent letters are converted to lower case.

With df.apply you can perform some operation on an entire column or row.

'\s{3,}' is a regular expression
\s is a space character. \s{3,} is for more than 3 spaces.

When you are reading a CSV format you have to specify how your columns are separated.
Generally columns are separated by comma or tab. But in your case you have like 5,6 spaces between each column of a row.
So by using \s{3,} I am telling the CSV processor that the columns in a row are delimited by more than 3 spaces.

If I had use only \s then it would have treated First Name and Last Name as two separate columns because they have 1 space in between. So by 3+ spaces I made First Name and Last Name as a single column.

Upvotes: 0

Make edits to the original csv file

Answers (3)

Related Questions