Reputation: 127
I have three different columns in my csv file, with their respected values. Column B (Name column) in csv file has the values in all caps. I am trying to convert it into first letter caps but when I run the code it returns all the columns squished together and in quotes.
The Original File:
Company Name Job Title
xxxxxx JACK NICHOLSON Manager
yyyyyy BRAD PITT Accountant
I am trying to do:
Company Name Job Title
xxxxxx Jack Nicholson Manager
yyyyyy Brad Pitt Accountant
My code:
import csv
with open('C:\\Users\\Data.csv', 'rb') as f:
reader = csv.reader(f, delimiter='\t')
data = list(reader)
for item in data:
if len(item) > 1:
item[1] = item[1].title()
with open('C:\\Users\\Data.csv', 'wb') as f:
writer = csv.writer(f, delimiter='\t')
writer.writerows(data)
My result after I run the code is: Instead of returning three different columns and the second column adjusted with the title() syntax, it returns all the three columns squished together in just one column with quotes.
"Company","Name","Job Title"
xxxxxx,"JACK NICHOLSON","Manager"
yyyyyy,"BRAD PITT","Accountant"
I do not know what is wrong with my snippet. The result has absurd markings in the beginning
Upvotes: 0
Views: 100
Reputation: 563
A slight change to Mohammed's solution using read_fwf to simplify reading the file.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_fwf.html
import pandas as pd
df = pd.read_fwf('old_csv_file')
df.Name = df.Name.str.title()
df.to_csv('new_csv_file', index=False, sep='\t')
EDIT: Changed to use a string method over lambda. I prefer to use lambdas as a last result.
Upvotes: 1
Reputation: 871
Take note that data
stores each row as list
containing one string only.
Having a length of 1, the statement inside this if
block won't execute.
if len(item) > 1:
item[1] = item[1].title()
Aside from that, reading and writing in binary format is unnecessary.
import csv
with open('C:\\Users\\Data.csv', 'r') as f:
reader = csv.reader(f, delimiter='\t')
data = list(reader)
for item in data[1:]: # excludes headers
item[0] = item[0].title() # will capitalize the Company column too
item[0] = item[0][0].lower() + item[0][1:] # that's why we need to revert
print(item)
# see that data contains lists having one element only
# the line above will output to
# ['Company Name Job Title']
# ['xxxxxx Jack Nicholson Manager']
# ['yyyyyy Brad Pitt Accountant']
with open('C:\\Users\\Data.csv', 'w') as f:
writer = csv.writer(f, delimiter='\t')
writer.writerows(data)
Upvotes: 0
Reputation: 17074
You can do something like this with pandas:
import pandas as pd
df = pd.read_csv('old_csv_file', sep='\s{3,}')
df.Name = df.Name.apply(lambda x: x.title())
df.to_csv('new_csv_file', index=False, sep='\t')
string.title() converts the string to title case, i.e every first letter of the word in string is capitalized and subsequent letters are converted to lower case.
With df.apply you can perform some operation on an entire column or row.
'\s{3,}' is a regular expression
\s is a space character. \s{3,} is for more than 3 spaces.
When you are reading a CSV format you have to specify how your columns are separated.
Generally columns are separated by comma or tab. But in your case you have like 5,6 spaces between each column of a row.
So by using \s{3,} I am telling the CSV processor that the columns in a row are delimited by more than 3 spaces.
If I had use only \s then it would have treated First Name and Last Name as two separate columns because they have 1 space in between. So by 3+ spaces I made First Name and Last Name as a single column.
Upvotes: 0