Reputation: 133
I am working with the CSV module, and I am writing a simple program which takes the names of several authors listed in the file, and formats them in this manner: john.doe
So far, I've achieved the results that I want, but I am having trouble with getting the code to exclude titles such as "Mr."Mrs", etc. I've been thinking about using the split function, but I am not sure if this would be a good use for it.
Any suggestions? Thanks in advance!
Here's my code so far:
import csv
books = csv.reader(open("books.csv","rU"))
for row in books:
print '.'.join ([item.lower() for item in [row[index] for index in (1, 0)]])
Upvotes: 2
Views: 346
Reputation: 6777
It depends on how much messy the strings are, in worst cases this regexp-based solution should do the job:
import re
x=re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)
x.sub("", text)
(I'm using re.compile()
here since for some reasons Python 2.6 re.sub
doesn't accept the flags=
kwarg..)
UPDATE: I wrote some code to test that and, although I wasn't able to figure out a way to automate results checking, it looks like that's working fine.. This is the test code:
import re
x=re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)
names = ["".join([a,b,c,d]) for a in ['', ' ', ' ', '..', 'X'] for b in ['mr', 'Mr', 'miss', 'Miss', 'mrs', 'Mrs', 'ms', 'Ms'] for c in ['', '.', '. ', ' '] for d in ['Aaaaa', 'Aaaa Bbbb', 'Aaa Bbb Ccc', ' aa ']]
print "\n".join([" => ".join((n,x.sub('',n))) for n in names])
Upvotes: 3
Reputation: 6968
Depending on the complexity of your data and the scope of your needs you may be able to get away with something as simple as stripping titles from the lines in the csv using replace() as you iterate over them.
Something along the lines of:
titles = ["Mr.", "Mrs.", "Ms", "Dr"] #and so on
for line in lines:
line_data = line
for title in titles:
line_data = line_data.replace(title,"")
#your code for processing the line
This may not be the most efficient method, but depending on your needs may be a good fit.
How this could work with the code you posted (I am guessing the Mr./Mrs. is part of column 1, the first name):
import csv
books = csv.reader(open("books.csv","rU"))
for row in books:
first_name = row[1]
last_name = row[0]
for title in titles:
first_name = first_name.replace(title,"")
print '.'.(first_name, last_name)
Upvotes: 0