Reputation: 133

Using the split function in Python

I am working with the CSV module, and I am writing a simple program which takes the names of several authors listed in the file, and formats them in this manner: john.doe

So far, I've achieved the results that I want, but I am having trouble getting the code to exclude titles such as "Mr.", "Mrs.", etc. I've been thinking about using the split function, but I am not sure if this would be a good use for it.

Any suggestions? Thanks in advance!

Here's my code so far:

import csv

books = csv.reader(open("books.csv", "rU"))

for row in books:
    print '.'.join([item.lower() for item in [row[index] for index in (1, 0)]])

Upvotes: 2

Answers (2)

redShadow

Reputation: 6777

It depends on how messy the strings are. In the worst case, this regexp-based solution should do the job:

import re
x=re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)
x.sub("", text)

(I'm using re.compile() here because, for some reason, re.sub in Python 2.6 doesn't accept the flags= kwarg.)
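To show how the pattern could slot into the formatting from the question, here is a minimal sketch. The helper name format_author and the assumption that the title sits at the start of the first-name field are mine, not something the question confirms.

```python
import re

# Same pattern as above: optional leading whitespace, a title,
# then at least one dot or whitespace character.
TITLE_RE = re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)


def format_author(first_name, last_name):
    """Hypothetical helper: return 'first.last' in lowercase, minus any title."""
    # Assumption: the title, if present, is at the start of first_name.
    first_name = TITLE_RE.sub("", first_name).strip()
    return ".".join([first_name.lower(), last_name.strip().lower()])
```

Inside the loop from the question this would be called as format_author(row[1], row[0]). Because the pattern requires a dot or whitespace after the title, a first name such as "Mrinal" is left alone.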

UPDATE: I wrote some code to test this and, although I couldn't figure out a way to automate checking the results, it looks like it's working fine. This is the test code:

import re
x = re.compile(r"^\s*(mr|mrs|ms|miss)[\.\s]+", flags=re.IGNORECASE)
# Build every combination of prefix, title, separator and name
names = ["".join([a, b, c, d])
         for a in ['', ' ', '   ', '..', 'X']
         for b in ['mr', 'Mr', 'miss', 'Miss', 'mrs', 'Mrs', 'ms', 'Ms']
         for c in ['', '.', '. ', ' ']
         for d in ['Aaaaa', 'Aaaa Bbbb', 'Aaa Bbb Ccc', ' aa ']]
# Print each input next to the result of stripping the title
print "\n".join([" => ".join((n, x.sub('', n))) for n in names])

Upvotes: 3

Tom Neyland

Reputation: 6968

Depending on the complexity of your data and the scope of your needs, you may be able to get away with something as simple as stripping titles from the lines in the CSV using replace() as you iterate over them.

Something along the lines of:

titles = ["Mr.", "Mrs.", "Ms", "Dr"] #and so on

for line in lines:
    line_data = line
    for title in titles:
        line_data = line_data.replace(title,"")
    #your code for processing the line

This may not be the most efficient method, but depending on your needs may be a good fit.
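One caveat with replace() is that it removes a title wherever it occurs, so "Drew".replace("Dr", "") gives "ew". If that matters for your data, a variant that only drops a leading word may be safer. The sketch below uses split(), as the question suggested; the helper name strip_leading_title and the title set are assumptions for illustration.

```python
# Assumed title list; compared in lowercase with any trailing dot removed.
TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def strip_leading_title(name):
    """Hypothetical helper: drop the first word of name if it is a title."""
    # split(None, 1) splits on whitespace into at most two pieces.
    parts = name.split(None, 1)
    if len(parts) == 2 and parts[0].rstrip(".").lower() in TITLES:
        return parts[1].strip()
    return name.strip()
```

This only recognises a title that is separated from the name by whitespace, so an entry like "Mr.John" would pass through unchanged.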

How this could work with the code you posted (I am guessing the Mr./Mrs. is part of column 1, the first name):

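import csv

titles = ["Mr.", "Mrs.", "Ms", "Dr"]  # and so on

books = csv.reader(open("books.csv", "rU"))

for row in books:
    first_name = row[1]
    last_name = row[0]
    for title in titles:
        first_name = first_name.replace(title, "")
    # strip() removes the space left behind once the title is gone
    print '.'.join((first_name.strip().lower(), last_name.lower()))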

Upvotes: 0
