Reputation: 1783
I'm new to Python, and I've been playing around with it for simple tasks. I have a bunch of CSVs which I need to manipulate in complex ways, but I'm breaking this up into smaller tasks for the sake of learning Python.
For now, given a list of strings, I want to remove user-defined title prefixes from any names in the strings. Any string that contains a name will contain only a name, with or without a title prefix. I have the following, and it works, but it feels unnecessarily complicated. Is there a more Pythonic way to do this? Thanks!
```python
# Return new list without title prefixes for strings in a list of strings.
def strip_titles(line, title_prefixes):
    new_csv_line = []
    for item in line:
        for title_prefix in title_prefixes:
            if item.startswith(title_prefix):
                new_csv_line.append(item[len(title_prefix)+1:])
                break
            else:
                if title_prefix == title_prefixes[len(title_prefixes)-1]:
                    new_csv_line.append(item)
                else:
                    continue
    return new_csv_line

if __name__ == "__main__":
    test_csv_line = ['Mr. Richard Stallman', 'I like cake', 'Mrs. Margaret Thatcher', 'Jean-Claude Van Damme']
    test_prefixes = ['Mr.', 'Ms.', 'Mrs.']
    print(strip_titles(test_csv_line, test_prefixes))
```
Upvotes: 2
Views: 212
Reputation: 24034
Assuming that `prefixes` is variable, perhaps as an aspect of localization, or that you prefer not to use a regular expression for some other reason, you could do something like this (untested code):
```python
def strip_title(string, prefixes):
    for prefix in prefixes:
        if string.startswith(prefix + ' '):
            return string[len(prefix) + 1:]
    return string

# lines: an iterable of rows, each a list of strings (e.g. from csv.reader)
stripped = (list(strip_title(cell, prefixes) for cell in line)
            for line in lines)
```
This is not particularly efficient, since the algorithm ends up doing a lot of redundant checking (e.g. checking three times whether a cell starts with `M`). This sort of thing is a big reason to use regular expressions.
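Since the question's real input is CSV files, here is a minimal sketch applying the loop-based `strip_title` above to every cell of every row. It assumes the data can be read with the standard `csv` module; the sample rows and the `io.StringIO` stand-in for a real file are hypothetical. A list comprehension replaces the generator expression so the result can be printed directly.

```python
import csv
import io

def strip_title(string, prefixes):
    # Return string without the first matching "<prefix> " at its start.
    for prefix in prefixes:
        if string.startswith(prefix + ' '):
            return string[len(prefix) + 1:]
    return string

# Hypothetical sample data standing in for a real CSV file.
sample = io.StringIO(
    'Mr. Richard Stallman,I like cake\n'
    'Mrs. Margaret Thatcher,Jean-Claude Van Damme\n'
)
prefixes = ['Mr.', 'Ms.', 'Mrs.']

# csv.reader yields each row as a list of strings.
lines = csv.reader(sample)
stripped = [[strip_title(cell, prefixes) for cell in line] for line in lines]
print(stripped)
# [['Richard Stallman', 'I like cake'], ['Margaret Thatcher', 'Jean-Claude Van Damme']]
```

With a real file you would pass `open(path, newline='')` to `csv.reader` instead of the `StringIO` object.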
Alternatively, you could dynamically build a regular expression by escaping each prefix and joining them with `|` into alternation branches:
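```python
import re

def TitleStripper(prefixes):
    # Escape each prefix so characters like "." are matched literally.
    escaped_titles = (re.escape(prefix) for prefix in prefixes)
    # One anchored alternation: a prefix at the very start, then a space.
    prefix_re = re.compile('^({0}) '.format('|'.join(escaped_titles)))

    def strip_title(string):
        # Remove at most one leading "<prefix> ".
        return prefix_re.sub('', string, count=1)

    return strip_title
```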
The function `TitleStripper` creates a closure, `strip_title`, that works like the previous function but is built for a particular set of prefixes. After you call `strip_title = TitleStripper(prefixes)`, you can just call `strip_title(string)`.
Mostly due to the use of regular expressions, this will be a bit faster than the first method, perhaps at the expense of clarity.
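How much faster the regex version really is depends on the number of prefixes, the input, and the Python version; with only three short prefixes the loop may well be competitive. A minimal sketch for checking this on your own data with the standard `timeit` module follows. Both functions are copied from above; the sample cells come from the question, and the repetition count of 10000 is an arbitrary choice.

```python
import re
import timeit

def strip_title(string, prefixes):
    # Loop-based version: try each prefix followed by a space.
    for prefix in prefixes:
        if string.startswith(prefix + ' '):
            return string[len(prefix) + 1:]
    return string

def TitleStripper(prefixes):
    # Regex-based version: one compiled alternation of escaped prefixes.
    escaped_titles = (re.escape(prefix) for prefix in prefixes)
    prefix_re = re.compile('^({0}) '.format('|'.join(escaped_titles)))

    def strip_title(string):
        return prefix_re.sub('', string, count=1)

    return strip_title

prefixes = ['Mr.', 'Ms.', 'Mrs.']
cells = ['Mr. Richard Stallman', 'I like cake',
         'Mrs. Margaret Thatcher', 'Jean-Claude Van Damme']
regex_strip = TitleStripper(prefixes)

# Both versions should agree before their speed is compared.
assert [strip_title(c, prefixes) for c in cells] == [regex_strip(c) for c in cells]

# timeit.timeit accepts a zero-argument callable as the statement to time.
loop_time = timeit.timeit(lambda: [strip_title(c, prefixes) for c in cells], number=10000)
regex_time = timeit.timeit(lambda: [regex_strip(c) for c in cells], number=10000)
print('loop: {:.3f}s  regex: {:.3f}s'.format(loop_time, regex_time))
```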
If you really only ever need to check for three prefixes, either of these methods is overkill, and you should just use a static regular expression, as explained in another answer.
Upvotes: 1
Reputation: 30933
A more Pythonic approach would be to replace the "end of list" check with an `else:` clause on the inner `for title_prefix in title_prefixes:` loop. The `else` block runs only if the loop completes without being interrupted by `break`:
```python
# Return new list without title prefixes for strings in a list of strings.
def strip_titles(line, title_prefixes):
    new_csv_line = []
    for item in line:
        for title_prefix in title_prefixes:
            if item.startswith(title_prefix):
                new_csv_line.append(item[len(title_prefix)+1:])
                break
        else:
            new_csv_line.append(item)
    return new_csv_line
```
The logic is otherwise the same as yours.
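To see the `for`/`else` rule in isolation, here is a minimal sketch built around a hypothetical helper, `find_prefix`, that returns the first prefix an item starts with, or `None` when the loop runs to completion without a `break`.

```python
def find_prefix(item, title_prefixes):
    # Hypothetical helper illustrating for/else; not part of the answer above.
    for title_prefix in title_prefixes:
        if item.startswith(title_prefix):
            break  # a match skips the else block below
    else:
        # Reached only if the loop finished without break
        # (including when title_prefixes is empty).
        return None
    return title_prefix

prefixes = ['Mr.', 'Ms.', 'Mrs.']
print(find_prefix('Mrs. Margaret Thatcher', prefixes))  # Mrs.
print(find_prefix('I like cake', prefixes))             # None
print(find_prefix('Mr. Richard Stallman', []))          # None
```

Note that an empty prefix list also takes the `else` branch, which is exactly the case where the original "is this the last prefix?" check would never append the item.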
Upvotes: 1
Reputation: 185852
```python
import re
[re.sub(r'^(Mr|Ms|Mrs)\.\s+', '', s) for s in test_csv_line]
```
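A minimal variant of this one-liner, assuming the same three fixed titles, compiles the pattern once up front (rather than having `re.sub` fetch it from the module's internal cache on every call) and applies it to the question's test data. `TITLE_RE` is a name chosen for illustration. Note that `\s+` removes any run of whitespace after the dot, whereas the question's code drops exactly one character.

```python
import re

# Compile once: a title at the start, a literal dot, then one or more whitespace chars.
TITLE_RE = re.compile(r'^(Mr|Ms|Mrs)\.\s+')

test_csv_line = ['Mr. Richard Stallman', 'I like cake',
                 'Mrs. Margaret Thatcher', 'Jean-Claude Van Damme']
stripped = [TITLE_RE.sub('', s) for s in test_csv_line]
print(stripped)
# ['Richard Stallman', 'I like cake', 'Margaret Thatcher', 'Jean-Claude Van Damme']
```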
Upvotes: 9