iGwok

Reputation: 333

Finding letters in string, not followed by a number... possibly using RE?

I am trying to extract runs of letters from a string that are neither directly preceded nor directly followed by a number.

Here's an example string:

string = "ts0060_LOD-70234_lr2_billboards_rgba_over_s3d_lf_v5_2Kdciufa_lnh"

This is what I have so far:

re.findall(r"[a-z]+", string.lower())

which gives this result:

['ts', 'lod', 'lr', 'billboards', 'rgba', 'over', 's', 'd', 'lf', 'v', 'kdciufa', 'lnh']

... but the result I am looking for is something more like this:

['lod', 'billboards', 'rgba', 'over', 'lf', 'lnh']

Is there a way of achieving this using regular expressions?

Many thanks,

Upvotes: 4

Views: 4505

Answers (2)

Kevin

Reputation: 76194

An alternative to using findall is to split the string into individual words, and then filter out any words containing non-alphabetical characters.

import re

string = "ts0060_LOD-70234_lr2_billboards_rgba_over_s3d_lf_v5_2Kdciufa_lnh"

# Split on any character that is not a lowercase letter or a digit
words = re.split(r"[^a-z0-9]", string.lower())
print("words:", words)

# Keep only the words made up entirely of letters (list() is needed on Python 3)
filtered_words = list(filter(str.isalpha, words))
print("filtered words:", filtered_words)

Result:

words: ['ts0060', 'lod', '70234', 'lr2', 'billboards', 'rgba', 'over', 's3d', 'lf', 'v5', '2kdciufa', 'lnh']
filtered words: ['lod', 'billboards', 'rgba', 'over', 'lf', 'lnh']
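
For reuse, the same split-and-filter idea could be wrapped in a small function. This is only a sketch assuming Python 3; the name letter_only_words and the second test string are made up for illustration. It also shows that the empty strings produced by adjacent separators are dropped, because isalpha() is False for an empty string.

```python
import re


def letter_only_words(text):
    """Return the lower-cased tokens of text that consist solely of letters.

    letter_only_words is a hypothetical helper name, not part of the
    original answer.
    """
    # Split on anything that is not a lowercase ASCII letter or a digit.
    tokens = re.split(r"[^a-z0-9]", text.lower())
    # str.isalpha() is False for "" and for any token containing a digit.
    return [token for token in tokens if token.isalpha()]


sample = "ts0060_LOD-70234_lr2_billboards_rgba_over_s3d_lf_v5_2Kdciufa_lnh"
print(letter_only_words(sample))

# Adjacent separators yield empty tokens, which the filter discards.
print(letter_only_words("a__b1--c"))
```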

Upvotes: 2

Martijn Pieters

Reputation: 1122142

Use negative look-arounds:

re.findall(r"(?<![\da-z])[a-z]+(?![\da-z])", string.lower())

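This matches runs of lower-case letters that are not immediately preceded or followed by another letter or a digit. Including a-z in the look-arounds is what stops the engine from matching just part of a token: without it, [a-z]+ could backtrack and return a fragment such as the "t" of "ts0060".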

Demo:

>>> import re
>>> string = "ts0060_LOD-70234_lr2_billboards_rgba_over_s3d_lf_v5_2Kdciufa_lnh"
>>> re.findall(r"(?<![\da-z])[a-z]+(?![\da-z])", string.lower())
['lod', 'billboards', 'rgba', 'over', 'lf', 'lnh']
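
To see why the look-arounds exclude letters as well as digits, here is a rough comparison (a sketch, assuming Python 3). The digits-only pattern is not from the answer; it is included only to illustrate the failure mode. The variable names are made up.

```python
import re

string = "ts0060_LOD-70234_lr2_billboards_rgba_over_s3d_lf_v5_2Kdciufa_lnh"
text = string.lower()

# Hypothetical weaker pattern: the look-arounds reject digits only.
# The regex engine can then backtrack to a shorter run of letters,
# so fragments of mixed tokens (e.g. "t" from "ts0060") slip through.
digits_only = re.findall(r"(?<!\d)[a-z]+(?!\d)", text)

# Pattern from the answer: the look-arounds reject digits and letters,
# so a match has to span a whole letter-only token.
whole_tokens = re.findall(r"(?<![\da-z])[a-z]+(?![\da-z])", text)

print("digits only:     ", digits_only)
print("digits + letters:", whole_tokens)
```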

Upvotes: 8
