Palomar

Reputation: 33

Python - combine regex patterns

I have a large text, and I want to select every 10-character string whose first character is a letter and whose last character is a digit.

I am a Python rookie, and so far I have only managed to find all 10-character strings:

ten_char = re.findall(r"\D(\w{10})\D", pdfdoc)

The question is how to add my other conditions: besides being 10 characters long, the string must start with a letter and end with a digit.

Suggestions appreciated!

Upvotes: 3

Views: 442

Answers (4)

Palomar

Reputation: 33

Thank you very much for a great discussion and interesting suggestions. This was my very first post on Stack Overflow, and wow, what a community you are!

In fact, using:

r'\b([a-zA-Z]\S{8}\d)'

solved my problem very nicely. I really appreciate all your comments.
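One thing worth checking with this pattern: it has no trailing \b, so it can also pick up the first ten characters of a longer token. A minimal sketch of the difference, using a made-up sample string (the variable names are only for illustration):

```python
import re

# Hypothetical sample: a valid 10-character token, an 11-character token,
# and a 10-character token that ends in a letter.
text = "a123456789 b1234567890 c12345678x"

# Without a trailing \b, the first ten characters of a longer token can match.
loose = re.findall(r'\b([a-zA-Z]\S{8}\d)', text)

# With a trailing \b, the match has to end at a word boundary.
strict = re.findall(r'\b([a-zA-Z]\S{8}\d)\b', text)

print(loose)   # ['a123456789', 'b123456789']
print(strict)  # ['a123456789']
```

If tokens longer than ten characters should be ignored, the version with the trailing \b is probably the safer choice.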

Upvotes: 0

dawg

Reputation: 103864

If I understand the question correctly, use:

r'\b([a-zA-Z]\S{8}\d)\b'

Python demo:

>>> import re
>>> txt="""\
... Should match:
... a123456789 aA34567s89 zzzzzzzer9
... 
... Not match:
... 1123456789 aA34567s8a zzzzzzer9 zzzxzzzze99"""
>>> re.findall(r'\b([a-zA-Z]\S{8}\d)\b', txt)
['a123456789', 'aA34567s89', 'zzzzzzzer9']
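Note that \S accepts any non-whitespace character, so punctuation inside the token is allowed. If only word characters should count (as with the \w in the question), a variant along these lines might fit better. This is a sketch on a made-up sample, and keep in mind that \w also matches the underscore:

```python
import re

# Hypothetical sample: plain token, token with a hyphen, token with an underscore.
text = "a123456789 a1234-6789 a1234_6789"

# \S{8}: the eight middle characters can be anything except whitespace.
any_non_space = re.findall(r'\b([a-zA-Z]\S{8}\d)\b', text)

# \w{8}: the eight middle characters must be letters, digits, or underscore.
word_chars_only = re.findall(r'\b([a-zA-Z]\w{8}\d)\b', text)

print(any_non_space)    # ['a123456789', 'a1234-6789', 'a1234_6789']
print(word_chars_only)  # ['a123456789', 'a1234_6789']
```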

Upvotes: 1

online Thomas

Reputation: 9381

([a-z].{8}[0-9])

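This matches one letter, followed by any eight characters, followed by one digit. The class [a-z] covers uppercase letters only because the demo below sets the case-insensitive flag (i).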

JS Demo

var re = /([a-z].{8}[0-9])/gi;
var str = 'Aasdf23423423423423423b423423423423423';
var m;

while ((m = re.exec(str)) !== null) {
    // Guard against an infinite loop on zero-length matches.
    if (m.index === re.lastIndex) {
        re.lastIndex++;
    }
    console.log(m[0]);
}

https://regex101.com/r/gI8jZ4/1
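Since the question is about Python, here is a rough Python counterpart of the pattern above, with re.IGNORECASE standing in for the JS i flag. The sample string is made up; it also shows that . matches spaces, so with no word boundaries a match can run across several words:

```python
import re

# Hypothetical sample: the only valid match starts at the uppercase 'A'
# and runs across a space.
text = "Ab cdefgh1 zz"

pattern = r'[a-z].{8}[0-9]'

# Without the flag, [a-z] cannot match the leading uppercase letter.
case_sensitive = re.findall(pattern, text)

# With re.IGNORECASE, [a-z] matches 'A' as well.
case_insensitive = re.findall(pattern, text, flags=re.IGNORECASE)

print(case_sensitive)    # []
print(case_insensitive)  # ['Ab cdefgh1']
```

If matches should stay inside a single whitespace-delimited token, \S or \w together with \b is likely closer to what the question asks for.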

Upvotes: 2

Ben

Reputation: 6348

I wouldn't use regex for this. Plain string manipulation is clearer, in my opinion (though I haven't tested the following code).

def get_useful_words(filename):
    with open(filename, 'r') as file:
        for line in file:
            for word in line.split():
                if len(word) == 10 and word[0].isalpha() and word[-1].isdigit():
                    yield word


for useful_word in get_useful_words('tmp.txt'):
    print(useful_word)
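For a quick check without a file on disk, the same test can be run over an in-memory list of lines. This is only a sketch: useful_words and the sample data are made up for illustration. One difference from the regex answers is that split() keeps punctuation attached to a word, so a trailing comma makes an otherwise valid token 11 characters long:

```python
def useful_words(lines):
    """Yield 10-character words that start with a letter and end with a digit."""
    for line in lines:
        for word in line.split():
            if len(word) == 10 and word[0].isalpha() and word[-1].isdigit():
                yield word


# Hypothetical sample lines.
sample = [
    "abcdefgh12 1bcdefgh12 abcdefgh1x",  # valid, starts with digit, ends with letter
    "abcdefg12 xbcdefgh12,",             # too short, trailing comma makes it 11 chars
]

found = list(useful_words(sample))
print(found)  # ['abcdefgh12']
```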

Upvotes: 0
