hemmy

Reputation: 157

Python Regular expression to end of line

I'm a bit frustrated that I can't work this out: I want to define a regular expression that matches an unknown number of words (some separated by spaces, some containing numbers, some with underscores).

When I say 'unknown number of words', I'm happy to limit it to 10 if that's more realistic. Basically, I'm scanning file names and don't expect any to be longer than 10 words, but it would be nice not to have to set a limit.

The best I have so far is:

tc = re.findall(r'FROM CLIP NAME:\s\s(\w*\s*\w*\s*\w*\s*\w*\s*\w*\s*\w*\s*\w*\s*\w*\s*\w*\s*\w*)', text)

Where 'FROM CLIP NAME:\s\s' will be at the beginning of each line.

My attempt above fails completely because \s matches line breaks as well as spaces, so it also grabs data from the next line.

Upvotes: 1

Views: 2195

Answers (2)

poke

Reputation: 387557

FROM CLIP NAME:\s{2}([\w\s]*)$

You can use a character class to define the allowed characters (it may itself contain predefined character classes, like \w and \s) and accept any number of them. That way you don't have to care what exactly the name contains. You can also just use a dot . to match any character except a newline.
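As a rough sketch of the dot variant: because . does not match a newline by default, a greedy .* stops at the end of the line on its own, without $ or re.M. The sample text below is made up for illustration, and the two spaces after the colon are written as a literal " {2}" here instead of \s{2}.

```python
import re

# Hypothetical sample input, not taken from the asker's real file.
text = "FROM CLIP NAME:  clip_01 take-2.mov\nsome other line\n"

# '.' never matches '\n' unless re.DOTALL is set, so (.*) ends at the line break.
names = re.findall(r'FROM CLIP NAME: {2}(.*)', text)
print(names)
```

Unlike [\w\s]*, this also accepts characters such as - and . in the name, which may or may not be wanted.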

The trailing $ requires the match to end at the end of the line. Note that for $ to behave line by line you need to pass the re.M flag; otherwise $ only matches at the end of the whole string.

re.compile(r'FROM CLIP NAME:\s{2}([\w\s]*)$', re.M)
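One caveat worth checking: \s inside the character class still matches newlines, so [\w\s]* can run on into a following line if that line contains only word characters and whitespace. A minimal sketch of a stricter variant, assuming names contain only word characters, spaces and tabs (the sample text and the names loose and strict are invented for this example):

```python
import re

# Hypothetical sample input; the middle line contains only words and spaces.
text = (
    "FROM CLIP NAME:  clip_01 take 2\n"
    "next line\n"
    "FROM CLIP NAME:  B_cam 003\n"
)

# Pattern from the answer: \s in the class can swallow line breaks.
loose = re.findall(r'FROM CLIP NAME:\s{2}([\w\s]*)$', text, re.M)

# Stricter variant: only word characters, spaces and tabs are allowed,
# so a match cannot cross into the next line.
strict = re.findall(r'^FROM CLIP NAME:[ \t]{2}([\w \t]*)$', text, re.M)

print(loose)
print(strict)
```

With this input the first entry of loose spans two lines, while strict holds just the two clip names.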

If in your case FROM CLIP NAME: is a static prefix, then you shouldn’t use regular expressions. Just iterate on the lines and strip off the prefix as eumiro showed.

Upvotes: 3

eumiro

Reputation: 212835

How about not using regular expressions?

Check whether a line starts with "FROM CLIP NAME:  " (note the two trailing spaces), then cut that prefix off and keep the rest of the string:

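title = "FROM CLIP NAME:  "  # prefix, ending in two spaces
tc = []  # collect every match instead of keeping only the last one
for line in lines:
    if line.startswith(title):
        # drop the prefix and any trailing newline
        tc.append(line[len(title):].rstrip("\n"))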

This iterates over the lines, so each line only ever extends as far as its newline.

If you don't have a list of lines (or a file object) but a single text string instead, use for line in text.splitlines().
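For a single string, the same idea might look like the sketch below. The sample text is invented, and names is just an illustrative variable; str.splitlines() drops the line endings, so no extra stripping is needed.

```python
# Hypothetical sample input standing in for the scanned text.
text = "FROM CLIP NAME:  clip_01 take 2\nother data\nFROM CLIP NAME:  B_cam 003\n"
title = "FROM CLIP NAME:  "  # prefix, ending in two spaces

# Keep the remainder of every line that starts with the prefix.
names = [line[len(title):] for line in text.splitlines() if line.startswith(title)]
print(names)
```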

Upvotes: 2
