Ravi

Reputation: 11

Python Regular Expressions - Accepting Sentences Containing Only Letters

This is one module of a larger task. I'm trying to adapt the regular expression so that it accepts whole sentences made up only of letters, rather than just a single word. I'm fairly new to Python programming and have been learning it for my GCSE for about a year, so I would appreciate some help.

import re

validateLoop = True
while validateLoop: #While loop used to loop back around if choice is invalid
    shift = input("Enter a sentence: ").lower() #Takes input for a sentence
    if not (re.match('[a-z]+$', shift)): #This is where i'm stuck
        print("Invalid input message, only include letters a-z with no other characters")
        print("Any upper case letters will be converted into lowercase")
    else:
        validateLoop = False

Upvotes: 1

Views: 1334

Answers (4)

ivan_pozdeev

Reputation: 36026

Rework your regex in the following steps. Refer to Python regex syntax and maybe regular expression basics for the building blocks you have at your disposal.

(Not every item may apply to you, depending on the specifics of your task.)

  1. Apart from letters, each character can be a space; there can be capital letters as well (for simplicity, allow them at any position, too - there are all sorts of abbreviations out there, anyway);

  2. The 1st letter shall be capital, the last character is a full stop, question mark or exclamation mark;

  3. There can be punctuation marks at the end of each word, i.e. after a letter and before a space; they, however, cannot be any of sentence-ending marks;

  4. An ellipsis can be either \u2026 (…) or three consecutive dots. It can appear at the end of a word or at the end of a sentence (I'm not sure whether an extra dot is added in English);

  5. A "dotted" abbreviation is one or more runs of letters, each followed by a single dot, without any spaces. Also note that no extra dot is added if the abbreviation ends the sentence. At this point, you probably need to store this construct in a separate variable and insert it into the relevant parts of the main expression;

  6. Direct speech or dialogue punctuation is probably not part of your task. But if it is, you need to use a subexpression to detect such a construct as a whole.
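
Steps 1 and 2 above could be sketched roughly as follows. This is only a minimal illustration, not a full sentence grammar: the names `SENTENCE_RE` and `is_sentence` are made up for the example, steps 3 to 6 are left out, and it assumes the input is not lowercased first (otherwise the capital-letter check can never pass).

```python
import re

# Step 1: the body may contain letters of either case and spaces.
# Step 2: the first character is a capital letter, the last is . ? or !
SENTENCE_RE = re.compile(r"[A-Z][A-Za-z ]*[.?!]")

def is_sentence(text):
    """Return True if the whole string matches the simplified pattern."""
    return SENTENCE_RE.fullmatch(text) is not None

print(is_sentence("Hello world."))   # True
print(is_sentence("hello world."))   # False: no leading capital
print(is_sentence("Hello world"))    # False: no closing mark
```

Using `fullmatch` here avoids having to write the `^` and `$` anchors by hand.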

Upvotes: 0

mathdan

Reputation: 191

Perhaps this would work better?

if not (re.match(r'^[a-z0-9`\'",/;:()\[\]$&\s]+[.?!]$', shift)):

It guarantees that the "sentence" ends with a period, question mark, or exclamation.

This could be made a little smarter, since it will miss some valid sentences, such as when there is a quote at the end of the sentence (e.g. Cassius said, "The fault, dear Brutus, is not in our stars, but in ourselves, that we are underlings."), but I think it covers your needs.
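
To make the limitation concrete, here is a small sketch that runs the same character class against a few lowercased inputs. The name `PATTERN` and the sample strings are only illustrative.

```python
import re

# Same character class as above, written as a raw triple-quoted string
# so that both quote characters can appear without escaping.
PATTERN = re.compile(r"""^[a-z0-9`'",/;:()\[\]$&\s]+[.?!]$""")

samples = [
    "the fault, dear brutus, is not in our stars.",  # ends with a period
    'cassius said, "we are underlings."',            # ends with a quote
    "no closing mark",                               # no . ? or ! at the end
]

for text in samples:
    print(bool(PATTERN.match(text)), repr(text))
```

The first sample should be accepted, while the other two are rejected because their last character is not a period, question mark, or exclamation mark.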

Upvotes: 1

Hard Tacos

Reputation: 370

if not (re.match(r"^[A-Za-z]*$", shift)):

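^ anchors the match at the start of the string

Square brackets [ ] define a character class, which matches any one of the characters listed inside them

A-Za-z lists the uppercase and lowercase letters

* repeats the class zero or more times, so an empty string also matches; use + instead to require at least one character

and $ matches the end of the string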

Edit:

If you want to include spaces, use \s (note that it matches any whitespace, including tabs and newlines)

if not (re.match(r"^[A-Za-z\s]*$", shift)):
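
One thing worth checking with this pattern is how it treats empty input. The sketch below compares the `*` quantifier with `+`; the variable names `star` and `plus` are just for illustration.

```python
import re

star = re.compile(r"^[A-Za-z\s]*$")  # zero or more allowed characters
plus = re.compile(r"^[A-Za-z\s]+$")  # at least one allowed character

print(bool(star.match("")))                # True: empty input is accepted
print(bool(plus.match("")))                # False
print(bool(plus.match("hello world")))     # True
print(bool(plus.match("hello world 2")))   # False: digits are not in the class
```

Even with `+`, a string made up only of spaces still passes, so a stricter check may be needed if blank input should be rejected.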

Upvotes: 1

Shailen Tuli

Reputation: 14171

How about:

regexp = re.compile(r'^[a-zA-Z\s!-~]+$')
regexp.match(shift)

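The !-~ range covers every printable ASCII character from ! to ~. That gets you the punctuation, but it also includes the digits and the letters, so a-zA-Z is redundant here and digits are accepted too. The \s gets you whitespace.

The r prefix makes the string a raw string, so backslashes reach the regex engine unchanged. Try to use raw strings in your regular expressions.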
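
As a quick check of what the `!-~` range actually covers, here is a small sketch. The name `printable` and the sample inputs are only for illustration.

```python
import re

regexp = re.compile(r'^[a-zA-Z\s!-~]+$')

# Every character from "!" (code 33) to "~" (code 126) inclusive.
printable = "".join(chr(c) for c in range(ord("!"), ord("~") + 1))
print(printable)

print(bool(regexp.match("hello, world!")))  # True
print(bool(regexp.match("route 66")))       # True: digits fall inside !-~
print(bool(regexp.match("café")))           # False: é is outside ASCII
```

If digits should be rejected, the range would have to be replaced by an explicit list of the punctuation marks you want to allow.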

Upvotes: 2
