Argon

Reputation: 435

Regex - finding capital words in string

I'm trying to learn how to use regular expressions but have a question. Let's say I have the string

line = 'Cow Apple think Woof'

I want to see if line has at least two words that begin with capital letters (which, of course, it does). In Python, I tried to do the following

import re
test = re.search(r'(\b[A-Z]([a-z])*\b){2,}',line)
print(bool(test))

but that prints False. If I instead do

test = re.search(r'(\b[A-Z]([a-z])*\b)',line)

I find that print(test.group(1)) prints Cow but print(test.group(2)) prints w, the last letter of the first match (test.groups() contains nothing else).

Any suggestions on pinpointing this issue and/or how to approach the problem better in general?

Upvotes: 13

Views: 24527

Answers (3)

Saurabh Vaichal

Reputation: 11

import re

sent = "His email is [email protected], however his wife uses [email protected]"

x = re.findall(r'[A-Za-z]+@[A-Za-z.]+', sent)

print(x)

If an email address is followed by a period (e.g. abc@some.com.), the period will be included at the end of the match. However, this can be dealt with separately.
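A minimal sketch of that separate step, assuming a trailing period is the only stray punctuation to remove. The addresses in the sample sentence are hypothetical placeholders, not the ones from the original text.

```python
import re

# Hypothetical sample text; the addresses are placeholders.
sent = "Write to alice@example.com or bob@mail.example.org."

# Same pattern as above, as a raw string; '.' needs no escaping inside [...].
matches = re.findall(r'[A-Za-z]+@[A-Za-z.]+', sent)

# Strip a sentence-ending period that the character class swallowed.
emails = [m.rstrip('.') for m in matches]
print(emails)  # ['alice@example.com', 'bob@mail.example.org']
```

rstrip('.') only removes periods at the very end, so dots inside the domain are left alone.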

Upvotes: 1

davidhu

Reputation: 10472

I use the findall function to find all non-overlapping matches of the regex, then use len to count them; for this line there are 3. Checking whether the count is at least 2 gives you True or False.

import re

line = 'Cow Apple think Woof'

test = re.findall(r'(\b[A-Z]([a-z])*\b)',line)
print(len(test) >= 2)
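The same idea can be wrapped in a small helper that takes the minimum count as a parameter. The name has_capitalized_words is hypothetical, and the pattern drops the inner group so findall returns plain strings instead of tuples.

```python
import re

# Capitalized word: an uppercase letter followed by zero or more lowercase letters.
CAPITALIZED = re.compile(r'\b[A-Z][a-z]*\b')

def has_capitalized_words(text: str, n: int = 2) -> bool:
    """Return True if text contains at least n capitalized words (hypothetical helper)."""
    return len(CAPITALIZED.findall(text)) >= n

line = 'Cow Apple think Woof'
print(CAPITALIZED.findall(line))         # ['Cow', 'Apple', 'Woof']
print(has_capitalized_words(line))       # True
print(has_capitalized_words(line, n=4))  # False
```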

If you want to use only a regex, you can search for a capitalized word, followed by any characters, followed by another capitalized word.

test = re.search(r'(\b[A-Z][a-z]*\b)(.*)(\b[A-Z][a-z]*\b)',line)
print(bool(test))
  • (\b[A-Z][a-z]*\b) - finds a capitalized word
  • (.*) - matches 0 or more characters (except newlines)
  • (\b[A-Z][a-z]*\b) - finds the second capitalized word

This approach isn't as flexible, since matching three capitalized words would mean writing the word pattern out a third time.
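One way around that limitation, sketched under the assumption that any text may appear between the words: repeat a non-capturing group n times, with a lazy .*? inside it to consume the gap. That gap-consuming part is what the pattern in the question lacks; its (...){2,} needs the next capital letter immediately after the previous word, but there is a space there. The name has_n_capitalized is hypothetical.

```python
import re

def has_n_capitalized(text: str, n: int) -> bool:
    # (?:...) is a non-capturing group; {n} repeats "capitalized word + any gap".
    # re.DOTALL lets .*? span newlines as well.
    pattern = r'(?:\b[A-Z][a-z]*\b.*?){%d}' % n
    return re.search(pattern, text, re.DOTALL) is not None

line = 'Cow Apple think Woof'
print(has_n_capitalized(line, 2))  # True
print(has_n_capitalized(line, 3))  # True
print(has_n_capitalized(line, 4))  # False
```

For long inputs with many near-misses this backtracks more than counting findall results, so the findall-and-len approach is usually the simpler choice.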

Upvotes: 1

Synedraacus

Reputation: 1055

The last letter ends up in a group because of the inner parentheses: a capturing group repeated by * keeps only its last iteration. Just drop those parentheses and you'll be fine.

>>> t = re.findall('([A-Z][a-z]+)', line)
>>> t
['Cow', 'Apple', 'Woof']
>>> t = re.findall('([A-Z]([a-z])+)', line)
>>> t
[('Cow', 'w'), ('Apple', 'e'), ('Woof', 'f')]

The count of capitalised words is, of course, len(t).
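A small sketch of that behaviour using the question's own pattern: the repeated ([a-z]) group captures once per letter and only the last capture survives. If you need parentheses for grouping but not for capturing, (?:...) keeps them out of the results.

```python
import re

line = 'Cow Apple think Woof'

# The repeated inner group ([a-z]) captures once per letter; only the last survives.
m = re.search(r'(\b[A-Z]([a-z])*\b)', line)
print(m.group(1), m.group(2))  # Cow w

# A non-capturing inner group leaves findall with whole matches only.
t = re.findall(r'\b[A-Z](?:[a-z])*\b', line)
print(t)       # ['Cow', 'Apple', 'Woof']
print(len(t))  # 3
```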

Upvotes: 10
