Shyam

Reputation: 387

How to remove words starting with lowercase from a sentence using regex

I am trying to remove the words starting with a lowercase letter using a regular expression, but I am not getting the required output.

My input was "apply to this bill and are made a part thereof Illam B GEISSLER"

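import re

text = "apply to this bill and are made a part thereof Illam B GEISSLER"
# \w[a-z] matches any word character followed by a single lowercase letter,
# so it deletes two-character chunks (even inside "Illam") instead of whole words.
result = re.sub(r"\w[a-z]", "", text)
print(result)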

I got the output " I B GEISSLER", but the required output is " Illam B GEISSLER".
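To see why the attempt above misbehaves, here is a small diagnostic sketch (the sample strings are only illustrative) that uses re.findall to list what \w[a-z] actually matches, compared with a pattern anchored at a word boundary:

```python
import re

# \w is any word character and [a-z] is exactly one lowercase letter,
# so every match is a two-character chunk rather than a whole word.
chunks_apply = re.findall(r"\w[a-z]", "apply")
chunks_illam = re.findall(r"\w[a-z]", "Illam")
print(chunks_apply)  # ['ap', 'pl']
print(chunks_illam)  # ['Il', 'la']

# Anchoring at word boundaries and repeating the class matches whole lowercase words instead.
whole_words = re.findall(r"\b[a-z]+\b", "thereof Illam B")
print(whole_words)  # ['thereof']
```

The chunks from "Illam" show why the capitalized word gets damaged: the pattern has no notion of where a word starts.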

Upvotes: 2

Views: 1055

Answers (4)

Arun Augustine

Reputation: 1766

Try this:

import re
text = "apply to this bill and are made a part thereof Illam B GEISSLER"
result = re.sub(r"(\b[a-z]+)", '', text).strip()
print(result)

Output

Illam B GEISSLER
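One caveat, assuming lowercase words can also appear between capitalized ones: \b[a-z]+ removes only the letters, so the surrounding spaces stay behind and strip() only trims the ends. A minimal sketch that collapses the leftover whitespace with split/join (the sample sentence is just an illustration):

```python
import re

text = "United States Of a bunch of states called America"

# Remove runs of lowercase letters that start at a word boundary (the pattern from this answer).
removed = re.sub(r"\b[a-z]+", "", text)
# removed still contains a long run of spaces between "Of" and "America".

# str.split() with no arguments splits on any whitespace run and drops empty strings,
# so joining with a single space normalizes the gaps and trims both ends.
result = " ".join(removed.split())
print(result)  # United States Of America
```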

Upvotes: 1

Emma

Reputation: 27743

This expression might also work:

\s*\b[a-z][a-z]*


Test

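import re

regex = r"\s*\b[a-z][a-z]*"

test_str = "apply to this bill and are made a part thereof Illam B GEISSLER apply to this bill and are made a part thereof Illam B GEISSLER"

subst = ""

# count=0 (the default) replaces every match; change it to limit the number of replacements.
# Pass count as a keyword: positional count/flags arguments are deprecated since Python 3.13.
result = re.sub(regex, subst, test_str, count=0)

# strip() drops the space before the first "Illam": \s* only consumes whitespace
# before each removed word, so the space after "thereof" survives.
print(result.strip())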
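Because the \s* sits in front of the word, this pattern eats the whitespace before each lowercase word rather than after it. A small sketch on two illustrative inputs shows the effect: interior gaps close up cleanly, but a sentence that starts with lowercase words keeps one leading space.

```python
import re

# Each match removes a lowercase word together with the whitespace that precedes it.
pattern = r"\s*\b[a-z][a-z]*"

middle = re.sub(pattern, "", "United States Of a bunch of states called America")
print(repr(middle))  # 'United States Of America'

# A lowercase word at the very start has no preceding whitespace to consume,
# so the space after the last removed word survives as a leading space.
leading = re.sub(pattern, "", "apply to Illam B GEISSLER")
print(repr(leading))  # ' Illam B GEISSLER'
```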

Or maybe this one, which keeps the capitalized words instead of deleting the lowercase ones:

([A-Z].*?\b\s*)

Test

import re

regex = r"([A-Z].*?\b\s*)"
test_str = "apply to this bill and are made a part thereof Illam B GEISSLER apply to this bill and are made a part thereof Illam B GEISSLER"
print("".join(re.findall(regex, test_str)))

Output

Illam B GEISSLER Illam B GEISSLER


Upvotes: 1

Tim Biegeleisen

Reputation: 522064

Try matching the pattern \b[a-z]+\s* and replacing it with an empty string:

import re

text = "apply to this bill and are made a part thereof Illam B GEISSLER"
result = re.sub(r'\b[a-z]+\s*', "", text).strip()
print(result)

This prints:

Illam B GEISSLER

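The idea behind the pattern \b[a-z]+\s* is that \b anchors each match at the start of a word, so it only removes runs of lowercase letters that begin a word. The lowercase letters inside a capitalized word such as Illam are never matched, because no word boundary precedes them. We then call strip() to remove any remaining leading or trailing whitespace.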

One other subtle point is that the pattern also removes all whitespace to the right of each matching lowercase word. This keeps the text readable when some matching words lie between non-matching words, for example:

import re

text = "United States Of a bunch of states called America"
result = re.sub(r'\b[a-z]+\s*', "", text).strip()
print(result)

This correctly prints:

United States Of America
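Strictly speaking, the question asks for words starting with a lowercase letter, and [a-z]+ stops at the first character that is not a lowercase letter. A sketch on a hypothetical mixed-case input shows the difference; the alternative pattern \b[a-z]\w*\s* (a swapped-in variant, not part of the original answer) requires only the first letter to be lowercase and then takes the rest of the word:

```python
import re

text = "buy iPhone Now"  # hypothetical input containing a mixed-case word

# [a-z]+ only consumes the lowercase prefix, so "iPhone" loses just its "i".
prefix_only = re.sub(r"\b[a-z]+\s*", "", text).strip()
print(prefix_only)  # Phone Now

# [a-z]\w* needs a lowercase first letter, then takes the remaining word characters.
whole_word = re.sub(r"\b[a-z]\w*\s*", "", text).strip()
print(whole_word)  # Now
```

Note that \w also matches digits and underscores, so a token like "x1_y" is removed as one word.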

Upvotes: 3

Andres Espinosa

Reputation: 66

Alternatively, you can search for the capitalized words instead of removing the lowercase ones. The following question has an example:

Regex - finding capital words in string
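A minimal sketch of that approach, assuming "capitalized" means a word whose first character is an uppercase ASCII letter: collect those words with re.findall and rejoin them with single spaces.

```python
import re

text = "apply to this bill and are made a part thereof Illam B GEISSLER"

# \b[A-Z] requires a word that starts with an uppercase ASCII letter;
# \w* then takes the rest of that word.
capitalized = re.findall(r"\b[A-Z]\w*", text)
print(capitalized)            # ['Illam', 'B', 'GEISSLER']
print(" ".join(capitalized))  # Illam B GEISSLER
```

Rejoining with " " normalizes spacing, so any original punctuation or extra whitespace between the kept words is not preserved.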

Upvotes: 1
