Reputation: 387
"I am trying to remove the words starting with lowercase using regular expression but I am not getting the required output."
My input was "apply to this bill and are made a part thereof Illam B GEISSLER"
import re
text = "apply to this bill and are made a part thereof Illam B GEISSLER"
result = re.sub(r"\w[a-z]", "", text)
print(result)
I got the output " I B GEISSLER", but the required output is " Illam B GEISSLER".
Upvotes: 2
Views: 1055
Reputation: 1766
Try this,
import re
text = "apply to this bill and are made a part thereof Illam B GEISSLER"
result = re.sub(r"(\b[a-z]+)", '', text).strip()
print(result)
Output
Illam B GEISSLER
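To see why the pattern from the question fails where this one works, it can help to list what each pattern actually matches. The sketch below is only an illustration; the variable names pairs and words are made up for it.

```python
import re

text = "apply to this bill and are made a part thereof Illam B GEISSLER"

# \w[a-z] has no anchor: it matches any word character followed by a
# lowercase letter, two characters at a time, anywhere in the string.
pairs = re.findall(r"\w[a-z]", text)

# \b[a-z]+ can only begin at a word boundary, so it never starts
# in the middle of a capitalized word.
words = re.findall(r"\b[a-z]+", text)

print(pairs[-2:])  # ['Il', 'la']  (chunks taken out of "Illam")
print(words)       # only the all-lowercase words
```

So the original pattern eats "Il" and "la" out of Illam, while the anchored pattern leaves that word untouched.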
Upvotes: 1
Reputation: 27743
This expression might also work:
\s*\b[a-z][a-z]*
import re
regex = r"\s*\b[a-z][a-z]*"
test_str = "apply to this bill and are made a part thereof Illam B GEISSLER apply to this bill and are made a part thereof Illam B GEISSLER"
subst = ""
# Pass count=N to limit the number of replacements (the default, 0, replaces all)
result = re.sub(regex, subst, test_str)
if result:
    print(result)
or maybe this one:
([A-Z].*?\b\s*)
import re
regex = r"([A-Z].*?\b\s*)"
test_str = "apply to this bill and are made a part thereof Illam B GEISSLER apply to this bill and are made a part thereof Illam B GEISSLER"
print("".join(re.findall(regex, test_str)))
Output
Illam B GEISSLER Illam B GEISSLER
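The two expressions agree on the sample text, but they are not interchangeable in general: one removes lowercase words, the other keeps only what starts with a capital. A quick check, assuming a made-up second input that contains digits (the helper names by_removal and by_keeping are hypothetical):

```python
import re

remove_lower = r"\s*\b[a-z][a-z]*"
keep_upper = r"([A-Z].*?\b\s*)"

sample = "apply to this bill and are made a part thereof Illam B GEISSLER"
other = "call 911 Now"  # hypothetical input, not from the question


def by_removal(s):
    # Delete every word made of lowercase letters, plus the whitespace before it.
    return re.sub(remove_lower, "", s).strip()


def by_keeping(s):
    # Keep only the chunks that begin with an uppercase letter.
    return "".join(re.findall(keep_upper, s)).strip()


print(by_removal(sample) == by_keeping(sample))  # True
print(by_removal(other))  # 911 Now
print(by_keeping(other))  # Now
```

Which one is right depends on whether tokens such as numbers should survive.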
Upvotes: 1
Reputation: 522064
Try finding the pattern \b[a-z]+\s* and replacing it with an empty string:
import re
text = "apply to this bill and are made a part thereof Illam B GEISSLER"
result = re.sub(r'\b[a-z]+\s*', "", text).strip()
print(result)
This prints:
Illam B GEISSLER
The idea behind the pattern \b[a-z]+\s* is that the leading \b only lets a match begin at the start of a word, so lowercase letters inside a capitalized word such as Illam are left alone. Note that we call strip() to remove any whitespace left at either end of the result.
One other subtle point is that the pattern also removes all whitespace to the right of each matching lowercase word. This keeps the text readable when, for example, some matching words lie between non-matching words:
text = "United States Of a bunch of states called America"
result = re.sub(r'\b[a-z]+\s*', "", text).strip()
print(result)
This correctly prints:
United States Of America
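One caveat worth checking: [a-z]+ stops at the first character that is not a lowercase letter, so a word that starts lowercase but contains a capital is only partly removed. The sketch below uses a made-up sentence, and the second pattern is a hypothetical variant, not part of the answer above.

```python
import re

text = "the iPhone and eBay Are Brands"  # made-up example

# Pattern from the answer: removes only the leading lowercase run.
loose = re.sub(r'\b[a-z]+\s*', "", text).strip()
print(loose)   # Phone Bay Are Brands

# Hypothetical variant: \w* also consumes the rest of the word
# (note that \w covers digits and underscores too).
strict = re.sub(r'\b[a-z]\w*\s*', "", text).strip()
print(strict)  # Are Brands
```

For input that is plain prose with no mixed-case words, as in the question, both patterns give the same result.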
Upvotes: 3
Reputation: 66
You can also search for the capitalized words instead. The linked question has an example:
Regex - finding capital words in string
Upvotes: 1