Reputation: 2355
What is the best pure Python implementation to check if a string contains ANY letters from the alphabet?
string_1 = "(555).555-5555"
string_2 = "(555) 555 - 5555 ext. 5555"
Where string_1 would return False for having no letters of the alphabet in it, and string_2 would return True for having letters.
Upvotes: 127
Views: 350097
Reputation: 149
A simple and multilingual solution to the OP's question can be achieved with the Alphabetic library, which can be installed via: pip install alphabetic
After installing, import the library, create a WritingSystem instance, and select the alphabet of the desired language (Alphabetic covers a wide range of languages):
from alphabetic import WritingSystem
ws = WritingSystem()
# Retrieve the English alphabet list
en_alphabet = ws.by_language(ws.Language.English, as_list=True)
Using this list, we can write a handy function that accomplishes this task:
def contains_alphabet_strings(string: str, language_alphabet: list[str]) -> bool:
    # Collect every character of the string that belongs to the alphabet
    alphabet_chars = [char for char in string if char in language_alphabet]
    return len(alphabet_chars) > 0
and call it as follows:
string_1 = "(555).555-5555"
string_2 = "(555) 555 - 5555 ext. 5555"
# Result:
contains_alphabet_strings(string_1, en_alphabet) # False
contains_alphabet_strings(string_2, en_alphabet) # True
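Since the check only needs to know whether at least one character matches, it can stop at the first hit instead of building a full list. A minimal sketch of that idea, assuming the alphabet is any iterable of single-character strings; `string.ascii_letters` is used here as a stand-in for the list returned by the library, and the name `contains_any_letter` is hypothetical:

```python
import string

def contains_any_letter(text: str, alphabet) -> bool:
    # Convert to a set once so each membership test is O(1) on average.
    letters = set(alphabet)
    # any() short-circuits on the first character found in the alphabet.
    return any(char in letters for char in text)

# Stand-in alphabet (assumption): ASCII letters from the standard library.
ascii_alphabet = list(string.ascii_letters)

print(contains_any_letter("(555).555-5555", ascii_alphabet))              # False
print(contains_any_letter("(555) 555 - 5555 ext. 5555", ascii_alphabet))  # True
```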
Upvotes: 0
Reputation: 340
I tested each of the above methods for checking whether a given string contains any letters and measured the average processing time per string on a standard computer.
~250 ns for
import re
~3 µs for
re.search('[a-zA-Z]', string)
~6 µs for
any(c.isalpha() for c in string)
~850 ns for
string.upper().isupper()
Contrary to what has been alleged, importing re takes negligible time, and searching with re takes about half the time of iterating with isalpha(), even for a relatively small string.
Hence for larger strings and larger numbers of strings, re would be significantly more efficient.
But converting the string to one case and then checking the case (i.e. either upper().isupper() or lower().islower()) wins here. In every loop it was significantly faster than re.search(), and it doesn't even require any additional imports.
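For anyone who wants to repeat this kind of measurement, here is a rough sketch using the standard timeit module. The sample string, the wrapper function names, and the run count are assumptions, and the numbers will vary with machine and Python version, so it is not expected to reproduce the figures above exactly:

```python
import re
import timeit

sample = "(555) 555 - 5555 ext. 5555"  # hypothetical test input
runs = 20_000                          # arbitrary repeat count

def with_regex(s: str) -> bool:
    return re.search('[a-zA-Z]', s) is not None

def with_isalpha(s: str) -> bool:
    return any(c.isalpha() for c in s)

def with_case_check(s: str) -> bool:
    return s.upper().isupper()

for func in (with_regex, with_isalpha, with_case_check):
    total = timeit.timeit(lambda: func(sample), number=runs)
    # Average time per call, converted from seconds to nanoseconds.
    print(f"{func.__name__}: {total / runs * 1e9:.0f} ns per call")
```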
Upvotes: 13
Reputation: 33407
Regex should be a fast approach:
re.search('[a-zA-Z]', the_string)
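To turn that into a True/False answer, the match object can be compared against None. A small sketch, with the helper name `has_ascii_letter` being hypothetical; note that the pattern only covers the ASCII letters a-z and A-Z:

```python
import re

# Compile once if the check runs many times; matches ASCII letters only.
ASCII_LETTER = re.compile(r'[a-zA-Z]')

def has_ascii_letter(the_string: str) -> bool:
    # search() returns a match object or None; convert that to a bool.
    return ASCII_LETTER.search(the_string) is not None

print(has_ascii_letter("(555).555-5555"))              # False
print(has_ascii_letter("(555) 555 - 5555 ext. 5555"))  # True
print(has_ascii_letter("é"))                           # False: not in a-z or A-Z
```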
Upvotes: 167
Reputation: 403
I liked the answer provided by @jean-françois-fabre, but it is incomplete.
His approach will work, but only if all the letters in the text are lowercase or all of them are uppercase:
>>> text = "(555).555-5555 extA. 5555"
>>> text.islower()
False
>>> text.isupper()
False
A better approach is to convert your string to upper- or lowercase first and then check:
>>> string1 = "(555).555-5555 extA. 5555"
>>> string2 = '555 (234) - 123.32 21'
>>> string1.upper().isupper()
True
>>> string2.upper().isupper()
False
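Wrapped as a function, the idea might look like the sketch below (the name `has_cased_letter` is hypothetical). One caveat worth knowing: isupper() only looks at cased characters, so letters from scripts without upper/lowercase are not detected this way, whereas an isalpha()-based check does see them:

```python
def has_cased_letter(text: str) -> bool:
    # isupper() needs at least one cased character and no lowercase ones;
    # upper() converts the letters that have an uppercase form.
    return text.upper().isupper()

print(has_cased_letter("(555).555-5555 extA. 5555"))  # True
print(has_cased_letter("555 (234) - 123.32 21"))      # False

# Caveat: letters without case are not detected by this check.
print(has_cased_letter("漢字"))          # False
print(any(c.isalpha() for c in "漢字"))  # True
```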
Upvotes: 20
Reputation: 140316
You can use islower() on your string to see if it contains some lowercase letters (amongst other characters). Combine it with isupper() using or to also check whether it contains some uppercase letters:
Below: letters in the string, so the test yields True.
>>> z = "(555) 555 - 5555 ext. 5555"
>>> z.isupper() or z.islower()
True
Below: no letters in the string, so the test yields False.
>>> z = "(555).555-5555"
>>> z.isupper() or z.islower()
False
Not to be mixed up with isalpha(), which returns True only if all characters are letters, which isn't what you want.
Note that Barm's answer completes mine nicely, since mine doesn't handle the mixed case well.
Upvotes: 29
Reputation: 651
You can also do this:
import re
string = '24234ww'
val = re.search('[a-zA-Z]+', string)  # match object for the first run of letters
val[0].isalpha()  # returns True, because the matched text consists only of letters
print(val[0])  # prints the first run of matching letters: ww
Also note that if val is None, the search did not find a match, and val[0] would raise a TypeError.
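A sketch of how the None case could be handled before indexing into the match; the function name `first_letter_run` is hypothetical:

```python
import re

def first_letter_run(text: str):
    # Returns the first run of ASCII letters, or None if there is none.
    match = re.search('[a-zA-Z]+', text)
    if match is None:
        return None
    return match[0]

print(first_letter_run('24234ww'))         # ww
print(first_letter_run('(555).555-5555'))  # None
```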
Upvotes: 1
Reputation: 12486
You can use a regular expression like this:
import re
string = "(555) 555 - 5555 ext. 5555"
print(re.search('[a-zA-Z]+', string))  # a match object if letters are found, otherwise None
Upvotes: 11
Reputation: 353604
How about:
>>> string_1 = "(555).555-5555"
>>> string_2 = "(555) 555 - 5555 ext. 5555"
>>> any(c.isalpha() for c in string_1)
False
>>> any(c.isalpha() for c in string_2)
True
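Keep in mind that isalpha() is Unicode-aware, so accented and non-Latin letters count too. If only the letters A-Z and a-z should count, one possible variant (assuming Python 3.7+ for str.isascii(); the name `contains_ascii_letter` is hypothetical) is:

```python
def contains_ascii_letter(text: str) -> bool:
    # isascii() restricts the alphabetic check to A-Z and a-z.
    return any(c.isascii() and c.isalpha() for c in text)

print(any(c.isalpha() for c in "555 é"))   # True: é is a Unicode letter
print(contains_ascii_letter("555 é"))      # False
print(contains_ascii_letter("ext. 5555"))  # True
```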
Upvotes: 116