Reputation: 4008
I have a text and want to know whether all of it, or more than 50% of it, is in uppercase.
DOFLAMINGO WITH TOUCH SCREEN lorem ipsum
I tried a regex (based on a solution I found here):
rx = re.compile(r"^([A-Z ':]+$)", re.M)
upp = rx.findall(string)
But this only finds lines that are entirely in caps. I can't tell whether more than 50 percent of the text (which includes the all-caps case) is uppercase.
I want to count only letters (so no digits, spaces, newlines, etc.).
Upvotes: 3
Views: 1764
Reputation: 51623
Generic solution that works with any boolean function and iterable (see below for a version that only looks at str.isalpha() characters):
def percentage(data, boolfunc):
    """Returns how many % of the 'data' returns 'True' for the given boolfunc."""
    return (sum(1 for x in data if boolfunc(x)) / len(data)) * 100
text = "DOFLAMINGO WITH TOUCH SCREEN lorem ipsum"
print( percentage( text, str.isupper ))
print( percentage( text, str.islower ))
print( percentage( text, str.isdigit ))
print( percentage( text, lambda x: x == " " ))
Output:
62.5 # isupper
25.0 # islower
0.0 # isdigit
12.5 # lambda for spaces
An even more compact variant is schwobaseggl's:
    return sum(map(boolfunc, data)) / len(data) * 100
It sums the booleans returned by boolfunc directly. Like the generator expression above, it does not build an intermediate list.
Edit: 2nd version that only uses str.isalpha characters and allows multiple boolfuncs:
def percentage2(data, *boolfuncs):
    """Returns how many % of the 'data' returns 'True' for all given boolfuncs.
    Only uses str.isalpha() characters and ignores all others."""
    count = sum(1 for c in data if c.isalpha())
    return sum(1 for x in data if all(f(x) for f in boolfuncs)) / count * 100
text = "DOFLAMINGO WITH TOUCH SCREEN lorem ipsum"
print( percentage2( text, str.isupper, str.isalpha ))
print( percentage2( text, str.islower, str.isalpha ))
Output:
71.42857142857143
28.57142857142857
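Since the question asks for a yes/no answer ("more than 50%"), the letters-only idea can be wrapped roughly as follows. This is only a sketch: the name `mostly_upper` and the `threshold` parameter are hypothetical, and returning `False` for text without any letters is an assumed choice that avoids the division by zero `percentage2` would hit in that case.

```python
def mostly_upper(text, threshold=0.5):
    """Return True if more than `threshold` of the letters in `text` are uppercase.

    Hypothetical helper; only str.isalpha() characters are counted.
    """
    letters = [c for c in text if c.isalpha()]
    if not letters:  # assumed choice: no letters means "not mostly uppercase"
        return False
    return sum(c.isupper() for c in letters) / len(letters) > threshold


print(mostly_upper("DOFLAMINGO WITH TOUCH SCREEN lorem ipsum"))  # True (25 of 35 letters)
print(mostly_upper("1234 !?"))  # False (no letters at all)
```

Note that exactly 50% uppercase does not count as "more than 50%" with the strict `>` comparison.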
Upvotes: 2
Reputation: 164613
Regex seems overkill here. You can use sum with a generator expression:
x = 'DOFLAMINGO WITH TOUCH SCREEN lorem ipsum'
x_chars = ''.join(x.split()) # remove all whitespace
x_upper = sum(i.isupper() for i in x_chars) > (len(x_chars) / 2)
Or functionally via map:
x_upper = sum(map(str.isupper, x_chars)) > (len(x_chars) / 2)
Alternatively, via statistics.mean:
from statistics import mean
x_upper = mean(i.isupper() for i in x if not i.isspace()) > 0.5
Upvotes: 5
Reputation: 366
Using regular expressions, this is one way you can do it (given that s is the string in question):
import re

upper = re.findall(r'[A-Z]', s)  # all uppercase ASCII letters
lower = re.findall(r'[a-z]', s)  # all lowercase ASCII letters
percentage = (len(upper) / (len(upper) + len(lower))) * 100
It finds the lists of both uppercase and lowercase characters and computes the percentage from their lengths.
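One caveat worth keeping in mind: the character classes `[A-Z]` and `[a-z]` match ASCII letters only, whereas `str.isupper` and `str.islower` also recognise other cased letters. The sketch below compares the two on a made-up sample string (not from the question); the variable names are illustrative only.

```python
import re

# Made-up sample with non-ASCII letters: "ÄÖÜ straße"
s = "\u00c4\u00d6\u00dc stra\u00dfe"

# ASCII-only character classes miss the umlauts and the sharp s
ascii_upper = re.findall(r"[A-Z]", s)
ascii_lower = re.findall(r"[a-z]", s)
print(len(ascii_upper), len(ascii_lower))  # 0 5

# str methods are Unicode-aware
unicode_upper = [c for c in s if c.isupper()]
unicode_lower = [c for c in s if c.islower()]
print(len(unicode_upper), len(unicode_lower))  # 3 6
```

For purely ASCII input such as the question's example, both approaches give the same counts.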
Upvotes: 1
Reputation: 73450
You can use filter and str.isalpha to clean out non-alphabetic chars, and str.isupper to count uppercase chars and calculate the ratio:
s = 'DOFLAMINGO WITH TOUCH SCREEN lorem ipsum'
alph = list(filter(str.isalpha, s)) # ['D', ..., 'O', 'W', ..., 'N', 'l', 'o', ...]
sum(map(str.isupper, alph)) / len(alph)
# 0.7142857142857143
Also see the docs on sum and map, which you might find yourself using regularly. Moreover, this relies on the fact that bool is a subclass of int, so True and False are summed as 1 and 0, which might be too implicit for the taste of some.
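To make the bool-as-int point concrete, here is a small sketch comparing the implicit summation with a more explicit count. The names `implicit` and `explicit` are illustrative only.

```python
# bool is a subclass of int, so True behaves like 1 in arithmetic
print(issubclass(bool, int))  # True
print(True + True)            # 2

s = "DOFLAMINGO WITH TOUCH SCREEN lorem ipsum"
alph = [c for c in s if c.isalpha()]  # letters only

implicit = sum(map(str.isupper, alph))          # sums True/False values directly
explicit = sum(1 for c in alph if c.isupper())  # counts 1 per uppercase letter
print(implicit, explicit)  # 25 25
```

Both forms give the same count, so choosing between them is a matter of readability.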
Upvotes: 7
Reputation: 2221
Try this, it's short and does the job:
text = "DOFLAMINGO WITH TOUCH SCREEN lorem ipsum"
print("Percent in Capital Letters:", sum(1 for c in text if c.isupper())/len(text)*100)
# Percent in Capital Letters: 62.5
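Note that dividing by `len(text)` measures the share of all characters, including spaces and digits, while the question asks about letters only. The sketch below uses a made-up sample string (not from the question) where the two denominators lead to different answers for the 50% test; the variable names are illustrative.

```python
text = "HELLO 1234 5678"  # made-up sample: few letters, many digits

upper = sum(1 for c in text if c.isupper())    # uppercase letters
letters = sum(1 for c in text if c.isalpha())  # all letters

share_of_all_chars = upper / len(text)  # denominator includes digits and spaces
share_of_letters = upper / letters      # denominator counts letters only

print(share_of_all_chars > 0.5)  # False
print(share_of_letters > 0.5)    # True
```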
Upvotes: 0
Reputation: 420
Something like the following should work.
import re

string = 'DOFLAMINGO WITH TOUCH SCREEN lorem ipsum'
rx = re.sub('[^A-Z]', '', string)  # keep only uppercase ASCII letters
print(len(rx)/len(string))  # fraction of all characters, spaces included
Upvotes: 0
Reputation: 14216
Here is one way to do it:
f = 'DOFLAMINGO WITH TOUCH SCREEN lorem ipsum'  # sample string from the question
ratio = sum(map(lambda c: c.isupper(), f)) / len(f)
ratio > .50
Upvotes: 0