How to calculate the proportions of the words that appear with the first letter capitalized

Question

Using Python, I would like to calculate, for each word, the proportion of its occurrences that have the first letter capitalized. For example, given this word list,

word_list = ["capital", "Capital", "Capital", "Capital", "capital", "bus", "Bus", "bus", "Bus", "white"]

I would like to produce a result like this:

{"Capital": 0.6, "Bus": 0.5, "White": 0}

Do you have any ideas? It seems easy, but I am finding it hard to come up with a good solution. To be specific, counting the words whose first letter is capitalized is easy with a defaultdict:

from collections import defaultdict

# Count only the occurrences whose first letter is uppercase
word_dict = defaultdict(int)
for word in word_list:
    if word[0].isupper():
        word_dict[word] += 1

Thank you in advance!

jpp · Accepted Answer

Words sorted: `itertools.groupby`

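Assuming, as in your example, that occurrences of the same word (ignoring case) are adjacent in the list, you can use groupby with statistics.mean. The mean of a sequence of Booleans is the fraction that are True: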

from itertools import groupby
from statistics import mean

grouper = groupby(word_list, key=str.casefold)
res = {k.capitalize(): mean(x[0].isupper() for x in words) for k, words in grouper}

# {'Bus': 0.5, 'Capital': 0.6, 'White': 0}
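One caveat worth illustrating: groupby only merges consecutive items with the same key, so it gives wrong proportions when occurrences of a word are scattered. The sketch below wraps the logic above in a helper (the name share_by_adjacent_groups and the shuffled list are made up for illustration) and compares a grouped list with a scattered one:

```python
from itertools import groupby
from statistics import mean


def share_by_adjacent_groups(words):
    """Proportion of capitalized occurrences per run of adjacent equal words.

    Hypothetical helper; correct only if equal words (ignoring case) are adjacent.
    """
    grouper = groupby(words, key=str.casefold)
    return {k.capitalize(): mean(w[0].isupper() for w in group) for k, group in grouper}


grouped = ["capital", "Capital", "Capital", "Capital", "capital",
           "bus", "Bus", "bus", "Bus", "white"]
scattered = ["Bus", "capital", "bus", "Capital"]

grouped_result = share_by_adjacent_groups(grouped)
# Each word forms its own run here, so later runs overwrite earlier ones
# and the true proportions (0.5 for both words) are lost.
scattered_result = share_by_adjacent_groups(scattered)
```

With the scattered list, each run contains a single word, so the dictionary ends up holding only the value of the last run for each key.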

Words not necessarily sorted: `sorted` + `groupby`

You can, in this case, sort before applying the above logic:

word_list = sorted(word_list, key=str.casefold)

Sorting costs O(n log n), so if your list isn't already sorted this is slower than a single pass over the data.
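Put together, the sort-then-group approach might look like the sketch below. The function name proportions_sorted and the shuffled input list are assumptions for illustration, not part of the question:

```python
from itertools import groupby
from statistics import mean


def proportions_sorted(words):
    """Sort case-insensitively, then group, so input order does not matter."""
    ordered = sorted(words, key=str.casefold)
    return {
        key.capitalize(): mean(w[0].isupper() for w in group)
        for key, group in groupby(ordered, key=str.casefold)
    }


# Same words as in the question, in a different (hypothetical) order
unsorted_words = ["Bus", "capital", "white", "bus", "Capital",
                  "Capital", "bus", "capital", "Bus", "Capital"]
result = proportions_sorted(unsorted_words)
```

Using the same key for sorted and groupby is what guarantees that every case variant of a word ends up in one run.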

Words not necessarily sorted: `collections.defaultdict`

An alternative is to construct a dictionary with lists of Boolean values via collections.defaultdict, then use statistics.mean:

from collections import defaultdict
from statistics import mean

dd = defaultdict(list)
for word in word_list:
    dd[word.capitalize()].append(word[0].isupper())

# defaultdict(list,
#             {'Bus': [False, True, False, True],
#              'Capital': [False, True, True, True, False],
#              'White': [False]})

res = {k: mean(v) for k, v in dd.items()}

# {'Bus': 0.5, 'Capital': 0.6, 'White': 0}
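If storing a list of Booleans per word is a concern for large inputs, a variation is to keep two running counts instead. This is a different technique from the one above, sketched with collections.Counter; the function name proportions_counted is made up, and skipping empty strings is an assumption about how you want to treat them (word[0] would otherwise raise IndexError):

```python
from collections import Counter


def proportions_counted(words):
    """Single pass: count all occurrences and capitalized occurrences per word."""
    total = Counter()
    capitalized = Counter()
    for word in words:
        if not word:
            continue  # assumption: ignore empty strings rather than fail
        key = word.capitalize()
        total[key] += 1
        capitalized[key] += word[0].isupper()  # True counts as 1, False as 0
    return {key: capitalized[key] / total[key] for key in total}


word_list = ["capital", "Capital", "Capital", "Capital", "capital",
             "bus", "Bus", "bus", "Bus", "white"]
result = proportions_counted(word_list)
```

Unlike statistics.mean, the division here always returns a float, so a word that is never capitalized maps to 0.0 rather than 0.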
