ohblahitsme

Reputation: 1072

Comparing the first couple characters in a string

So I have a list of strings:

list1 = ["1thing", "2thing", "3thing", "1thing"]

and I want to find out how many times each one appears in the list. The thing is, I only want to compare the first few characters, because I know that if the first, say, 3 characters are the same, then the whole string is the same. I was thinking that I could modify the built-in list.count(x) method, or override the __eq__ method, but I'm not sure how to do either of those.

Upvotes: 3

Views: 13776

Answers (3)

Marcin

Reputation: 49866

Use a generator expression to extract the first couple of characters, and pass it to the standard-library collections.Counter class:

from collections import Counter

Counter(item[:2] for item in list1)
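To make that concrete, here is a minimal sketch run against the list from the question. The name prefix_len is my own, and the value 3 is an assumption taken from the "say, 3 characters" in the question; adjust it to however many characters are enough to tell your strings apart.

```python
from collections import Counter

list1 = ["1thing", "2thing", "3thing", "1thing"]

# Assumption: the first 3 characters identify each string (hypothetical name).
prefix_len = 3

# Count the prefixes instead of the full strings.
counts = Counter(item[:prefix_len] for item in list1)

print(counts["1th"])  # how many strings start with "1th"
print(counts["9th"])  # a Counter returns 0 for keys it has never seen
```

The keys of the result are the prefixes, not the original strings, so if you need the full strings back you would have to keep a separate mapping.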

Upvotes: 9

Casey Kuball

Reputation: 7965

Probably not as good a solution as @Marcin's, but using itertools.groupby might make it more readable and flexible. Note that groupby only groups adjacent items, which is why the input is sorted by the same key first.

from itertools import groupby

def group_by_startswith(it, n):
    """Get a dict mapping the first n characters to the number of matches."""

    def first_n(str_):
        return str_[:n]

    startswith_sorted = sorted(it, key=first_n)
    groups = groupby(startswith_sorted, key=first_n)

    return {key: len(list(grouped)) for key, grouped in groups}

Example Output:

>>> list1 = ["1thing", "2thing", "3thing", "1thing"]
>>> print(group_by_startswith(list1, 3))
{'1th': 2, '2th': 1, '3th': 1}

This solution gives you a little more flexibility with the result. For example, changing the value in the returned dict from len(list(grouped)) to list(grouped) lets you easily get the matching objects themselves instead of just a count.
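A rough sketch of that variation, in case it helps. The function name items_by_prefix is made up for this example; it is the same approach as above with only the dict value changed.

```python
from itertools import groupby

def items_by_prefix(it, n):
    """Map the first n characters to the list of items starting with them."""

    def first_n(str_):
        return str_[:n]

    # groupby only merges adjacent items, so sort by the same key first.
    ordered = sorted(it, key=first_n)

    # Materialize each group with list(); the group iterators are single-use.
    return {key: list(grouped) for key, grouped in groupby(ordered, key=first_n)}

list1 = ["1thing", "2thing", "3thing", "1thing"]
matches = items_by_prefix(list1, 3)
print(matches["1th"])
```

Taking len() of each list gets you back to the counts, so this version is a superset of the one above at the cost of holding on to every item.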

Upvotes: 1

cobie

Reputation: 7281

Why go through all the hassle? Use the collections.Counter class to find frequencies.

>>> import collections
>>> x=['1thing', '2thing', '1thing', '3thing']
>>> y=collections.Counter(x)
>>> y
Counter({'1thing': 2, '2thing': 1, '3thing': 1})

Upvotes: 5
