Reputation: 43
For example, if the given string is this:
"aaabbbbccdaeeee"
I want to say something like:
3 a, 4 b, 2 c, 1 d, 1 a, 4 e
It is easy enough to do in Python with a brute force loop, but I am wondering if there is a more Pythonic / cleaner one-liner type of approach.
My brute force:
while source!="":
leading = source[0]
c=0
while source!="" and source[0]==leading:
c+=1
source=source[1:]
print(c, leading)
Upvotes: 4
Views: 58
Reputation: 1537
There are a number of different ways to solve the problem. @dawg has already posted the best solution, but if for some reason you aren't allowed to use Counter()
(maybe a job interview or school assignment) then you can actually solve the problem in a few ways.
from collections import Counter, defaultdict
def counter_counts(s):
""" Preferred method using Counter()
Arguments:
s {string} -- [string to have each character counted]
Returns:
[dict] -- [dictionary of counts of each char]
"""
return Counter(s)
def default_counts(s):
""" Alternative solution using defaultdict
Arguments:
s {string} -- [string to have each character counted]
Returns:
[dict] -- [dictionary of counts of each char]
"""
counts = defaultdict(int) # each key is initalized to 0
for char in s:
counts[char] += 1 # increment the count of each character by 1
return counts
def vanilla_counts_1(s):
""" Alternative solution using a vanilla dicitonary
Arguments:
s {string} -- [string to have each character counted]
Returns:
[dict] -- [dictionary of counts of each char]
"""
counts = {}
for char in s:
# we have to manually check that each value is in the dictionary before attempting to increment it
if char in counts:
counts[char] += 1
else:
counts[char] = 1
return counts
def vanilla_counts_2(s):
""" Alternative solution using a vanilla dicitonary
This version uses the .get() method to increment instead of checking if a key already exists
Arguments:
s {string} -- [string to have each character counted]
Returns:
[dict] -- [dictionary of counts of each char]
"""
counts = {}
for char in s:
# the second argument in .get() is the default value if we dont find the key
counts[char] = counts.get(char, 0) + 1
return counts
And just for fun lets take a look at how each method performs.
For s = "aaabbbbccdaeeee"
and 10,000 runs:
Counter: 0.0330204963684082s
defaultdict: 0.01565241813659668s
vanilla 1: 0.01562952995300293s
vanilla 2: 0.015581130981445312s
(actually rather surprising results)
Now let's test what happens if we set our string to the entire plaintext version of the book of Genesis and 1,000 runs:
Counter: 8.500739336013794s
defaultdict: 14.721554040908813s
vanilla 1: 18.089043855667114s
vanilla 2: 27.01840090751648s
Looks like the overhead of creating the Counter()
object becomes much less important!
(These weren't very scientific tests, but it was a bit of fun).
Upvotes: 0
Reputation: 104014
Use a Counter for a count of each distinct letter in the string regardless of position:
>>> s="aaabbbbccdaeeee"
>>> from collections import Counter
>>> Counter(s)
Counter({'a': 4, 'b': 4, 'e': 4, 'c': 2, 'd': 1})
You can use groupby if the position in the string has meaning:
from itertools import groupby
li=[]
for k, l in groupby(s):
li.append((k, len(list(l))))
print li
Prints:
[('a', 3), ('b', 4), ('c', 2), ('d', 1), ('a', 1), ('e', 4)]
Which can be reduce to a list comprehension:
[(k,len(list(l))) for k, l in groupby(s)]
You can even use a regex:
>>> [(m.group(0)[0], len(m.group(0))) for m in re.finditer(r'((\w)\2*)', s)]
[('a', 3), ('b', 4), ('c', 2), ('d', 1), ('a', 1), ('e', 4)]
Upvotes: 9