Reputation: 33
Everything else seems to work fine, but the count for the last character is always off by 1. For example, if I input abcccddd, I get a1b1c3d2, but I should get a1b1c3d3. Any hint would be much appreciated!
Prompt: String Compression: Implement a method to perform basic string compression using the counts of repeated characters. For example, the string aabcccccaaa would become a2b1c5a3. If the "compressed" string would not become smaller than the original string, your method should return the original string. You can assume the string has only uppercase and lowercase letters (a-z). Do the easy thing first: compress the string, then compare the lengths. Be careful that you aren't repeatedly concatenating strings together; this can be very inefficient.
def compression(string):
    hash = {}
    list = []
    count = 0
    for i in range(len(string) - 1):
        if string[i - 1] != string[i] or i == 0:
            if string[i] != string[i + 1] or i == len(string) - 2:
                count = count + 1
                list.append(str(string[i]))
                list.append(str(count))
                count = 0
            elif string[i] == string[i + 1]:
                count = count + 1
        elif string[i - 1] == string[i]:
            if string[i] != string[i + 1] or i == len(string) - 2:
                count = count + 1
                list.append(str(string[i]))
                list.append(str(count))
                count = 0
            if string[i] == string[i + 1]:
                count = count + 1
    print(list)
    result = "".join(list)
    if len(result) == len(string):
        return string
    else:
        return result

string = "abcccfffgggg"
compression(string)
Upvotes: 1
Views: 3135
Reputation: 1
Here is a Python function that performs the string compression. For example, "aabcccccaaa" would become "a2b1c5a3".
def string_compression(s):
    result = ""
    if not s:
        return result
    char_count = 1  # Initialize character count to 1
    for i in range(1, len(s)):
        if s[i] == s[i - 1]:
            char_count += 1
        else:
            result += s[i - 1] + str(char_count)
            char_count = 1
    result += s[-1] + str(char_count)
    return result

print(string_compression('aabcccccaaa'))
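The function above builds the result with repeated +=, which the prompt warns against, and it always returns the compressed form. A minimal sketch of a variant that collects the pieces in a list and falls back to the original string when compression does not help could look like this. The name compress_runs is made up for illustration. Note the final flush after the loop: the last run is only complete once the loop has ended, and handling it inside the loop is a common source of an off-by-one count.

```python
def compress_runs(s):
    """Run-length encode s; return s unchanged if the encoding is not shorter."""
    if not s:
        return s
    pieces = []      # collect fragments and join once at the end
    run_char = s[0]  # character of the run currently being counted
    run_len = 1
    for ch in s[1:]:
        if ch == run_char:
            run_len += 1
        else:
            pieces.append(run_char + str(run_len))
            run_char, run_len = ch, 1
    pieces.append(run_char + str(run_len))  # flush the final run
    result = "".join(pieces)
    return result if len(result) < len(s) else s


print(compress_runs("aabcccccaaa"))  # a2b1c5a3
print(compress_runs("abcccddd"))     # abcccddd (a1b1c3d3 is not shorter)
```

With the strict comparison, an input such as abcccddd comes back unchanged because its encoding has the same length; change the comparison to <= if the encoded form is wanted in that case.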
Upvotes: 0
Reputation: 163362
You could use a pattern with a backreference, ([a-z])\1*, to match each run of repeating characters, and assemble the final string with counts using the length of each match.
Then you can compare the length of the original string with that of the assembled string.
Example code
import re

strings = [
    "abcccddd",
    "aabcccccaaa",
    "abcd",
    "aabbccddeeffffffffffffff",
    "a"
]

def compression(s):
    res = ''.join([x.group(1) + str(len(x.group())) for x in re.finditer(r"([a-z])\1*", s, re.I)])
    return res if len(s) >= len(res) else s

for s in strings:
    print(compression(s))
Output
a1b1c3d3
a2b1c5a3
abcd
a2b2c2d2e2f14
a
Upvotes: -1
Reputation: 4980
If you are open to using the itertools module, try groupby:
from itertools import groupby

s = 'bbbbaacddd'
# Each group is a run of identical characters; its length is the count.
groups = [(label, len(list(group)))
          for label, group in groupby(s)]
compressed = "".join("{}{}".format(label, count) for label, count in groups)
print(compressed)  # b4a2c1d3
Another way to achieve this is to use more_itertools.run_length (a third-party package):
>>> from more_itertools import run_length
>>> compressed = list(run_length.encode(s))
>>> compressed
[('b', 4), ('a', 2), ('c', 1), ('d', 3)]
>>> ''.join("{}{}".format(label, count) for label, count in compressed)
'b4a2c1d3'
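To also satisfy the prompt's rule about returning the original string, the groupby approach can be wrapped in a function. This is only a sketch; compress_with_groupby is a hypothetical name, and it assumes the strict "must be smaller" reading of the prompt.

```python
from itertools import groupby


def compress_with_groupby(s):
    """Encode runs with groupby; keep s if the encoding is not shorter."""
    encoded = "".join(
        f"{label}{sum(1 for _ in group)}"  # count the run without building a list
        for label, group in groupby(s)
    )
    return encoded if len(encoded) < len(s) else s


print(compress_with_groupby("bbbbaacddd"))  # b4a2c1d3
print(compress_with_groupby("abcd"))        # abcd
```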
Upvotes: 3
Reputation: 1473
You can make this easier by using a dictionary that holds the count for the current character and deleting its entry once that run has been written to the output. Note that this version omits the count when a character appears only once.
string = "aabccccaaaa"
output = ""
lastchar = string[0]
counts = {lastchar: 1}
for i in range(1, len(string)):
    s = string[i]
    if s == lastchar:
        counts[s] += 1
    else:
        # Flush the finished run; a count of 1 is left implicit.
        output += f"{lastchar}{counts[lastchar]}" if counts[lastchar] > 1 else lastchar
        del counts[lastchar]
        counts[s] = 1
        lastchar = s
# The parentheses keep `output` in the result when the last run has length 1.
print(output + (f"{lastchar}{counts[lastchar]}" if counts[lastchar] > 1 else lastchar))
Upvotes: 1