Reputation: 9
Is there any way to compress the text used in this code?
Heya, is there any way to compress the text used in this code? I would appreciate the help.
file = open("Test.txt", "r")
Sentence = (file.read())
s = Sentence.split(" ")
ListSentence = []
uniquewords = []
print(Sentence)
for x in s:
    if x in uniquewords:
        ListSentence.append(uniquewords.index(x))
    else:
        uniquewords.append(x)
        ListSentence.append(uniquewords.index(x))
print(ListSentence)
recreated = ""
for position in ListSentence:
    recreated = recreated + uniquewords[position] + " "
print(uniquewords)
print (recreated)
Upvotes: 0
Views: 782
Reputation: 1416
The question is somewhat vague. If you mean data compression, you can use the binary transforms in the codecs module, for example zlib:
In [1]: import codecs
In [2]: example = 'abcdefg'*100
In [3]: compressed = codecs.encode(example.encode(), 'zlib')
In [4]: compressed
Out[4]: b'x\x9cKLJNIMKO\x1c\xa5F\xa9\xa1F\x01\x00m\x8e\x11\x80'
In [5]: decompressed = codecs.decode(compressed, 'zlib')
In [6]: decompressed
Out[6]: b'abcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefg'
Check out the docs for codecs; the built-in codecs offered for binary transforms are listed near the bottom.
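The same round trip can also be written against the zlib module directly. This is only a minimal sketch: the sample text is made up for illustration (it is not the asker's Test.txt), and it assumes the text is encoded as UTF-8.

```python
import zlib

# Illustrative sample text (an assumption, not the contents of Test.txt).
text = "the quick brown fox jumps over the lazy dog " * 50

raw = text.encode("utf-8")           # zlib works on bytes, not str
compressed = zlib.compress(raw, 9)   # 9 = highest compression level
restored = zlib.decompress(compressed).decode("utf-8")

print(len(raw), "bytes before,", len(compressed), "bytes after")
```

Repetitive text like this shrinks a lot; very short or non-repetitive input may not shrink at all.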
If, however, by compression you mean reducing the number of lines of code, then, although the intent of your code is not entirely clear, I would guess that you want to filter out repeated words, possibly while preserving word order.
Without order:
' '.join(set(sentence.split()))
With order:
seen = set()
words = sentence.split()
new = []
for word in words:
    if word not in seen:
        seen.add(word)
        new.append(word)
unique_ordered = ' '.join(new)
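The ordered version can probably be shortened further with dict.fromkeys, since dictionaries keep insertion order in Python 3.7 and later. A small sketch; the function name unique_in_order and the sample sentence are made up for illustration:

```python
def unique_in_order(sentence):
    # dict.fromkeys keeps the first occurrence of each word, in order
    # (relies on insertion-ordered dicts, Python 3.7+).
    return " ".join(dict.fromkeys(sentence.split()))


print(unique_in_order("the cat and the dog and the bird"))
# prints: the cat and dog bird
```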
Upvotes: 1
Reputation: 1744
It seems like you're asking if you could reduce the lines of code you have. Here is my attempt:
with open("Test.txt", "r") as file:
    Sentence = file.read().split(" ")  # Sentence is now a list of words
ListSentence, uniquewords = [], []
print(Sentence)
for x in Sentence:
    if x not in uniquewords:
        uniquewords.append(x)
    ListSentence.append(uniquewords.index(x))  # you do this every loop anyway
print(ListSentence)
recreated = ""
for position in ListSentence:
    recreated += uniquewords[position] + " "
print(uniquewords)
print(recreated)
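If the file is large, the repeated uniquewords.index(x) call may become slow, because it scans the list on every word. One possible variation keeps a dictionary from word to position alongside the list. This is a sketch only; the names encode, decode and index_of are made up here, and it takes the text as a string rather than reading Test.txt:

```python
def encode(text):
    """Return (unique_words, positions) for space-separated text."""
    index_of = {}        # word -> its position in unique_words
    unique_words = []
    positions = []
    for word in text.split(" "):
        if word not in index_of:
            index_of[word] = len(unique_words)
            unique_words.append(word)
        positions.append(index_of[word])
    return unique_words, positions


def decode(unique_words, positions):
    """Rebuild the original text from the word list and positions."""
    return " ".join(unique_words[p] for p in positions)


words, positions = encode("a b a c b")
print(words)      # ['a', 'b', 'c']
print(positions)  # [0, 1, 0, 2, 1]
print(decode(words, positions))  # a b a c b
```

Unlike the loop above, decode uses " ".join, so the rebuilt text has no trailing space.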
Upvotes: 0