Bebu

Reputation: 9

How to compress a text file

Heya, is there any way to compress the text used in this code? I would appreciate the help.

with open("Test.txt", "r") as file:  # the file is closed automatically
    Sentence = file.read()

s = Sentence.split(" ")

ListSentence = []  # index of each word within uniquewords
uniquewords = []   # each distinct word, in first-seen order
print(Sentence)
for x in s:
    if x in uniquewords:
        ListSentence.append(uniquewords.index(x))
    else:
        uniquewords.append(x)
        ListSentence.append(uniquewords.index(x))
print(ListSentence)

# Rebuild the text from the distinct words and the index list
recreated = ""
for position in ListSentence:
    recreated = recreated + uniquewords[position] + " "
print(uniquewords)
print(recreated)

Upvotes: 0

Views: 782

Answers (2)

Verbal_Kint

Reputation: 1416

The question is somewhat vague. If you mean data compression, you can use the binary transforms in the codecs module:

In [1]: import codecs

In [2]: example = 'abcdefg'*100

In [3]: compressed = codecs.encode(example.encode(), 'zlib')

In [4]: compressed
Out[4]: b'x\x9cKLJNIMKO\x1c\xa5F\xa9\xa1F\x01\x00m\x8e\x11\x80'

In [5]: decompressed = codecs.decode(compressed, 'zlib')

In [6]: decompressed
Out[6]: b'abcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefgabcdefg'

See the documentation for the codecs module: the "Binary Transforms" section near the bottom lists the built-in binary transforms, such as zlib_codec, bz2_codec and base64_codec.
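Applied to a file, the same zlib transform can be sketched as a round trip through a compressed file on disk. This is a minimal sketch, assuming the text is UTF-8; the sample text and the output name Test.txt.zlib are hypothetical, and a temporary directory is used so the example does not touch real files.

```python
import codecs
import os
import tempfile


def compress_text(text):
    """UTF-8 encode the text, then apply the zlib binary transform."""
    return codecs.encode(text.encode("utf-8"), "zlib")


def decompress_text(data):
    """Undo the zlib transform, then decode the UTF-8 bytes back to str."""
    return codecs.decode(data, "zlib").decode("utf-8")


# Hypothetical sample text standing in for the contents of Test.txt
original = "the cat sat on the mat " * 50

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Test.txt.zlib")  # hypothetical output name
    # Compressed data is bytes, so the file must be opened in binary mode
    with open(path, "wb") as f:
        f.write(compress_text(original))
    compressed_size = os.path.getsize(path)
    with open(path, "rb") as f:
        restored = decompress_text(f.read())

print(len(original.encode("utf-8")), "bytes ->", compressed_size, "bytes")
print(restored == original)  # True
```

The standard-library gzip module is an alternative: gzip.open(path, "wt") writes text straight into a standard .gz file that other tools can open.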

If, however, by "compress" you mean reducing the number of lines of code, then (although the intent of your code is not entirely clear) I would guess you want to filter out repeated words, possibly while keeping their original order.

Without order:

' '.join(set(sentence.split()))

With order:

seen = set()              # words already kept
words = sentence.split()
new = []                  # unique words in first-seen order
for word in words:
    if word not in seen:  # keep only the first occurrence
        seen.add(word)
        new.append(word)
unique_ordered = ' '.join(new)
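On Python 3.7 and later, where dictionaries are guaranteed to preserve insertion order, the same ordered de-duplication can be sketched in one line with dict.fromkeys. The sample sentence below is made up for illustration.

```python
# Hypothetical sample input; in the question the text comes from Test.txt
sentence = "the cat sat on the mat the end"

# dict.fromkeys keeps only the first occurrence of each word, and since
# Python 3.7 dicts preserve insertion order, so the word order is retained.
unique_ordered = " ".join(dict.fromkeys(sentence.split()))
print(unique_ordered)  # the cat sat on mat end
```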

Upvotes: 1

AetherUnbound

Reputation: 1744

It seems like you're asking whether you can reduce the number of lines in your code. Here is my attempt:

with open("Test.txt", "r") as file:
    Sentence = file.read().split(" ")
ListSentence, uniquewords = [], []
print(Sentence)
for x in Sentence:
    if x not in uniquewords:
        uniquewords.append(x)
    ListSentence.append(uniquewords.index(x))  # you do this every loop anyway
print(ListSentence)

recreated = ""
for position in ListSentence:
    recreated += uniquewords[position] + " "
print(uniquewords)
print(recreated)
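Each uniquewords.index(x) call scans the list from the start, so the loop above gets slower as the number of distinct words grows. A minimal sketch, assuming the same space-separated input, keeps a dict from each word to its position for constant-time lookups; the function names encode_words and decode_words are hypothetical. Joining with " ".join also avoids the trailing space that the += loop leaves behind.

```python
def encode_words(text):
    """Map each space-separated word to the index of its first occurrence."""
    index_of = {}      # word -> position in unique_words
    unique_words = []  # distinct words in first-seen order (uniquewords)
    indices = []       # one index per input word (ListSentence)
    for word in text.split(" "):
        if word not in index_of:
            index_of[word] = len(unique_words)
            unique_words.append(word)
        indices.append(index_of[word])
    return unique_words, indices


def decode_words(unique_words, indices):
    """Rebuild the text by looking up each index and joining with single spaces."""
    return " ".join(unique_words[i] for i in indices)


text = "to be or not to be"
unique_words, indices = encode_words(text)
print(unique_words)  # ['to', 'be', 'or', 'not']
print(indices)       # [0, 1, 2, 3, 0, 1]
print(decode_words(unique_words, indices) == text)  # True
```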

Upvotes: 0
