PythonSherpa

Reputation: 2600

Python 3.6 - Read encoded text from file and convert to string

Hopefully someone can help me out with the following. It is probably not too complicated but I haven't been able to figure it out. My "output.txt" file is created with:

f = open('output.txt', 'w')
print(tweet['text'].encode('utf-8'), file=f)
print(tweet['created_at'][0:19].encode('utf-8'), file=f)
print(tweet['user']['name'].encode('utf-8'), file=f)
f.close()

If I don't encode the text before writing it to the file, I get errors. So "output.txt" contains 3 rows of utf-8 encoded output:

b'testtesttest'
b'line2test'
b'\xca\x83\xc9\x94n ke\xc9\xaan'

In "main.py", I am trying to convert this back to a string:

f = open("output.txt", "r", encoding="utf-8")
text = f.read()
print(text)
f.close()

Unfortunately, the b'' wrapper is still there in the printed text. Do I still need to decode it? If possible, I would like to keep the 3-row structure. My apologies for the newbie question, this is my first one on SO :)

Thank you so much in advance!

Upvotes: 4

Views: 11301

Answers (3)

PythonSherpa

Reputation: 2600

With the help of the people answering my question, I have been able to get it to work. The solution is to change how the file is written: open it with an explicit encoding and write the strings directly instead of encoding them first:

    import json

    tweet = json.loads(data)  # data holds the raw JSON for one tweet
    tweet_text = tweet['text']  # content of the tweet
    tweet_created_at = tweet['created_at'][0:19]  # tweet created at
    tweet_user = tweet['user']['name']  # tweet created by
    with open('output.txt', 'w', encoding='utf-8') as f:
        f.write(tweet_text + '\n')
        f.write(tweet_created_at + '\n')
        f.write(tweet_user + '\n')

Then read it back like this:

    f = open("output.txt", "r", encoding='utf-8')
    text = f.read()
    print(text)
    f.close()
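
For reference, here is a minimal round-trip sketch of the approach above. It assumes a hand-written `tweet` dict in place of the real `json.loads(data)` result (the sample values are made up for illustration) and writes to a temporary directory rather than the working directory. Reading with `splitlines()` is one way to keep the 3-row structure:

```python
import os
import tempfile

# Hypothetical stand-in for the parsed tweet; the values are illustrative only.
tweet = {
    "text": "testtesttest",
    "created_at": "Mon Jan 01 00:00:00 +0000 2018",
    "user": {"name": "ʃɔn keɪn"},
}

path = os.path.join(tempfile.mkdtemp(), "output.txt")

# Write plain str objects; the encoding argument handles the UTF-8 conversion.
with open(path, "w", encoding="utf-8") as f:
    f.write(tweet["text"] + "\n")
    f.write(tweet["created_at"][0:19] + "\n")
    f.write(tweet["user"]["name"] + "\n")

# Read the file back as text and split it into one str per row.
with open(path, "r", encoding="utf-8") as f:
    rows = f.read().splitlines()
```

After this runs, `rows` is a list of three ordinary strings with no b'' wrapper, because no bytes object was ever printed into the file.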

Upvotes: 3

Kruupös

Reputation: 5474

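If the b and the quotes ' are literally in your file, the problem is in how the file was written. Someone probably wrote the printed representation of a bytes object (for example by printing line.encode('utf-8') to the file) instead of writing the text itself. To decode what is there now, you can use ast.literal_eval. Otherwise @m_callens's answer should be ok.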

import ast

with open("b.txt", "r") as f:
    text = [ast.literal_eval(line) for line in f]

for line in text:
    print(line.decode('utf-8'))

# testtesttest
# line2test
# ʃɔn keɪn
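
To see where the b'' comes from, here is a small self-contained sketch that reproduces the problem and then recovers the text with `ast.literal_eval`. The temporary path and the sample string are assumptions for illustration; only the `print(..., file=f)` of an encoded value mirrors what the question describes.

```python
import ast
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "b.txt")

# Reproduce the problem: printing a bytes object writes its repr,
# so the b, the quotes and the \x escapes all end up in the file as text.
with open(path, "w") as f:
    print("ʃɔn keɪn".encode("utf-8"), file=f)

with open(path, "r") as f:
    raw_line = f.readline().rstrip("\n")

# literal_eval parses that repr back into a real bytes object,
# which can then be decoded as UTF-8.
recovered = ast.literal_eval(raw_line).decode("utf-8")
```

Here `raw_line` is the literal text b'\xca\x83\xc9\x94n ke\xc9\xaan' and `recovered` is the original string. `literal_eval` only accepts Python literals, so it is a safer choice than `eval` for this kind of repair.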

Upvotes: 0

m_callens

Reputation: 6360

Instead of specifying the encoding when opening the file, use it to decode as you read.

f = open("output.txt", "rb")
text = f.read().decode(encoding="utf-8")
print(text)
f.close()
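
As a rough sketch of when this applies: reading in binary mode and decoding works if the file holds the UTF-8 bytes themselves. If the file holds the printed b'...' text, as in the question, decoding returns those same characters unchanged. The example below assumes a temporary file and a made-up sample string, and compares the binary read with a text-mode read.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "output.txt")

# Write the encoded bytes themselves (binary mode), not their printed repr.
with open(path, "wb") as f:
    f.write("ʃɔn keɪn\n".encode("utf-8"))

# Binary read followed by an explicit decode...
with open(path, "rb") as f:
    decoded = f.read().decode("utf-8")

# ...gives the same text as letting open() decode on the fly.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()
```

For this file both `decoded` and `text` hold the same string, so the choice between the two styles is mostly a matter of preference.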

Upvotes: 1
