Reputation: 1057
I'm coding a twitter Event Detector and at the end of the detection I want to save the results on a .txt file. As the tweets are written in Spanish, some words of the result may contain accent marks, and I'm having some problems trying to save them as they should.
def save_event(self, event, event_counter, output_file):
    saving_data = {}
    saving_data['_id'] = event_counter
    saving_data['main_words'] = event[2].split(', ')
    saving_data['related_words'] = [None] * len(event[3])
    related_words_loop = 0
    for related_word, weight in event[3]:  # Position 3 of the event array holds the list of related words
        word_json = {}
        word_json['word'] = related_word
        formatted_weight = float("{0:.2f}".format(weight))  # Format the weight to two decimal places
        word_json['weight'] = formatted_weight
        saving_data['related_words'][related_words_loop] = word_json
        related_words_loop += 1
    saving_json = json.dumps(saving_data)
    with open(output_file, 'a', encoding='utf-8') as f:
        f.write(saving_json)
        f.write('\n')

def save_events(self, output_file):
    try:
        os.remove(output_file)
    except OSError:
        pass
    event_counter = 0
    for event in self.events:
        event_counter += 1
        self.save_event(event, event_counter, output_file)
I'm specifying the encoding I want for the file with open(output_file, 'a', encoding='utf-8') as f:
and, from reading other related questions, this should work. But when I check the created file, some words are saved like \u00e9ranse
when they should appear like éranse
.
Any idea what I'm missing?
Upvotes: 1
Views: 1467
Reputation: 7221
The problem is how you generate your JSON string:
saving_json = json.dumps(saving_data)
By default, json.dumps uses ensure_ascii=True
, which escapes all non-ASCII characters. Setting it to False will keep your non-ASCII characters as-is. See https://docs.python.org/3/library/json.html#json.dump
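A minimal sketch of the difference (using a sample word standing in for your data):

```python
import json

data = {'word': 'éranse', 'weight': 0.25}

# Default behaviour: non-ASCII characters are escaped to \uXXXX sequences
print(json.dumps(data))
# {"word": "\u00e9ranse", "weight": 0.25}

# With ensure_ascii=False the accented characters are written as-is
print(json.dumps(data, ensure_ascii=False))
# {"word": "éranse", "weight": 0.25}
```

So in your save_event method, changing the call to json.dumps(saving_data, ensure_ascii=False) should produce the output you expect, since the file is already opened with encoding='utf-8'.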
Upvotes: 5