Reputation: 111
I am new to Python. I read data from SQL Server and then write it to a CSV file. Each table row contains numbers, strings, and datetime values. I tried different ways to write the data. For example,
#method 1
import pandas as pd
df = pd.DataFrame(table, columns=["colummn"])
df.to_csv('list.csv', index=False)
#method 2
import csv
fl = open('OnlineplayDatabase.csv', 'w')
writer = csv.writer(fl)
for row in table:
    writer.writerow(row)
fl.close()
Both methods normally work. But when some rows contain Chinese characters (see the example below), I get an encoding error. The error message says:
codecs.charmap_encode(input,self.errors,encoding_table)[0]
#Error Code
UnicodeEncodeError: 'charmap' codec can't encode character '\u5347' in position 68: character maps to <undefined>
I tried to encode the fields in the row using UTF-8, but some of the fields are numbers.
Your help is highly appreciated!
('120.239.9.116 ',
'gyandroid ',
4,
9,
'Dalvik/1.6.0(Linux;U;Android4.4.2;升级版Build/KVT49L)',
datetime.datetime(2016, 6, 11, 20, 54, 19),
datetime.datetime(2016, 6, 11, 20, 56, 53),
11521.0)
Upvotes: 1
Views: 2866
Reputation: 477
Try this for method #2:
#method 2
import csv
fl = open('OnlineplayDatabase.csv', 'w', encoding='utf8') #set the encoding to utf8
writer = csv.writer(fl)
for row in table:
    writer.writerow(row)
fl.close()
Also take a look at this - http://www.pgbovine.net/unicode-python-errors.htm
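A slightly fuller sketch of the same idea, assuming Python 3. The 'charmap' error comes from the platform's default codec (cp1252 on many Windows setups, for example), so passing an explicit encoding to open() avoids it without encoding individual fields. The table contents and the output path below are stand-ins made up for illustration; newline='' is what the csv module documentation recommends when opening the file.

```python
import csv
import datetime
import os
import tempfile

# Stand-in for the rows fetched from SQL Server (hypothetical sample data).
table = [
    ('gyandroid ', 4, 9, '升级版Build/KVT49L',
     datetime.datetime(2016, 6, 11, 20, 54, 19), 11521.0),
]

# Hypothetical output path inside a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), 'OnlineplayDatabase.csv')

# encoding='utf-8' can represent every Unicode character;
# newline='' leaves line-ending handling to the csv module.
with open(out_path, 'w', encoding='utf-8', newline='') as fl:
    writer = csv.writer(fl)
    writer.writerows(table)  # numbers and datetimes are converted with str()

# Read the file back with the same encoding to confirm the text survived.
with open(out_path, encoding='utf-8', newline='') as fl:
    rows_read = list(csv.reader(fl))

print(rows_read[0][3])
```

If the file is meant to be opened in Excel, the 'utf-8-sig' codec (which writes a byte-order mark) is sometimes used instead; treat that as an option to test rather than a requirement.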
Upvotes: 3
Reputation: 7967
Look at the error again. This is happening because somewhere in your dataframe there are strings containing characters that show up as \u escapes, such as '\u5347'. You need to get rid of those. See if this works: use the remove_u function below to get rid of the \u escapes.
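def remove_u(word):
    # Escape non-ASCII characters, e.g. '升' becomes the text '\u5347'
    word_u = (word.encode('unicode-escape')).decode("utf-8", "strict")
    if r'\u' in word_u:
        # print(True)
        # Keep only the text right after the first \u (up to the next \u, if any)
        return word_u.split('\\u')[1]
    return word
df.loc[:, 'colummn'] = df['colummn'].apply(func=remove_u)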
Once you have updated the dataframe, try writing it out again.
EDIT
I am assuming your column is composed of individual words. If your column holds multi-word strings instead, use this modified version of remove_u:
def remove_u(input_string):
    words = input_string.split()
    words_u = [(word.encode('unicode-escape')).decode("utf-8", "strict") for word in words]
    words_u = [word_u.split('\\u')[1] if r'\u' in word_u else word_u for word_u in words_u]
    # print(words_u)
    return ' '.join(words_u)
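A quick check of what the modified function returns, assuming Python 3 and a made-up sample string. Each word that contains a \u escape is replaced by whatever follows its first \u up to the next one, so the original text of that word is not kept. If the characters need to survive in the CSV, writing the file with a UTF-8 encoding is the alternative.

```python
def remove_u(input_string):
    # Same logic as the modified function above, repeated so this runs on its own.
    words = input_string.split()
    words_u = [word.encode('unicode-escape').decode('utf-8', 'strict') for word in words]
    words_u = [w.split('\\u')[1] if r'\u' in w else w for w in words_u]
    return ' '.join(words_u)

# Hypothetical input: one word contains Chinese characters, the others are ASCII.
sample = 'Android 升级版Build ok'
result = remove_u(sample)
print(result)  # Android 5347 ok
```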
Upvotes: 0