Reputation: 1935
I wrote a Python program that writes data to a .csv file, but every item in the file has a "b'" prefix before its content, and there are blank lines that I do not know how to remove. Some items are also unreadable, such as "b'\xe7\xbe\x85\xe5\xb0\x91\xe5\x90\x9b'". Some of the data is in Chinese and Japanese, so I think something goes wrong when this data is written to the .csv file. Please help me solve the problem. The program is:
#write data in .csv file
def data_save_csv(type,data,id_name,header,since = None):
    #get the date when storage data
    date_storage()
    #create the data storage directory
    csv_parent_directory = os.path.join("dataset","csv",type,glovar.date)
    directory_create(csv_parent_directory)
    #write data in .csv
    if type == "group_members":
        csv_file_prefix = "gm"
    if since:
        csv_file_name = csv_file_prefix + "_" + since.strftime("%Y%m%d-%H%M%S") + "_" + time_storage() + id_name + ".csv"
    else:
        csv_file_name = csv_file_prefix + "_" + time_storage() + "_" + id_name + ".csv"
    csv_file_directory = os.path.join(csv_parent_directory,csv_file_name)
    with open(csv_file_directory,'w') as csvfile:
        writer = csv.writer(csvfile,delimiter=',',quotechar='"',quoting=csv.QUOTE_MINIMAL)
        #csv header
        writer.writerow(header)
        row = []
        for i in range(len(data)):
            for k in data[i].keys():
                row.append(str(data[i][k]).encode("utf-8"))
            writer.writerow(row)
            row = []
Upvotes: 0
Views: 3517
Reputation: 77407
You have a couple of problems. The funky "b" prefix appears because csv converts each value to a string before writing it to a column. When you called str(data[i][k]).encode("utf-8"), you got a bytes object, and its string representation is b"..." filled with UTF-8 encoded data. You should handle encoding when you open the file instead. In Python 3, open uses the locale's preferred encoding (locale.getpreferredencoding(False)) by default, which is not necessarily UTF-8, so it's a good idea to be explicit about what you want to write.
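To see the "b" effect in isolation, here is a minimal sketch that writes to an in-memory buffer instead of a file. The sample name is taken from the bytes in the question; the variable names are just for illustration.

```python
import csv
import io

name = "羅少君"

# Encoding by hand produces a bytes object. csv.writer calls str() on it,
# so the repr (b'...') is what lands in the cell.
buf = io.StringIO()
csv.writer(buf).writerow([name.encode("utf-8")])
bad_cell = buf.getvalue().strip()

# Passing the str through unchanged keeps the text intact; the encoding
# is then applied once, by the file object, when the text is written.
buf = io.StringIO()
csv.writer(buf).writerow([name])
good_cell = buf.getvalue().strip()

print(bad_cell)
print(good_cell)
```

The first cell is the same escaped byte string shown in the question; the second is the readable name.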
Next, there's nothing that says that two dicts will enumerate their keys in the same order. The csv.DictWriter class is built to pull data from dictionaries, so use it instead. In my example I assumed that header has the names of the keys you want. It could be that header is different, and in that case, you'll also need to pass in the actual dict key names you want.
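Finally, you can just strip out empty dicts while you are writing the rows. Opening the file with newline='' also keeps the csv module's \r\n line endings from being translated a second time, which is the usual cause of blank lines between rows on Windows.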
with open(csv_file_directory,'w', newline='', encoding='utf-8') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=header, delimiter=',',
        quotechar='"',quoting=csv.QUOTE_MINIMAL)
    writer.writeheader()
    writer.writerows(d for d in data if d)
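If you want to try this outside your program, the following self-contained sketch may help. The write_rows helper, the sample data, and the temporary file name are assumptions made for the demo, not part of your code; it assumes header lists exactly the dict keys.

```python
import csv
import os
import tempfile

def write_rows(path, header, data):
    # newline='' leaves line endings to the csv module (no extra blank lines),
    # and encoding='utf-8' makes the output encoding explicit.
    with open(path, "w", newline="", encoding="utf-8") as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=header)
        writer.writeheader()
        # Skip empty dicts; DictWriter orders columns by fieldnames,
        # so the key order inside each dict does not matter.
        writer.writerows(d for d in data if d)

# Hypothetical sample data with an empty dict and differing key order.
header = ["id", "name"]
data = [{"id": 1, "name": "羅少君"}, {}, {"name": "たなか", "id": 2}]

path = os.path.join(tempfile.mkdtemp(), "gm_sample.csv")
write_rows(path, header, data)

# Read the file back to check that the text survived the round trip.
with open(path, newline="", encoding="utf-8") as csvfile:
    rows = list(csv.DictReader(csvfile))
print(rows)
```

Reading the file back with the same encoding should return the two non-empty rows with the Chinese and Japanese text unchanged.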
Upvotes: 1
Reputation: 158
It sounds like at least some of your issues have to do with incorrect Unicode handling.
Try working the snippet below into your existing code. As the comments say, the first part reads your input and decodes it from UTF-8 into Unicode text.
The second part normalizes each line and encodes it as ASCII, dropping any character that has no ASCII equivalent.
import codecs
import unicodedata
f = codecs.open('path/to/textfile.txt', mode='r',encoding='utf-8') #Take input and turn into unicode
for line in f.readlines():
    line = unicodedata.normalize('NFKD', line).encode('ascii', 'ignore') #Ensure output is in ASCII
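To check what the normalize-and-encode step does to a given string before applying it to real data, a small sketch like this can be run on its own. The to_ascii helper and the sample strings are made up for illustration.

```python
import unicodedata

def to_ascii(text):
    # NFKD splits accented letters into base letter + combining mark;
    # encoding with 'ignore' then discards everything outside ASCII.
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore")

latin = to_ascii("café")
cjk = to_ascii("羅少君")

print(latin)
print(cjk)
```

Accented Latin text keeps its base letters, while Chinese and Japanese characters have no ASCII form and are removed entirely, so this step only suits data where that loss is acceptable.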
Upvotes: 0