Eliot

Reputation: 37

Python issue writing and reading a CSV file from a list of dictionaries

I have an issue with a list of dictionaries in Python (latest version). Here is my data: [ {dict1}, {dict2}, ... ]. All the dicts look like this:

{'Date': '2016-10-17',
  'Message_body': '   Version française  BUSINESS EVENTS - SPRING 2016 April 5: YESS   EVENT ON SCALING UP Robin Bonsey, Hystra Consultant, will discuss business solutions to the predicament of small holder farmer',
  'Sender': '[email protected]',
  'Subject': 'Fwd: Inclusive business events - spring 2016'}


According to Python, the type of each value (type(dict1['Message_body'])) is "str". My problem is converting this list of dictionaries into a CSV file (with the keys 'Date', 'Message_body', 'Sender', 'Subject' as columns). Here is my code:

import csv

def export_dict_list_to_csv(data, filename):
    with open(filename, 'w', encoding='utf-8', newline='') as f:
        # Assuming that all dictionaries in the list have the same keys.
        headers = sorted(data[0])
        writer = csv.writer(f)
        writer.writerow(headers)
        # Write one row per dictionary, with values in header order.
        for d in data:
            writer.writerow([d[h] for h in headers])


export_dict_list_to_csv(final_list, 'chili.csv')

It works pretty well, but some characters look strange. For example, in the .csv I have "Chaque moi voudrait être le tyran de tous les autres » dit Pascal dans les Pensées" instead of "Chaque moi voudrait être le tyran de tous les autres à dit Pascal dans les Pensées". In the str the characters are correct, but in the .csv they are not (I don't know why). This isn't really important as long as reading the CSV file back restores the original characters of the str.

But I can't manage to read the created CSV properly. I tried:

with open('chili.csv', 'r') as csvfile:
     spamreader = csv.reader(csvfile, delimiter=',')
     for row in spamreader:
         print (row)

and I get the error "UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1087: ordinal not in range(128)"

I also tried:

with open('/Users/Marco/HandB/Gmail/chili.csv', 'rb') as csvfile:
     spamreader = csv.reader(csvfile, delimiter=',')
     for row in spamreader:
         print (row)

Error: iterator should return strings, not bytes (did you open the file in text mode?)

So I have two questions: 1) Is the way I write the CSV file a good one? Why do I get strange characters in the CSV?

2) How can I read back the CSV I created? I searched the internet for several hours but didn't find anything that helped with this issue. In particular, I don't know much about encoding problems; I just know that the values inside the dicts are of type str, and I think they are UTF-8. Here is the code (I clean the data received from the Gmail API):

mssg_parts = payld['parts'] # fetching the message parts
part_one  = mssg_parts[0] # fetching first element of the part 
part_body = part_one['body'] # fetching body of the message
part_data = part_body['data'] # fetching data from the body
clean_one = part_data.replace("-","+") # converting URL-safe Base64 to standard Base64
clean_one = clean_one.replace("_","/") # converting URL-safe Base64 to standard Base64
clean_two = base64.b64decode (bytes(clean_one, 'UTF-8')) # decoding from Base64 to bytes
soup = BeautifulSoup(clean_two , "lxml" )
soup = BeautifulSoup(clean_two, "html")
soup.get_text()                      
mssg_body = soup.body()              
# mssg_body is a readable form of message body
# depending on the end user's requirements, it can be further cleaned 
# using regex, beautiful soup, or any other method
temp_dict['Message_body'] = mssg_body

I included the code that produces the "Message_body" value because it may help you understand the format of the message and its conversion to a CSV file.

Thanks a lot in advance ! :)

Upvotes: 0

Views: 83

Answers (1)

cs95

Reputation: 402872

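It seems you're on Python 3. There, csv.reader expects strings, so open the file in text mode ('r'), not binary mode ('rb'). Furthermore, since your data contains non-ASCII characters, pass the encoding explicitly when calling open for reading. Without it, Python falls back to the locale's default encoding, which appears to be ASCII on your system given the UnicodeDecodeError. Pass the same encoding you used for writing with encoding=...: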

with open('/Users/Marco/HandB/Gmail/chili.csv', 'r', encoding='utf-8') as csvfile:
    reader = csv.reader(csvfile)
    ...
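As a rough illustration of why the encoding matters, here is a minimal sketch. The sample string is made up, and the idea that your CSV viewer decodes the file as cp1252 is only a guess on my part; the point is that the same UTF-8 bytes give the right text, an error, or garbled text depending on the codec used to decode them.

```python
# Hypothetical sample text containing non-ASCII characters.
text = "être » Pensées"
raw = text.encode("utf-8")  # the bytes that end up in the file

# Decoding with the codec used for writing restores the text.
assert raw.decode("utf-8") == text

# Decoding as ASCII fails on the first non-ASCII byte (0xc3 here).
try:
    raw.decode("ascii")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# Decoding with a different single-byte codec "works" but garbles the text.
# cp1252 is an assumption about what a spreadsheet program might use.
garbled = raw.decode("cp1252")
print(error)
print(garbled)
```

If a spreadsheet program is what shows the strange characters, the file itself may well be fine, and writing with encoding='utf-8-sig' is sometimes enough for the program to detect UTF-8, though that depends on the program.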

If you want to read your CSV back as dictionaries, you should probably take a look at csv.DictReader. The docs have some handy examples to get you started.
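A minimal sketch of a full round trip with csv.DictWriter and csv.DictReader could look like the following. The helper names write_rows and read_rows, the sample record, and the temporary file path are all made up for illustration; only the four column names come from your question.

```python
import csv
import os
import tempfile

# Column names taken from the question.
FIELDS = ["Date", "Message_body", "Sender", "Subject"]

def write_rows(rows, filename):
    """Write a list of dicts to a UTF-8 CSV file with a header row."""
    with open(filename, "w", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)

def read_rows(filename):
    """Read the CSV back into a list of plain dicts, using the same encoding."""
    with open(filename, "r", encoding="utf-8", newline="") as f:
        return [dict(row) for row in csv.DictReader(f)]

# Hypothetical sample record with non-ASCII characters.
sample = [{
    "Date": "2016-10-17",
    "Message_body": "Chaque moi voudrait être le tyran de tous les autres » dit Pascal",
    "Sender": "sender@example.com",
    "Subject": "Fwd: Inclusive business events - spring 2016",
}]

path = os.path.join(tempfile.mkdtemp(), "chili.csv")
write_rows(sample, path)
restored = read_rows(path)
print(restored == sample)
```

Because the same encoding is used on both sides, the strings read back should compare equal to the ones written. Note that every value comes back as str, which matches your data but would not preserve numbers or other types.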

Upvotes: 1
