Reputation: 71

From Nested Dictionary to CSV File

I have a nested dictionary (with more than 70,000 entries):

users_item = {
    "sessionId1": {
        "12345645647": 1.0, 
        "9798654": 5.0 

    },         
    "sessionId2":{
        "3445657657": 1.0

    },
    "sessionId3": {
        "87967976": 5.0, 
        "35325626436": 1.0, 
        "126789435": 1.0, 
        "72139856": 5.0      
    },
    "sessionId4": {
        "4582317": 1.0         
    }
......
}

I want to create a CSV file from my nested dictionary. The result should look like this:

sessionId1 item rating
sessionId1 item rating
sessionId2 item rating
sessionId3 item rating
sessionId3 item rating
sessionId3 item rating
sessionId3 item rating
.......

I found this post: Convert Nested Dictionary to CSV Table

It's similar to my question, but none of the answers work for me: the pandas library runs out of memory.

How can I create a CSV file from my data?

Upvotes: 5

Answers (4)

martineau

Reputation: 123541

If you iteratively write the file, there should be no memory issues:

import csv

users_item = {
    "sessionId1": {
        "12345645647": 1.0,
        "9798654": 5.0

    },
    "sessionId2":{
        "3445657657": 1.0

    },
    "sessionId3": {
        "87967976": 5.0,
        "35325626436": 1.0,
        "126789435": 1.0,
        "72139856": 5.0
    },
    "sessionId4": {
        "4582317": 1.0
    }
}

# newline='' (Python 3) stops the csv module from writing blank lines on Windows
with open('nested_dict.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter='\t')
    for sessionId in sorted(users_item):
        ratings = users_item[sessionId]
        for item in ratings:
            writer.writerow([sessionId, item, ratings[item]])

Resulting contents of the output file (where » represents a tab character):

sessionId1»  12345645647»  1.0
sessionId1»  9798654»      5.0
sessionId2»  3445657657»   1.0
sessionId3»  126789435»    1.0
sessionId3»  87967976»     5.0
sessionId3»  35325626436»  1.0
sessionId3»  72139856»     5.0
sessionId4»  4582317»      1.0

Upvotes: 0

mowcow

Reputation: 81

Just loop through the dictionary and use the Python csv writer to write to the csv file.

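import csv

# users_item is the nested dictionary from the question
with open('output.csv', 'w', newline='') as csv_file:
    csvwriter = csv.writer(csv_file, delimiter='\t')
    for session in users_item:
        for item in users_item[session]:
            # one row per (session, item, rating) triple
            csvwriter.writerow([session, item, users_item[session][item]])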

Upvotes: 1

user5547025

Reputation:

for session, ratings in users_item.items():
    # each inner dict maps an item ID to its rating
    for item, rating in ratings.items():
        print("{} {}".format(session, rating))

Output:

sessionId3 5.0
sessionId3 1.0
sessionId3 5.0
sessionId3 1.0
sessionId1 5.0
sessionId1 1.0
sessionId4 1.0
sessionId2 1.0

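Note that a dict (users_item) has no guaranteed order before Python 3.7; from 3.7 on it preserves insertion order. So on older versions, unless you specify the order of rows some other way (for example with sorted()), the output will be in the order the dict uses internally.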

Edit: This approach has no problems with a file containing 70k entries.

Edit: If you want to write to a CSV file, use the csv module or just pipe the output to a file.
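
A minimal sketch of the csv module route, assuming Python 3.7+ and a dict shaped like users_item in the question. The helper names iter_rows and write_rows and the header labels are hypothetical, chosen only for illustration. Rows are generated lazily, so nothing beyond the original dict is held in memory:

```python
import csv
import io


def iter_rows(nested):
    """Yield one (session, item, rating) tuple per inner entry, lazily."""
    for session, ratings in nested.items():
        for item, rating in ratings.items():
            yield session, item, rating


def write_rows(nested, fileobj, delimiter=","):
    """Write a header plus one row per rating to an open text file object."""
    writer = csv.writer(fileobj, delimiter=delimiter, lineterminator="\n")
    writer.writerow(["session", "item", "rating"])  # hypothetical header labels
    writer.writerows(iter_rows(nested))


# Small sample in the same shape as users_item from the question
sample = {
    "sessionId1": {"12345645647": 1.0, "9798654": 5.0},
    "sessionId2": {"3445657657": 1.0},
}

# An in-memory buffer stands in for a real file here
buffer = io.StringIO()
write_rows(sample, buffer)
csv_text = buffer.getvalue()
```

To write to disk instead, pass a file opened with open('output.csv', 'w', newline='') in place of the buffer.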

Upvotes: 1

Autonomy

Reputation: 21

Assuming you want each session as a row, the number of columns in every row will be the total number of unique keys across all session dicts. Based on the data you've given, I'm guessing the number of unique keys is astronomical.

That is why you're running into memory issues with the solution given in this discussion. It's simply too much data to hold in memory at one time.

If my assumptions are correct, your only option is to divide and conquer. Break the data into smaller chunks and write each chunk to a file in CSV format. Then merge the CSV files at the end.
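
A rough sketch of that chunk-and-merge idea, assuming Python 3 and the long format (one row per session, item, rating) the question asks for. The function names, the part_NNNNN.csv naming, and the chunk size are hypothetical. Since the dict is already in memory, writing row by row to a single file may work just as well; this only illustrates the splitting and merging steps:

```python
import csv
import itertools
import os
import shutil
import tempfile


def chunked(iterable, size):
    """Yield successive lists of at most `size` elements."""
    it = iter(iterable)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk


def write_in_chunks(nested, out_path, chunk_size=10000):
    """Write each chunk of sessions to a part file, then merge the parts."""
    out_dir = os.path.dirname(os.path.abspath(out_path))
    part_paths = []
    for index, sessions in enumerate(chunked(sorted(nested), chunk_size)):
        part_path = os.path.join(out_dir, "part_{:05d}.csv".format(index))
        with open(part_path, "w", newline="") as part:
            writer = csv.writer(part)
            for session in sessions:
                for item, rating in nested[session].items():
                    writer.writerow([session, item, rating])
        part_paths.append(part_path)

    # Merge: append the part files byte for byte, in order, then delete them
    with open(out_path, "wb") as merged:
        for part_path in part_paths:
            with open(part_path, "rb") as part:
                shutil.copyfileobj(part, merged)
            os.remove(part_path)
    return len(part_paths)


# Usage with a small sample and a deliberately tiny chunk size
sample = {
    "sessionId1": {"12345645647": 1.0, "9798654": 5.0},
    "sessionId2": {"3445657657": 1.0},
    "sessionId3": {"87967976": 5.0},
}
work_dir = tempfile.mkdtemp()
merged_path = os.path.join(work_dir, "merged.csv")
parts_written = write_in_chunks(sample, merged_path, chunk_size=2)

with open(merged_path, newline="") as merged_file:
    rows = list(csv.reader(merged_file))
```

Chunking by session keeps all ratings for one session in the same part file, so the merged file stays grouped by session.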

Upvotes: 0
