Reputation: 71
I have a nested dictionary (with more than 70,000 entries):
users_item = {
"sessionId1": {
"12345645647": 1.0,
"9798654": 5.0
},
"sessionId2":{
"3445657657": 1.0
},
"sessionId3": {
"87967976": 5.0,
"35325626436": 1.0,
"126789435": 1.0,
"72139856": 5.0
},
"sessionId4": {
"4582317": 1.0
}
......
}
I want to create a CSV file from my nested dictionary. The result should look like this:
sessionId1 item rating
sessionId1 item rating
sessionId2 item rating
sessionId3 item rating
sessionId3 item rating
sessionId3 item rating
sessionId3 item rating
.......
I found this post: Convert Nested Dictionary to CSV Table
It's similar to my question, but none of the answers work for me: the pandas library runs out of memory.
How can I create a CSV file from my data?
Upvotes: 5
Views: 10754
Reputation: 123541
If you iteratively write the file, there should be no memory issues:
import csv
users_item = {
"sessionId1": {
"12345645647": 1.0,
"9798654": 5.0
},
"sessionId2":{
"3445657657": 1.0
},
"sessionId3": {
"87967976": 5.0,
"35325626436": 1.0,
"126789435": 1.0,
"72139856": 5.0
},
"sessionId4": {
"4582317": 1.0
}
}
with open('nested_dict.csv', 'w') as output:
    writer = csv.writer(output, delimiter='\t')
    for sessionId in sorted(users_item):
        ratings = users_item[sessionId]
        for item in ratings:
            writer.writerow([sessionId, item, ratings[item]])
Resulting contents of the output file (where » represents a tab character):
sessionId1» 12345645647» 1.0
sessionId1» 9798654» 5.0
sessionId2» 3445657657» 1.0
sessionId3» 126789435» 1.0
sessionId3» 87967976» 5.0
sessionId3» 35325626436» 1.0
sessionId3» 72139856» 5.0
sessionId4» 4582317» 1.0
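The same streaming idea can be wrapped in a small reusable function. This is only a sketch, assuming Python 3; the function name write_ratings and the header names are made up for illustration and are not part of the question. Passing newline="" to open() lets the csv module control line endings, which avoids blank rows on Windows.

```python
import csv


def write_ratings(users_item, path, delimiter="\t"):
    """Write one (session, item, rating) row per inner entry; return the row count."""
    rows_written = 0
    # newline="" hands line-ending control to the csv module (Python 3).
    with open(path, "w", newline="") as output:
        writer = csv.writer(output, delimiter=delimiter)
        # Hypothetical header names, chosen for this example only.
        writer.writerow(["session", "item", "rating"])
        for session_id in sorted(users_item):
            # Only one row is built at a time, so memory use stays flat.
            for item, rating in users_item[session_id].items():
                writer.writerow([session_id, item, rating])
                rows_written += 1
    return rows_written
```

Calling write_ratings(users_item, 'nested_dict.csv') should produce the same rows as above, preceded by a header line.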
Upvotes: 0
Reputation: 81
Just loop through the dictionary and use the Python csv writer to write to the csv file.
import csv

with open('output.csv', 'w') as csv_file:
    csvwriter = csv.writer(csv_file, delimiter='\t')
    for session in users_item:
        for item in users_item[session]:
            csvwriter.writerow([session, item, users_item[session][item]])
Upvotes: 1
Reputation:
for session, ratings in users_item.items():
    for rating, value in ratings.items():
        print("{} {}".format(session, value))
Output:
sessionId3 5.0
sessionId3 1.0
sessionId3 5.0
sessionId3 1.0
sessionId1 5.0
sessionId1 1.0
sessionId4 1.0
sessionId2 1.0
Note that a dict (users_item) has no guaranteed order before Python 3.7 (later versions preserve insertion order). So unless you specify the order of the rows some other way, the output will be in whatever order the dict uses internally.
Edit: This approach has no problems with a file containing 70k entries.
Edit: If you want to write to a CSV file, use the csv module or just pipe the output to a file.
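As a rough illustration of the "pipe the output" option, print() can write straight to a file object in Python 3. The function name dump_ratings is hypothetical. Note that plain print() does no quoting, so this is only safe if the keys never contain the separator; the csv module handles that case.

```python
import sys


def dump_ratings(users_item, out=sys.stdout):
    """Print one tab-separated line per (session, item, rating) to `out`."""
    for session, ratings in users_item.items():
        for item, value in ratings.items():
            # file=out redirects the line; no quoting or escaping is done here.
            print(session, item, value, sep="\t", file=out)
```

With the default argument this prints to the terminal, so running the script as python script.py > ratings.tsv would redirect it to a file; passing an open file object does the same from inside Python.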
Upvotes: 1
Reputation: 21
Assuming you want each session as a row, the number of columns in every row will be the total number of unique keys across all session dicts. Based on the data you've given, I'm guessing the number of unique keys is astronomical.
That is why you're running into memory issues with the solution given in that discussion. It's simply too much data to hold in memory at one time.
If my assumptions are correct, your only option is to divide and conquer: break the data into smaller chunks, write each chunk to a file in CSV format, and then merge the CSV files at the end.
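A minimal sketch of the chunk-and-merge idea, assuming Python 3 and the long format (one row per session/item pair) rather than one column per unique key. The function name write_in_chunks, the ".partN" file naming, and the default chunk size are all invented for this example.

```python
import csv
import itertools
import shutil


def write_in_chunks(users_item, final_path, chunk_size=10000):
    """Write sessions in chunks to part files, then concatenate them into final_path."""
    part_paths = []
    sessions = iter(users_item.items())
    for index in itertools.count():
        # Take the next chunk_size sessions; stop once the iterator is exhausted.
        chunk = list(itertools.islice(sessions, chunk_size))
        if not chunk:
            break
        part_path = "{}.part{}".format(final_path, index)  # hypothetical naming scheme
        with open(part_path, "w", newline="") as part:
            writer = csv.writer(part)
            for session, ratings in chunk:
                for item, value in ratings.items():
                    writer.writerow([session, item, value])
        part_paths.append(part_path)
    # Merge step: append each part file to the final file in order.
    with open(final_path, "w", newline="") as merged:
        for part_path in part_paths:
            with open(part_path, newline="") as part:
                shutil.copyfileobj(part, merged)
    return part_paths
```

The part files are left on disk so they can be inspected or deleted afterwards. In the long format a single streaming pass already keeps memory use low, so chunking mainly pays off when each chunk needs heavier processing before it is written.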
Upvotes: 0