lak

Reputation: 534

Group CSV data into an array of dictionaries - Python

I have a CSV file like this (columns: userId, movieId, score), ordered by userId:

user1,movie1,0.1
user1,movie2,0.2
user2,movie2,0.4
user2,movie1,0.2

I want to group the rows into an array of dictionaries like this:

[
   {
      "userId":"user1",
      "scores":[
         {
            "movieId":"movie1",
            "score":0.1
         },
         {
            "movieId":"movie2",
            "score":0.2
         }
      ]
   },
   {
      "userId":"user2",
      "scores":[
         {
            "movieId":"movie2",
            "score":0.4
         },
         {
            "movieId":"movie1",
            "score":0.2
         }
      ]
   }
]

This is my attempt in Python, but it doesn't work:

def get_body(batch):
    
    result = []
    record = {}
    scores = []
   
    for row in batch:
        if 'userId' in record and record['userId'] != row[0]:
            result.append({'userId': record['userId'], 'scores': scores})
            record = {}
            scores = []
        
        if 'userId' not in record:
            record['userId'] = row[0]

        scores.append({'movieId': row[1], 'score': float(row[2])})
        
    return result

Also, I'm not using pandas, so I'm looking for a solution without it. I'd appreciate your help.
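The attempt above appends a user's group only when the next, different userId arrives, so when the loop ends the scores collected for the last user are never added to `result`; with the sample data it returns only the `user1` entry. A minimal fix that keeps the original structure, assuming `batch` is an iterable of already-split rows (for example a `csv.reader`), is to flush the pending group once more after the loop. The `io.StringIO` sample below is only a stand-in for the real file:

```python
import csv
import io


def get_body(batch):
    result = []
    record = {}
    scores = []

    for row in batch:
        # A different userId closes the group collected so far.
        if 'userId' in record and record['userId'] != row[0]:
            result.append({'userId': record['userId'], 'scores': scores})
            record = {}
            scores = []

        if 'userId' not in record:
            record['userId'] = row[0]

        scores.append({'movieId': row[1], 'score': float(row[2])})

    # The missing step: after the loop, flush the last user's group too.
    if 'userId' in record:
        result.append({'userId': record['userId'], 'scores': scores})

    return result


# Usage with the sample rows from the question (io.StringIO stands in for the file).
sample = "user1,movie1,0.1\nuser1,movie2,0.2\nuser2,movie2,0.4\nuser2,movie1,0.2\n"
result = get_body(csv.reader(io.StringIO(sample)))
print([group['userId'] for group in result])  # ['user1', 'user2']
```

Like the original, this relies on the rows being sorted by userId; a user whose rows are scattered through the file would produce several separate groups.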

Upvotes: 2

Views: 43

Answers (1)

Andrej Kesely

Reputation: 195428

Using only the built-in csv module:

import csv
import json

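out = {}
# The csv docs recommend opening files passed to csv.reader with newline=""
with open("your_file.csv", "r", newline="") as f_in:
    reader = csv.reader(f_in)
    for row in reader:
        # row is [userId, movieId, score]; setdefault creates the user's list on first sight
        out.setdefault(row[0], []).append(
            {"movieId": row[1], "score": float(row[2])}
        )

# Dicts keep insertion order (Python 3.7+), so users appear in the order first seen
out = [{"userId": k, "scores": v} for k, v in out.items()]
# pretty print:
print(json.dumps(out, indent=4))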

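Because the question states the file is already ordered by userId, `itertools.groupby` from the standard library can also do the grouping in one pass without an intermediate dict. This is a sketch of that alternative, not the answerer's method; `group_scores` and the inline `SAMPLE` string are illustrative stand-ins for the real file:

```python
import csv
import io
import json
from itertools import groupby
from operator import itemgetter

# Illustrative stand-in for "your_file.csv".
SAMPLE = """user1,movie1,0.1
user1,movie2,0.2
user2,movie2,0.4
user2,movie1,0.2
"""


def group_scores(rows):
    """Group (userId, movieId, score) rows that are already sorted by userId."""
    # groupby merges only *consecutive* rows sharing the same first column.
    return [
        {
            "userId": user_id,
            "scores": [
                {"movieId": movie_id, "score": float(score)}
                for _, movie_id, score in group
            ],
        }
        for user_id, group in groupby(rows, key=itemgetter(0))
    ]


# With a real file, use open("your_file.csv", newline="") instead of io.StringIO.
out = group_scores(csv.reader(io.StringIO(SAMPLE)))
print(json.dumps(out, indent=4))
```

The trade-off: `groupby` depends on the sort order, so unsorted input would yield duplicate userId entries, whereas the `setdefault` version above groups correctly regardless of row order.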
Prints:

[
    {
        "userId": "user1",
        "scores": [
            {
                "movieId": "movie1",
                "score": 0.1
            },
            {
                "movieId": "movie2",
                "score": 0.2
            }
        ]
    },
    {
        "userId": "user2",
        "scores": [
            {
                "movieId": "movie2",
                "score": 0.4
            },
            {
                "movieId": "movie1",
                "score": 0.2
            }
        ]
    }
]

Upvotes: 2
