Reputation: 534
I have a CSV file with the columns (userId, movieId, score), ordered by userId:
user1,movie1,0.1
user1,movie2,0.2
user2,movie2,0.4
user2,movie1,0.2
I want to group them in an array of dictionaries like this:
[
    {
        "userId": "user1",
        "scores": [
            {
                "movieId": "movie1",
                "score": 0.1
            },
            {
                "movieId": "movie2",
                "score": 0.2
            }
        ]
    },
    {
        "userId": "user2",
        "scores": [
            {
                "movieId": "movie2",
                "score": 0.4
            },
            {
                "movieId": "movie1",
                "score": 0.2
            }
        ]
    }
]
This is my attempt in Python, but it doesn't work:
def get_body(batch):
    result = []
    record = {}
    scores = []
    for row in batch:
        if 'userId' in record and record['userId'] != row[0]:
            result.append({'userId': record['userId'], 'scores': scores})
            record = {}
            scores = []
        if 'userId' not in record:
            record['userId'] = row[0]
        scores.append({'movieId': row[1], 'score': float(row[2])})
    return result
Also, I'd prefer not to use pandas. I'd appreciate your help.
Upvotes: 2
Views: 43
Reputation: 195428
Using only the built-in csv module:
import csv
import json

out = {}
with open("your_file.csv", "r") as f_in:
    reader = csv.reader(f_in)
    for row in reader:
        out.setdefault(row[0], []).append(
            {"movieId": row[1], "score": float(row[2])}
        )

out = [{"userId": k, "scores": v} for k, v in out.items()]

# pretty print:
print(json.dumps(out, indent=4))
Prints:
[
    {
        "userId": "user1",
        "scores": [
            {
                "movieId": "movie1",
                "score": 0.1
            },
            {
                "movieId": "movie2",
                "score": 0.2
            }
        ]
    },
    {
        "userId": "user2",
        "scores": [
            {
                "movieId": "movie2",
                "score": 0.4
            },
            {
                "movieId": "movie1",
                "score": 0.2
            }
        ]
    }
]
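As a side note, the loop in the question appears to drop the last user, because a group is only appended to result when the next userId shows up, and nothing flushes the final group after the loop ends. Since the file is already ordered by userId, a rough alternative is itertools.groupby, which closes each group for you. The function name group_scores and the in-memory sample are just placeholders for illustration; with a real file you would pass csv.reader(f_in) instead.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def group_scores(rows):
    """Group (userId, movieId, score) rows that are already sorted by userId."""
    result = []
    # groupby only merges *consecutive* rows with the same key,
    # so this relies on the input being ordered by userId.
    for user_id, user_rows in groupby(rows, key=itemgetter(0)):
        scores = [{"movieId": r[1], "score": float(r[2])} for r in user_rows]
        result.append({"userId": user_id, "scores": scores})
    return result


# Hypothetical sample standing in for the CSV file from the question.
sample = io.StringIO(
    "user1,movie1,0.1\n"
    "user1,movie2,0.2\n"
    "user2,movie2,0.4\n"
    "user2,movie1,0.2\n"
)
grouped = group_scores(csv.reader(sample))
```

If the file were not sorted by userId, the same user would show up in several separate groups, so the dictionary-based version above is the safer choice in that case.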
Upvotes: 2