Reputation: 2158

Add character and remove the last comma in a JSON file

I am trying to create a JSON file from a CSV file. The code below writes the data, but not quite in the form I want. I have some experience in Python. From my understanding, the JSON file should be written like this: [{},{},...,{}].

How do I:

Remove the last ','? I am able to insert the ',' after each row.
Insert '[' at the very beginning and ']' at the very end? I tried adding it to outfile.write('['...etc), but it shows up in too many places.
Leave the header row out of the JSON file?

Names.csv:

id,team_name,team_members
123,Biology,"Ali Smith, Jon Doe"
234,Math,Jane Smith 
345,Statistics ,"Matt P, Albert Shaw"
456,Chemistry,"Andrew M, Matt Shaw, Ali Smith"
678,Physics,"Joe Doe, Jane Smith, Ali Smith "

Code:

import csv
import json
import os

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         json.dump(row,outfile)
         outfile.write("," + "\n" )

Output so far:

{"id": "id", "team_name": "team_name", "team_members": ["team_members\n"]},
{"id": "123", "team_name": "Biology", "team_members": ["\"Ali Smith", " Jon Doe\"\n"]},
{"id": "234", "team_name": "Math", "team_members": ["Jane Smith \n"]},
{"id": "345", "team_name": "Statistics ", "team_members": ["\"Matt P", " Albert Shaw\"\n"]},
{"id": "456", "team_name": "Chemistry", "team_members": ["\"Andrew M", " Matt Shaw", " Ali Smith\"\n"]},
{"id": "678", "team_name": "Physics", "team_members": ["\"Joe Doe", " Jane Smith", " Ali Smith \""]},

Upvotes: 0

Answers (4)

martineau

Reputation: 123473

Seems like it would be a lot easier to use the csv.DictReader class instead of reinventing the wheel:

import csv
import json

data = []
with open('names.csv', 'r', newline='') as infile:
    for row in csv.DictReader(infile):
        data.append(row)

with open('names1.json','w') as outfile:
    json.dump(data, outfile, indent=4)

Contents of the names1.json file following execution (I used indent=4 just to make it more human-readable):

[
    {
        "id": "123",
        "team_name": "Biology",
        "team_members": "Ali Smith, Jon Doe"
    },
    {
        "id": "234",
        "team_name": "Math",
        "team_members": "Jane Smith"
    },
    {
        "id": "345",
        "team_name": "Statistics ",
        "team_members": "Matt P, Albert Shaw"
    },
    {
        "id": "456",
        "team_name": "Chemistry",
        "team_members": "Andrew M, Matt Shaw, Ali Smith"
    },
    {
        "id": "678",
        "team_name": "Physics",
        "team_members": "Joe Doe, Jane Smith, Ali Smith"
    }
]
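
If team_members should end up as a JSON list rather than one comma-separated string, each row can be adjusted before dumping. Here is a minimal sketch, assuming the column names from the question's header and that no field is missing; the helper name rows_with_member_lists is made up for illustration, and the CSV text is inlined through io.StringIO only so the snippet runs on its own:

```python
import csv
import io
import json

def rows_with_member_lists(infile):
    """Yield one dict per CSV row, with team_members split into a list."""
    for row in csv.DictReader(infile):
        # Strip stray whitespace around every value (assumes no missing fields).
        row = {key: value.strip() for key, value in row.items()}
        # Split the "a, b, c" field into individual, trimmed names.
        row["team_members"] = [
            name.strip() for name in row["team_members"].split(",")
        ]
        yield row

sample = io.StringIO(
    'id,team_name,team_members\n'
    '123,Biology,"Ali Smith, Jon Doe"\n'
    '234,Math,Jane Smith \n'
)
data = list(rows_with_member_lists(sample))
text = json.dumps(data, indent=4)
```

With a real file, pass the object returned by open('names.csv', newline='') instead of the StringIO sample.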

Upvotes: 0

abarnert

Reputation: 365767

First, how do you skip the header? That's easy:

next(infile) # skip the first line
for line in infile:

However, you may want to consider using a csv.DictReader for input. It handles reading the header line, and using the information there to create a dict for each row, and splitting the rows for you (as well as handling cases you may not have thought of, like quoted or escaped text that can be present in CSV files):

for row in csv.DictReader(infile):
    json.dump(row, outfile)
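
To see why the quoting matters, here is a small sketch comparing a plain split with the csv module on one line from the question's data (the variable names are just for illustration):

```python
import csv
import io

line = '123,Biology,"Ali Smith, Jon Doe"\n'

# Naive splitting cuts the quoted field in two and keeps the quote
# characters and the trailing newline.
naive = line.split(',')

# csv.reader understands the quoting and returns three clean fields.
parsed = next(csv.reader(io.StringIO(line)))
```

Here naive has four items, including '"Ali Smith' and ' Jon Doe"\n', while parsed is ['123', 'Biology', 'Ali Smith, Jon Doe'].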

Now onto the harder problem.

A better solution would probably be to use an iterative JSON library that can dump an iterator as a JSON array. Then you could do something like this:

def rows(infile):
    for line in infile:
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         yield row

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # genjson is a placeholder name for such an iterative library
    genjson.dump(rows(infile), outfile)

The stdlib json.JSONEncoder has an example in the docs that does exactly this—although not very efficiently, because it first consumes the entire iterator to build a list, then dumps that:

class GenJSONEncoder(json.JSONEncoder):
    def default(self, o):
       try:
           iterable = iter(o)
       except TypeError:
           pass
       else:
           return list(iterable)
       # Let the base class default method raise the TypeError
       return json.JSONEncoder.default(self, o)

j = GenJSONEncoder()
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    outfile.write(j.encode(rows(infile)))

And really, if you're willing to build a whole list rather than encode line by line, it may be simpler to just do the listifying explicitly:

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    json.dump(list(rows(infile)), outfile)

You can go further by also overriding the iterencode method, but this will be a lot less trivial, and you'd probably want to look for an efficient, well-tested streaming iterative JSON library on PyPI instead of building it yourself from the json module.

But, meanwhile, here's a direct solution to your question, changing as little as possible from your existing code:

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # print the opening [
    outfile.write('[\n')
    # keep track of the index, just to distinguish line 0 from the rest
    for i, line in enumerate(infile):
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         # add the ,\n _before_ each row except the first
         if i:
             outfile.write(',\n')
         json.dump(row,outfile)
    # write the final ]
    outfile.write('\n]')

This trick—treating the first element special rather than the last—simplifies a lot of problems of this type.
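
The same idea can be pulled out into a reusable helper that streams any iterable as a JSON array without building a list first. This is only a sketch; write_json_array is a hypothetical name, not part of the json module, and an in-memory buffer stands in for the output file:

```python
import io
import json

def write_json_array(items, outfile):
    """Stream an iterable of JSON-serializable items as one JSON array."""
    outfile.write('[\n')
    for i, item in enumerate(items):
        # The separator goes before every item except the first.
        if i:
            outfile.write(',\n')
        json.dump(item, outfile)
    outfile.write('\n]')

buffer = io.StringIO()
write_json_array(({"n": n} for n in range(3)), buffer)
result = buffer.getvalue()

empty = io.StringIO()
write_json_array([], empty)
```

An empty iterable still produces a valid (empty) array, because nothing is ever written that would need to be taken back.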

Another way to simplify things is to actually iterate over adjacent pairs of lines, using a minor variation on the pairwise example in the itertools docs:

import itertools

def pairwise(iterable):
    a, b = itertools.tee(iterable)
    next(b, None)
    return itertools.zip_longest(a, b, fillvalue=None)

with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    # print the opening [
    outfile.write('[\n')
    # iterate pairs of lines
    for line, nextline in pairwise(infile):
         row = dict()
         # print(row)
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         json.dump(row,outfile)
         # add the , if there is a next line
         if nextline is not None:
             outfile.write(',')
         outfile.write('\n')
    # write the final ]
    outfile.write(']')

This is just as efficient as the previous version, and conceptually simpler—but a lot more abstract.
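
If the abstraction is hard to picture, here is what this pairwise variant yields for a three-item list (the sample strings are placeholders for lines of the file):

```python
import itertools

def pairwise(iterable):
    a, b = itertools.tee(iterable)
    next(b, None)  # advance the second copy by one item
    return itertools.zip_longest(a, b, fillvalue=None)

pairs = list(pairwise(['row1', 'row2', 'row3']))
# pairs == [('row1', 'row2'), ('row2', 'row3'), ('row3', None)]
```

Only the last item is paired with None, which is what lets the loop skip the trailing comma. That relies on None never being a real item, which holds for lines read from a file.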

Upvotes: 3

BallpointBen

Reputation: 13770

Pandas can handle this with ease:

import json
import pandas as pd

df = pd.read_csv('names.csv', dtype=str)
df['team_members'] = (df['team_members']
                      .map(lambda s: s.split(','))
                      .map(lambda l: [x.strip() for x in l]))
records = df.to_dict('records')
with open('names1.json', 'w') as outfile:
    json.dump(records, outfile)

Upvotes: 0

sjw

Reputation: 6543

With a minimal edit to your code, you can create a list of dictionaries in Python and dump it to file as JSON all at once (assuming your dataset is small enough to fit in memory):

import csv
import json
import os

rows = []  # Create list
with open('names.csv', 'r') as infile, open('names1.json','w') as outfile:
    for line in infile:
         row = dict()
         id, team_name, *team_members = line.split(',')
         row["id"] = id;
         row["team_name"] = team_name;
         row["team_members"] = team_members
         rows.append(row)  # Append row to list

    json.dump(rows[1:], outfile)  # Write entire list to file (except first row)

As an aside, you should avoid using id as a variable name in Python, because it shadows the built-in id() function.
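
A quick sketch of what goes wrong, kept inside a function so the shadowing stays local; team_id is just a suggested replacement name:

```python
def shadowing_demo():
    id = "123"  # rebinds the name locally, hiding the built-in
    try:
        id(object())  # the string "123" is not callable
    except TypeError as exc:
        return str(exc)

message = shadowing_demo()

team_id = "123"         # a distinct name leaves the built-in alone
identity = id(team_id)  # the built-in id() still works here
```

Nothing breaks until some later code in the same scope tries to call id(), which is what makes this kind of bug easy to miss.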

Upvotes: 0
