Gupta
Gupta

Reputation: 314

Transform JSON to excel table

I have data in csv - 2 columns, 1st column contains member id and second contains characteristics in Key-Value pairs (nested one under another).

enter image description here

I have seen online codes which convert a simple Key-value pairs but not able to transform data like what i have shown above

I want to transform this data into a excel table as below enter image description here

Upvotes: 1

Views: 2987

Answers (1)

Kushan Gunasekera
Kushan Gunasekera

Reputation: 8546

I did it with this XlsxWriter package, so first you have to install it by running pip install XlsxWriter command.

import csv  # to read csv file
import xlsxwriter  # to write xlxs file
import ast

# you can change this names according to your local ones
csv_file = 'data.csv'
xlsx_file = 'data.xlsx'

# read the csv file and get all the JSON values into data list
data = []
with open(csv_file, 'r') as csvFile:
    # read line by line in csv file
    reader = csv.reader(csvFile)

    # convert every line into list and select the JSON values
    for row in list(reader)[1:]:
        # csv are comma separated, so combine all the necessary
        # part of the json with comma
        json_to_str = ','.join(row[1:])

        # convert it to python dictionary
        str_to_dict = ast.literal_eval(json_to_str)

        # append those completed JSON into the data list
        data.append(str_to_dict)

# define the excel file
workbook = xlsxwriter.Workbook(xlsx_file)

# create a sheet for our work
worksheet = workbook.add_worksheet()

# cell format for merge fields with bold and align center
# letters and design border
merge_format = workbook.add_format({
    'bold': 1,
    'border': 1,
    'align': 'center',
    'valign': 'vcenter'})

# other cell format to design the border
cell_format = workbook.add_format({
    'border': 1,
})

# create the header section dynamically
first_col = 0
last_col = 0
for index, value in enumerate(data[0].items()):
    if isinstance(value[1], dict):
        # this if mean the JSON key has something else
        # other than the single value like dict or list
        last_col += len(value[1].keys())
        worksheet.merge_range(first_row=0,
                              first_col=first_col,
                              last_row=0,
                              last_col=last_col,
                              data=value[0],
                              cell_format=merge_format)
        for k, v in value[1].items():
            # this is for go in deep the value if exist
            worksheet.write(1, first_col, k, merge_format)
            first_col += 1
        first_col = last_col + 1
    else:
        # 'age' has only one value, so this else section
        # is for create normal headers like 'age'
        worksheet.write(1, first_col, value[0], merge_format)
        first_col += 1

# now we know how many columns exist in the
# excel, and set the width to 20
worksheet.set_column(first_col=0, last_col=last_col, width=20)

# filling values to excel file
for index, value in enumerate(data):
    last_col = 0
    for k, v in value.items():
        if isinstance(v, dict):
            # this is for handle values with dictionary
            for k1, v1 in v.items():
                if isinstance(v1, list):
                    # this will capture last 'type' list (['Grass', 'Hardball'])
                    # in the 'conditions'
                    worksheet.write(index + 2, last_col, ', '.join(v1), cell_format)
                else:
                    # just filling other values other than list
                    worksheet.write(index + 2, last_col, v1, cell_format)
                last_col += 1
        else:
            # this is handle single value other than dict or list
            worksheet.write(index + 2, last_col, v, cell_format)
            last_col += 1

# finally close to create the excel file
workbook.close()

I commented out most of the line to get better understand and reduce the complexity because you are very new to Python. If you didn't get any point let me know, I'll explain as much as I can. Additionally I used enumerate() python Built-in Function. Check this small example which I directly get it from original documentation. This enumerate() is useful when numbering items in the list.

Return an enumerate object. iterable must be a sequence, an iterator, or some other object which supports iteration. The __next__() method of the iterator returned by enumerate() returns a tuple containing a count (from start which defaults to 0) and the values obtained from iterating over iterable.

>>> seasons = ['Spring', 'Summer', 'Fall', 'Winter']
>>> list(enumerate(seasons))
[(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')]
>>> list(enumerate(seasons, start=1))
[(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')]

Here is my csv file,

enter image description here

and here is the final output of the excel file. I just merged the duplicate header values (matchruns and conditions).

enter image description here

Upvotes: 3

Related Questions