user8998457
user8998457

Reputation: 61

Structure a list of lists as columns and rows for csv/Excel

I have this python3 code with a list of lists that I need to structure in a certain way for Excel. Here is what I have:

>>> pprint(avrop_categories[0:3])

[['Name1', ['MainCategory1', 'SubCategory1', ('Item1', 1.0)]],

['Name2',
['MainCategory2', 'Subcategory2', ('Item2', 1.0)],
['MainCategory2', 'Subcategory3', ('Item3', 1.0)],
['MainCategory2', 'Subcategory4', ('Item4', 1.0)],
['MainCategory3', 'Subcategory5', ('Item5', 1.0)],
['MainCategory4', 'Subcategory6', ('Item6', 1.0)],
['MainCategory5', 'Subcategory7', ('Item7', 1.0)],
['MainCategory5', 'Subcategory8', ('Item8', 1.0)],
['MainCategory5', 'Subcategory9', ('Item9', 1.0)]],

['Name3',
['MainCategory3', 'Subcategory10', ('Item10', 1.0)], #<-note MainCategory3 again.
['MainCategory6', 'Subcategory11', ('Item11', 1.0)]]]

I need to get to this. I.e. Names into rows, MainCategories into columns and Items counted into their MainCategories. Subcategories can be discarded.

enter image description here

How on earth do I do it? I've used Python3 for a short time and I know how to save the above as a csv file ready for Excel, but I don't know how to structure it in the way I want. If I use lists to put all MainCategories into one list, and then item counts into another list, how do I maintain the relationship between the MainCategories and the item counts? Am I supposed to do something else? Maybe use dict? Any help would be appreciated as I have no idea what to do.

Upvotes: 1

Views: 117

Answers (1)

jpp
jpp

Reputation: 164773

Here is one solution via collections.defaultdict and pandas library, given a list lst:

import pandas as pd
from collections import defaultdict

d = defaultdict(lambda: defaultdict(int))

for i in lst:
    for j in i[1:]:
        d[i[0]][j[0]] += 1

df = pd.DataFrame(index=sorted(d.keys()),
                  columns=sorted(set.union(*map(set, d.values()))))

for k, v in d.items():
    for w in v:
        df.loc[k, w] = d[k][w]

df = df.fillna(0).rename_axis('Name').reset_index()

You can then easily send this to Excel:

df.to_excel('file.xlsx', index=False)

Result:

    Name  MainCategory1  MainCategory2  MainCategory3  MainCategory4  \
0  Name1              1              0              0              0   
1  Name2              0              3              1              1   
2  Name3              0              0              1              0   

   MainCategory5  MainCategory6  
0              0              0  
1              3              0  
2              0              1  

Upvotes: 1

Related Questions