RedRaven

Reputation: 735

Create a cross tab (maybe/sort of, not sure what it's called) with Python

What I am trying to achieve is a csv file that looks like this:

,h.r9,h.r8,h.r5,h.r4,h.r7,h.r6,h.r1,h.r3,h.r2,h.r10
chris,0,1,0,0,0,0,0,1,0,0
bob,0,0,1,1,1,1,0,0,0,1
dick,1,0,1,0,0,0,0,0,1,1
ab,1,2,0,0,0,0,0,0,0,0
emma,0,0,1,0,1,0,1,0,1,0

Somewhat unrelated: does anyone know what a matrix like this is actually called? That would at least make googling this issue easier.

The data is originally in a dictionary like this:

dict = {}
dict['ab'] = ["h.r1","h.r2"]
dict['bob'] = ["h.r3","h.r4","h.r5","h.r6","h.r10"]
dict['chris'] = ["h.r2","h.r8"]
dict['dick'] = ["h.r1","h.r10","h.r9","h.r3"]
dict['emma'] = ["h.r3","h.r5","h.r7","h.r9"]

So, if a person has the variable in the dictionary, there should be a 1 in the CSV; otherwise there should be a 0.

The code below builds the CSV file, but with all zeros. How would I implement the logic of ones and zeros? I can do this in PHP/HTML, but being very new to Python I'm not catching on... Also, is there a library that may already have this function built in? (numpy perhaps?)

import csv

dict = {}
dict['ab'] = ["h.r1","h.r2"]
dict['bob'] = ["h.r3","h.r4","h.r5","h.r6","h.r10"]
dict['chris'] = ["h.r2","h.r8"]
dict['dick'] = ["h.r1","h.r10","h.r9","h.r3"]
dict['emma'] = ["h.r3","h.r5","h.r7","h.r9"]

mergedlist = []
for k,v in dict.items():
        mergedlist = mergedlist + v
mergedlist = list(set(mergedlist))
listlen = len(mergedlist)
mergedlist[0:0] = " "

thekeys = []
for k in dict.keys():
        thekeys.append(k)

thecsv = csv.writer(open("your.csv", 'w', newline=''))
thecsv.writerow(mergedlist)

for col in thekeys:
        thecsv.writerow([col] + ['0']*listlen) #does the logic go in here somehow??

Upvotes: 1

Views: 321

Answers (3)

CT Zhu

Reputation: 54380

First, you don't want to name your data dict, as that shadows the built-in dict type. Change it to something like dict1:

#first generate the data body:
>>> ar=[[dict1[jtem].count(item) for item in ['h.r%s' %i for i in range(1, 11)]] for jtem in dict1] #are your values only in h.r1 to h.r10?
>>> ar
[[0, 1, 0, 0, 0, 0, 0, 1, 0, 0], \
[1, 0, 1, 0, 0, 0, 0, 0, 1, 1], \
[0, 0, 1, 1, 1, 1, 0, 0, 0, 1], \
[1, 1, 0, 0, 0, 0, 0, 0, 0, 0], \
[0, 0, 1, 0, 1, 0, 1, 0, 1, 0]]
#Then insert the names:
>>> junk=[item.insert(0, jtem) for item, jtem in zip(ar, dict1.keys())]
>>> ar
[['chris', 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], \
['dick', 1, 0, 1, 0, 0, 0, 0, 0, 1, 1], \
['bob', 0, 0, 1, 1, 1, 1, 0, 0, 0, 1], \
['ab', 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], \
['emma', 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]]
#Then insert the title:
>>> ar.insert(0, ['',]+['h.r%s' %i for i in range(1, 11)])
>>> ar
[['', 'h.r1', 'h.r2', 'h.r3', 'h.r4', 'h.r5', 'h.r6', 'h.r7', 'h.r8', 'h.r9', 'h.r10'], \
['chris', 0, 1, 0, 0, 0, 0, 0, 1, 0, 0], \
['dick', 1, 0, 1, 0, 0, 0, 0, 0, 1, 1], \
['bob', 0, 0, 1, 1, 1, 1, 0, 0, 0, 1], \
['ab', 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], \
['emma', 0, 0, 1, 0, 1, 0, 1, 0, 1, 0]]

Then write it to a CSV file using your favorite method.
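
For the writing step, here is a minimal sketch using the standard csv module. The helper name rows_to_csv is made up for illustration, and ar is cut down to three columns and two people to keep it short. It writes to an in-memory buffer so the result can be printed; for a real file, pass open("your.csv", "w", newline="") instead.

```python
import csv
import io

# Trimmed stand-in for the `ar` built above: header row, then one row per person.
ar = [
    ['', 'h.r1', 'h.r2', 'h.r3'],
    ['chris', 0, 1, 0],
    ['ab', 1, 1, 0],
]

def rows_to_csv(rows, fileobj):
    # Hypothetical helper: writes every row of `rows` to an open text file object.
    csv.writer(fileobj).writerows(rows)

buf = io.StringIO()  # stand-in for open("your.csv", "w", newline="")
rows_to_csv(ar, buf)
print(buf.getvalue())
```

When writing to a real file, opening it with newline="" leaves line endings to the csv module, which avoids blank lines between rows on Windows.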

Upvotes: 1

Daniel Roseman

Reputation: 599836

I have no idea what most of that code is doing. It can be done in just a couple of lines:

import csv

# Collect every distinct value that appears for any person.
all_values = set()
for person_values in data.values():
    all_values = all_values.union(person_values)
all_values_list = list(all_values)

thecsv = csv.writer(open("your.csv", 'w', newline=''))
# Leading empty cell keeps the header aligned with the name column.
thecsv.writerow([''] + all_values_list)
for name, values in data.items():
    row = [1 if i in values else 0 for i in all_values_list]
    thecsv.writerow([name] + row)

Note I've renamed your dictionary to data, as it's not a good idea to overwrite built-in function/class names.
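
If the column order matters, the same idea can be wrapped in a function that sorts the columns before building the rows. This is only a sketch: crosstab_rows is a made-up name, and the sort key assumes every value has the form h.r followed by a number, as in the question.

```python
def crosstab_rows(data):
    """Return a header row plus one 0/1 row per key of `data`."""
    # Assumes every value looks like 'h.r<number>'; sort by that number
    # so 'h.r10' lands after 'h.r9' instead of after 'h.r1'.
    columns = sorted(set().union(*data.values()),
                     key=lambda c: int(c.split('.r')[1]))
    rows = [[''] + columns]
    for name, values in data.items():
        present = set(values)  # set membership is cheaper than list lookup
        rows.append([name] + [1 if c in present else 0 for c in columns])
    return rows

data = {
    'ab': ["h.r1", "h.r2"],
    'bob': ["h.r3", "h.r4", "h.r5", "h.r6", "h.r10"],
    'chris': ["h.r2", "h.r8"],
    'dick': ["h.r1", "h.r10", "h.r9", "h.r3"],
    'emma': ["h.r3", "h.r5", "h.r7", "h.r9"],
}

for row in crosstab_rows(data):
    print(row)
```

The returned list can be handed straight to csv.writer(...).writerows(...).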

Upvotes: 3

Ofir Israel

Reputation: 3913

I would do it completely differently, but using your code I'd change the ['0']*listlen to:
['1' if x in dict[col] else '0' for x in mergedlist[1:]]

Here mergedlist[1:] is the list of column names without the leading blank header cell.
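
In context, the membership test has to run over the column names, once per person. A trimmed sketch of the question's script with that change follows. It makes a few assumptions: the dictionary is renamed to people and cut down to two entries, the blank header cell is added at write time so mergedlist holds only column names, and an in-memory buffer stands in for the output file.

```python
import csv
import io

# Trimmed sample of the question's data, renamed so it does not shadow `dict`.
people = {
    'ab': ["h.r1", "h.r2"],
    'chris': ["h.r2", "h.r8"],
}

# Every distinct value across all people, in a stable order.
mergedlist = sorted(set(v for values in people.values() for v in values))

out = io.StringIO()  # stand-in for open("your.csv", 'w', newline='')
thecsv = csv.writer(out)
thecsv.writerow([''] + mergedlist)  # blank cell sits above the name column

for col in people:
    # '1' where this person has the column's value, '0' otherwise
    thecsv.writerow([col] + ['1' if x in people[col] else '0' for x in mergedlist])

print(out.getvalue())
```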

Upvotes: 1
