Reputation: 10709
What is the best solution to pivot/cross-tab tables in Python 3? Is there a built-in function that will do this? Ideally, I'm looking for a Python 3 solution that does not have external dependencies. For example, given a nested list:
nl = [["apples", 2 "New York"],
["peaches", 6, "New York"],
["apples", 6, "New York"],
["peaches", 1, "Vermont"]]
I would like to be able to rearrange rowed data and groupby fields:
apples peaches
New York 2 6
Vermont 6 1
The above is a trivial example, but is there a solution that would be easier than using itertools.groupby
everytime a pivot is desired? Ideally, the solution would allow rowed data to be pivoted on any column. I was debating about using pandas, but it is an external library and only has limited Python 3 support.
Upvotes: 3
Views: 3198
Reputation: 83002
Here is some simple code. Providing row/column/grand totals is left as an exercise for the reader.
class CrossTab(object):
def __init__(
self,
missing=0, # what to return for an empty cell.
# Alternatives: '', 0.0, None, 'NULL'
):
self.missing = missing
self.col_key_set = set()
self.cell_dict = {}
self.headings_OK = False
def add_item(self, row_key, col_key, value):
self.col_key_set.add(col_key)
try:
self.cell_dict[row_key][col_key] += value
except KeyError:
try:
self.cell_dict[row_key][col_key] = value
except KeyError:
self.cell_dict[row_key] = {col_key: value}
def _process_headings(self):
if self.headings_OK:
return
self.row_headings = list(sorted(self.cell_dict.keys()))
self.col_headings = list(sorted(self.col_key_set))
self.headings_OK = True
def get_col_headings(self):
self._process_headings()
return self.col_headings
def generate_row_info(self):
self._process_headings()
for row_key in self.row_headings:
row_dict = self.cell_dict[row_key]
row_vals = [
row_dict.get(col_key, self.missing)
for col_key in self.col_headings
]
yield row_key, row_vals
if __name__ == "__main__":
data = [["apples", 2, "New York"],
["peaches", 6, "New York"],
["apples", 6, "New York"],
["peaches", 1, "Vermont"]]
ctab = CrossTab(missing='uh-oh')
for s in data:
ctab.add_item(row_key=s[2], col_key=s[0], value=s[1])
print()
print('Column headings:', ctab.get_col_headings())
for row_heading, row_values in ctab.generate_row_info():
print(repr(row_heading), row_values)
Output:
Column headings: ['apples', 'peaches']
'New York' [8, 6]
'Vermont' ['uh-oh', 1]
See also this answer.
And this one, which I'd forgotten about.
Upvotes: 1
Reputation: 38512
itertools.groupby
was exactly made for this problem. You will be hard-pressed to find something better, especially within the standard library.
Upvotes: 0