Reputation: 31
I have a similar question to: Crosstab with multiple items but am not trying to do it in R, I am trying to do it in Python Pandas with Crosstab.
I have been trying to use Python Pandas crosstab function to make a demographic table, but have only been able to do one demographic at a time. In other words, I would like to make a crosstabs that has all row variables on the same level. Perhaps this is not the function of crosstabs and something like Pandas pivot table would be better for this?
Currently I use the following three lines of code, but would think there is someway to combine these:
genderTable = pd.crosstab(refQtrData['GENDER'], [refQtrData['FUNDINGSOURCE'],refQtrData['PROVIDER'],refQtrData['LOCATION']], margins='true')
raceTable = pd.crosstab(refQtrData['RACETH4'], [refQtrData['FUNDINGSOURCE'],refQtrData['PROVIDER'],refQtrData['LOCATION']], margins='true')
ageTable = pd.crosstab(refQtrData['REFERRED'], [refQtrData['FUNDINGSOURCE'],refQtrData['PROVIDER'],refQtrData['LOCATION']], values=refQtrData['AGEREF'], aggfunc='mean')
What I would like to do: Demographic Table
This is originally done in SPSS with the code below, but I am trying to move it over to python. In the same way that SPSS CTABLES allows me to have multiple categories and variables, I'd like to have multiple rows that correspond to different variables without them being on different levels.
CTABLES
/VLABELS VARIABLES= GENDER RACE AGE FUNDINGSOURCE PROVIDER LOCATION
DISPLAY=LABEL
/TABLE REFERRED [C][COUNT F40.0] + GENDER [C][COUNT F40.0, COLPCT.COUNT PCTPAREN40.0] + RACE
[C][COUNT F40.0, COLPCT.COUNT PCTPAREN40.0] + AGE [S][MEAN] + AGE [S][MINIMUM, MAXIMUM]
BY FUNDINGSOURCE [C] > PROVIDER [C] > LOCATION [C]
/SLABELS VISIBLE=NO
/CATEGORIES VARIABLES=GENDER RACE ORDER=A KEY=VALUE MISSING=INCLUDE EMPTY=INCLUDE
/CATEGORIES VARIABLES=FUNDINGSOURCE ORDER=A KEY=VALUE MISSING=INCLUDE EMPTY=EXCLUDE
/CATEGORIES VARIABLES=PROVIDER [1, 2] EMPTY=EXCLUDE
/CATEGORIES VARIABLES=LOCATION [1, 2] EMPTY=EXCLUDE.
Upvotes: 3
Views: 5839
Reputation: 1670
In absence of a reproducible example, we can lean on the pandas crosstab doc which has some example crosstabs copy/pasted below.
import pandas as pd
import numpy as np
a = np.array(["foo", "foo", "foo", "foo", "bar", "bar","bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one", "one", "two", "two", "two", "one"], dtype=object)
c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny", "shiny", "dull", "shiny", "shiny", "shiny"],dtype=object)
d = np.array(["1foo", "1foo", "1foo", "1foo", "1bar", "1bar","1bar", "1bar", "1foo", "1foo", "1foo"], dtype=object)
This gives four arrays. Make crosstabs. This returns DataFrames.
df1 = pd.crosstab(a, [b, c], rownames=['aa'], colnames=['b', 'c'])
df2 = pd.crosstab(d, [b, c], rownames=['aa'], colnames=['b', 'c'])
Go after the DataFrames with pandas.concat([],axis=...)
pd.concat([df1, df2], axis=0)
b one two
c dull shiny dull shiny
aa
bar 1 2 1 0
foo 2 2 1 2
1bar 1 2 1 0
1foo 2 2 1 2
>>> pd.concat([df1, df2], axis=1)
b one two one two
c dull shiny dull shiny dull shiny dull shiny
1bar NaN NaN NaN NaN 1.0 2.0 1.0 0.0
1foo NaN NaN NaN NaN 2.0 2.0 1.0 2.0
bar 1.0 2.0 1.0 0.0 NaN NaN NaN NaN
foo 2.0 2.0 1.0 2.0 NaN NaN NaN NaN
As far as creating three crosstabs with one function call, implement a function that accepts the data and returns concatenated crosstabs. Unsure if it can be done in a reasonable one-liner.
One is then left to further modify or otherwise join DataFrames.
Upvotes: 2