AL00tz
AL00tz

Reputation: 31

Python Crosstabs Multiple Variables or Rows; Demographic Table

Question

I have a similar question to: Crosstab with multiple items but am not trying to do it in R, I am trying to do it in Python Pandas with Crosstab.

I have been trying to use Python Pandas crosstab function to make a demographic table, but have only been able to do one demographic at a time. In other words, I would like to make a crosstabs that has all row variables on the same level. Perhaps this is not the function of crosstabs and something like Pandas pivot table would be better for this?

Currently I use the following three lines of code, but would think there is someway to combine these:

genderTable = pd.crosstab(refQtrData['GENDER'], [refQtrData['FUNDINGSOURCE'],refQtrData['PROVIDER'],refQtrData['LOCATION']], margins='true')
raceTable = pd.crosstab(refQtrData['RACETH4'], [refQtrData['FUNDINGSOURCE'],refQtrData['PROVIDER'],refQtrData['LOCATION']], margins='true')
ageTable = pd.crosstab(refQtrData['REFERRED'], [refQtrData['FUNDINGSOURCE'],refQtrData['PROVIDER'],refQtrData['LOCATION']], values=refQtrData['AGEREF'], aggfunc='mean')

What I would like to do: Demographic Table

Other miscellaneous info

This is originally done in SPSS with the code below, but I am trying to move it over to python. In the same way that SPSS CTABLES allows me to have multiple categories and variables, I'd like to have multiple rows that correspond to different variables without them being on different levels.

CTABLES
  /VLABELS VARIABLES= GENDER RACE AGE FUNDINGSOURCE PROVIDER LOCATION 
    DISPLAY=LABEL
  /TABLE REFERRED [C][COUNT F40.0] + GENDER [C][COUNT F40.0, COLPCT.COUNT PCTPAREN40.0] + RACE 
    [C][COUNT F40.0, COLPCT.COUNT PCTPAREN40.0] + AGE [S][MEAN] + AGE [S][MINIMUM, MAXIMUM]
    BY FUNDINGSOURCE [C] > PROVIDER [C] > LOCATION [C]
  /SLABELS VISIBLE=NO
  /CATEGORIES VARIABLES=GENDER RACE ORDER=A KEY=VALUE MISSING=INCLUDE EMPTY=INCLUDE
  /CATEGORIES VARIABLES=FUNDINGSOURCE ORDER=A KEY=VALUE MISSING=INCLUDE EMPTY=EXCLUDE
  /CATEGORIES VARIABLES=PROVIDER [1, 2] EMPTY=EXCLUDE 
  /CATEGORIES VARIABLES=LOCATION [1, 2] EMPTY=EXCLUDE.

Upvotes: 3

Views: 5839

Answers (1)

Rich Andrews
Rich Andrews

Reputation: 1670

In absence of a reproducible example, we can lean on the pandas crosstab doc which has some example crosstabs copy/pasted below.

import pandas as pd
import numpy as np

a = np.array(["foo", "foo", "foo", "foo", "bar", "bar","bar", "bar", "foo", "foo", "foo"], dtype=object)
b = np.array(["one", "one", "one", "two", "one", "one", "one", "two", "two", "two", "one"], dtype=object)
c = np.array(["dull", "dull", "shiny", "dull", "dull", "shiny", "shiny", "dull", "shiny", "shiny", "shiny"],dtype=object)
d = np.array(["1foo", "1foo", "1foo", "1foo", "1bar", "1bar","1bar", "1bar", "1foo", "1foo", "1foo"], dtype=object)

This gives four arrays. Make crosstabs. This returns DataFrames.

df1 =  pd.crosstab(a, [b, c], rownames=['aa'], colnames=['b', 'c'])
df2 =  pd.crosstab(d, [b, c], rownames=['aa'], colnames=['b', 'c'])

Go after the DataFrames with pandas.concat([],axis=...)

pd.concat([df1, df2], axis=0)
b     one        two      
c    dull shiny dull shiny
aa                        
bar     1     2    1     0
foo     2     2    1     2
1bar    1     2    1     0
1foo    2     2    1     2

>>> pd.concat([df1, df2], axis=1)
b     one        two        one        two      
c    dull shiny dull shiny dull shiny dull shiny
1bar  NaN   NaN  NaN   NaN  1.0   2.0  1.0   0.0
1foo  NaN   NaN  NaN   NaN  2.0   2.0  1.0   2.0
bar   1.0   2.0  1.0   0.0  NaN   NaN  NaN   NaN
foo   2.0   2.0  1.0   2.0  NaN   NaN  NaN   NaN

As far as creating three crosstabs with one function call, implement a function that accepts the data and returns concatenated crosstabs. Unsure if it can be done in a reasonable one-liner.

One is then left to further modify or otherwise join DataFrames.

Upvotes: 2

Related Questions