Big Secret
Big Secret

Reputation: 47

Counting String Values in Pivot Across Multiple Columns

I'd like to use Pandas to pivot a table into multiple columns, and get the count of their values.

In this example table:

LOCATION ADDRESS PARKING TYPE
AAA0001 123 MAIN LARGE LOT
AAA0001 123 MAIN SMALL LOT
AAA0002 456 TOWN LARGE LOT
AAA0003 789 AVE MEDIUM LOT
AAA0003 789 AVE MEDIUM LOT

How do I pivot out this table to show total counts of each string within "Parking Type"? Maybe my mistake is calling this a "pivot?"

Desired output:

LOCATION ADDRESS SMALL LOT MEDIUM LOT LARGE LOT
AAA0001 123 MAIN 1 0 1
AAA0002 456 TOWN 0 0 1
AAA0003 789 AVE 0 2 0

Currently, I have a pivot going, but it is only counting the values of the first column, and leaving everything else as 0s. Any guidance would be amazing.

Current Code:

pivot = pd.pivot_table(df, index=["LOCATION"], columns=['PARKING TYPE'], aggfunc=len)
pivot = pivot.reset_index()
pivot.columns = pivot.columns.to_series().apply(lambda x: "".join(x))

Upvotes: 0

Views: 233

Answers (2)

sophocles
sophocles

Reputation: 13831

You can use get_dummies() and then a grouped sum to get a row per your groups:

>>> pd.get_dummies(df, columns=['PARKING TYPE']).groupby(['LOCATION','ADDRESS'],as_index=False).sum()

  LOCATION   ADDRESS  PARKING TYPE_LARGE LOT  PARKING TYPE_MEDIUM LOT  PARKING TYPE_SMALL LOT
0  AAA0001  123 MAIN                       1                        0                       1
1  AAA0002  456 TOWN                       1                        0                       0
2  AAA0003   789 AVE                       0                        2                       0

Upvotes: 2

user7864386
user7864386

Reputation:

You could use pd.crosstab:

out = (pd.crosstab(index=[df['LOCATION'], df['ADDRESS']], columns=df['PARKING TYPE'])
       .reset_index()
       .rename_axis(columns=[None]))

or you could use pivot_table (but you have to pass "ADDRESS" into the index as well):

out = (pd.pivot_table(df, index=['LOCATION','ADDRESS'], columns=['PARKING TYPE'], values='ADDRESS', aggfunc=len, fill_value=0)
       .reset_index()
       .rename_axis(columns=[None]))

Output:

  LOCATION   ADDRESS  LARGE LOT  MEDIUM LOT  SMALL LOT
0  AAA0001  123 MAIN          1           0          1
1  AAA0002  456 TOWN          1           0          0
2  AAA0003   789 AVE          0           2          0

Upvotes: 3

Related Questions