How to form count matrix from occurrence of combination using Python?

Question

I run into such a coding problem.

Given a data frame with column name ('col1','col2') just like the picture below shown

Dataframe:

Here, the variables of the column 'col1' and 'col2' are categorical data. I want to form a matrix with each entry counting the number of rows for a specific choice of col1 and of col2. For example, the number of rows such that Column 'col1' equal to '1' and Column 'col2' equal to '1'. (Line 316 in the picture shown)

I got a very large dataset. col1 and col2 both have 1000 different values. If I use a double for loop it would be very slow. Is there a faster way to do so?

I know that .groupby(by=['col1','col2']).size().to_frame('count').reset_index() would give a sparse representation of the matrix.

Example:

However, it is not the form I want.

Scott Boston · Accepted Answer

IIUC, I think you need pd.crosstab:

pd.crosstab(tmp['col1'], tmp['col2'])

Output:

Note: 'col1' and 'col2' are index labels.

How to form count matrix from occurrence of combination using Python?

Answers (1)

Related Questions