Lala James
Lala James

Reputation: 35

How to form count matrix from occurrence of combination using Python?

I run into such a coding problem.

Given a data frame with column name ('col1','col2') just like the picture below shown

Dataframe:

enter image description here

Here, the variables of the column 'col1' and 'col2' are categorical data. I want to form a matrix with each entry counting the number of rows for a specific choice of col1 and of col2. For example, the number of rows such that Column 'col1' equal to '1' and Column 'col2' equal to '1'. (Line 316 in the picture shown)

I got a very large dataset. col1 and col2 both have 1000 different values. If I use a double for loop it would be very slow. Is there a faster way to do so?

I know that .groupby(by=['col1','col2']).size().to_frame('count').reset_index() would give a sparse representation of the matrix.

Example:

enter image description here

However, it is not the form I want.

Upvotes: 1

Views: 600

Answers (1)

Scott Boston
Scott Boston

Reputation: 153460

IIUC, I think you need pd.crosstab:

pd.crosstab(tmp['col1'], tmp['col2'])

Output:

col2  1  2
col1      
1     1  3
2     1  1

Note: 'col1' and 'col2' are index labels.

Upvotes: 1

Related Questions