Reputation: 35
I run into such a coding problem.
Given a data frame with column name ('col1','col2') just like the picture below shown
Dataframe:
Here, the variables of the column 'col1' and 'col2' are categorical data. I want to form a matrix with each entry counting the number of rows for a specific choice of col1 and of col2. For example, the number of rows such that Column 'col1' equal to '1' and Column 'col2' equal to '1'. (Line 316 in the picture shown)
I got a very large dataset. col1 and col2 both have 1000 different values. If I use a double for loop it would be very slow. Is there a faster way to do so?
I know that .groupby(by=['col1','col2']).size().to_frame('count').reset_index()
would give a sparse representation of the matrix.
Example:
However, it is not the form I want.
Upvotes: 1
Views: 600
Reputation: 153460
IIUC, I think you need pd.crosstab
:
pd.crosstab(tmp['col1'], tmp['col2'])
Output:
col2 1 2
col1
1 1 3
2 1 1
Note: 'col1' and 'col2' are index labels.
Upvotes: 1