kiwi_kimchi
kiwi_kimchi

Reputation: 423

Making a matrix-format from python

I have the following data in my dataframe B:

F1     F2    Count
A      C      5
B      C      2
B      U      6
C      A      1

I want to make a square matrix out of them so the results will be:

    A    B   C  U
A   0    0   6  0
B   0    0   2  6
C   6    2   0  0
U   0    6   0  0

I initially used pd.crosstab() but some variables in F1/F2 is missing in the matrix.

AC = 5 CA = 1 therefore the output should be 6.

Also pdcrosstab() does not recognize BU = UB, etc.

Anyone who could help? I am basically new to python.

Btw, this is my code:

wow=pd.crosstab(B.F1, 
            B.F2, 
            values=B.Count, 
            aggfunc='sum',
            ).rename_axis(None).rename_axis(None, axis=1)

Upvotes: 2

Views: 134

Answers (3)

Sayandip Dutta
Sayandip Dutta

Reputation: 15872

You can pd.concat, wow and wow.T then groupby index and sum again:

>>> wow=pd.crosstab(B.F1, 
            B.F2, 
            values=B.Count, 
            aggfunc='sum',
            ).rename_axis(None).rename_axis(None, axis=1)
>>> wow
     A    C    U
A  NaN  5.0  NaN
B  NaN  2.0  6.0
C  1.0  NaN  NaN

>>> pd.concat([wow, wow.T], sort=True).fillna(0, downcast='infer').groupby(level=0).sum()
   A  B  C  U
A  0  0  6  0
B  0  0  2  6
C  6  2  0  0
U  0  6  0  0

Upvotes: 2

GZ0
GZ0

Reputation: 4273

You can make columns F1 and F2 categorical and use crosstab for the work.

FDtype = pd.CategoricalDtype(list("ABCU"))

df[["F1", "F2"]] = df[["F1", "F2"]].astype(FDtype)
count = pd.crosstab(df["F1"], df["F2"], df["Count"], aggfunc='sum', dropna=False)
count.fillna(0, inplace=True, downcast="infer")
count += count.T

Remark: it is more efficient to specify the column dtypes while the DataFrame is constructed

Upvotes: 1

Mykola Zotko
Mykola Zotko

Reputation: 17834

You can append the DataFrame where 'F1' and 'F2' are swapped to the original DataFrame.

df1 = df.append(df.rename({'F1': 'F2', 'F2': 'F1'}, axis=1), sort=False)

Then you can use pivot_table:

pd.pivot_table(df1, values='Count', index='F1', columns='F2', aggfunc='sum', fill_value=0)

or crosstab:

pd.crosstab(df1.F1, df1.F2, df1.Count, aggfunc='sum').fillna(0)

Finally remove columns and index names:

del df1.columns.name, df1.index.name

Result:

   A  B  C  U
A  0  0  6  0
B  0  0  2  6
C  6  2  0  0
U  0  6  0  0

Upvotes: 1

Related Questions