Reputation: 423
I have the following data in my dataframe B:
F1 F2 Count
A C 5
B C 2
B U 6
C A 1
I want to make a square matrix out of them so the results will be:
A B C U
A 0 0 6 0
B 0 0 2 6
C 6 2 0 0
U 0 6 0 0
I initially used pd.crosstab()
but some variables in F1/F2 is missing in the matrix.
AC = 5 CA = 1 therefore the output should be 6.
Also pdcrosstab() does not recognize BU = UB
, etc.
Anyone who could help? I am basically new to python.
Btw, this is my code:
wow=pd.crosstab(B.F1,
B.F2,
values=B.Count,
aggfunc='sum',
).rename_axis(None).rename_axis(None, axis=1)
Upvotes: 2
Views: 134
Reputation: 15872
You can pd.concat
, wow
and wow.T
then groupby
index and sum
again:
>>> wow=pd.crosstab(B.F1,
B.F2,
values=B.Count,
aggfunc='sum',
).rename_axis(None).rename_axis(None, axis=1)
>>> wow
A C U
A NaN 5.0 NaN
B NaN 2.0 6.0
C 1.0 NaN NaN
>>> pd.concat([wow, wow.T], sort=True).fillna(0, downcast='infer').groupby(level=0).sum()
A B C U
A 0 0 6 0
B 0 0 2 6
C 6 2 0 0
U 0 6 0 0
Upvotes: 2
Reputation: 4273
You can make columns F1
and F2
categorical and use crosstab
for the work.
FDtype = pd.CategoricalDtype(list("ABCU"))
df[["F1", "F2"]] = df[["F1", "F2"]].astype(FDtype)
count = pd.crosstab(df["F1"], df["F2"], df["Count"], aggfunc='sum', dropna=False)
count.fillna(0, inplace=True, downcast="infer")
count += count.T
Remark: it is more efficient to specify the column dtypes while the DataFrame is constructed
Upvotes: 1
Reputation: 17834
You can append the DataFrame where 'F1'
and 'F2'
are swapped to the original DataFrame.
df1 = df.append(df.rename({'F1': 'F2', 'F2': 'F1'}, axis=1), sort=False)
Then you can use pivot_table
:
pd.pivot_table(df1, values='Count', index='F1', columns='F2', aggfunc='sum', fill_value=0)
or crosstab
:
pd.crosstab(df1.F1, df1.F2, df1.Count, aggfunc='sum').fillna(0)
Finally remove columns and index names:
del df1.columns.name, df1.index.name
Result:
A B C U
A 0 0 6 0
B 0 0 2 6
C 6 2 0 0
U 0 6 0 0
Upvotes: 1