Reputation: 11
I want to crosstab two Series (s1 and s2), but I get the following error: "cannot reindex from a duplicate axis"
import pandas as pd

s1 = pd.Series(['ot', 'bx', 'bx', 'bx', 'ot', 'ot', 'med', 'med', 'bx', 'med'], index=['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a'])
s2 = pd.Series(['adulto', 'adulto', 'idoso', 'adulto', 'jovem', 'jovem', 'adulto', 'jovem', 'jovem', 'adulto'], index=['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a'])
print(pd.crosstab(s1, s2))
I've tried changing the index, but it didn't work.
Upvotes: 1
Views: 216
Reputation: 35646
This is an issue with the DataFrame construction that happens behind the scenes in the crosstab function. crosstab tries to build a DataFrame from the provided Series to pass to pivot_table, and because the index labels are duplicated, aligning the Series on their common index fails. This can be replicated via:
df = pd.DataFrame({'a': s1, 'b': s2}, index=s1.index.intersection(s2.index))
For reference, see the source code of crosstab where this actually occurs.
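To see the failure in isolation, here is a minimal sketch that wraps the same construction in a try/except. It assumes a reasonably recent pandas, where reindexing a Series with duplicate labels raises a ValueError; the exact wording of the message differs between versions, and the variable names has_duplicates and error_message are only for illustration.

```python
import pandas as pd

labels = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']
s1 = pd.Series(['ot', 'bx', 'bx', 'bx', 'ot', 'ot', 'med', 'med', 'bx', 'med'],
               index=labels)
s2 = pd.Series(['adulto', 'adulto', 'idoso', 'adulto', 'jovem', 'jovem',
                'adulto', 'jovem', 'jovem', 'adulto'], index=labels)

# The labels repeat, so a label such as 'a' does not identify a single row.
has_duplicates = not s1.index.is_unique

error_message = None
try:
    # Same construction as in the answer: align both Series on a common index.
    pd.DataFrame({'a': s1, 'b': s2}, index=s1.index.intersection(s2.index))
except ValueError as exc:
    # Assumption: the failure surfaces as ValueError (message varies by version).
    error_message = str(exc)

print(has_duplicates)
print(error_message)
```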
Assuming our Series can be aligned 1-to-1 (by row order), we can simply remove the index with .values or .to_numpy():
pd.crosstab(s1.values, s2.values)
col_0  adulto  idoso  jovem
row_0
bx          2      1      1
med         2      0      1
ot          1      0      2
Or by dropping the non-unique index with reset_index:
pd.crosstab(s1.reset_index(drop=True), s2.reset_index(drop=True))
col_0  adulto  idoso  jovem
row_0
bx          2      1      1
med         2      0      1
ot          1      0      2
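The positional approach can be wrapped in a small helper, sketched below. The function name crosstab_by_position and its row_name/col_name parameters are hypothetical, not part of pandas; the length check is an added assumption that both Series describe the same rows in the same order. rownames and colnames are existing crosstab parameters and replace the default row_0/col_0 labels.

```python
import pandas as pd


def crosstab_by_position(rows, cols, row_name='s1', col_name='s2'):
    """Cross-tabulate two Series row by row, ignoring their index labels."""
    if len(rows) != len(cols):
        raise ValueError('both Series must have the same length')
    # to_numpy() drops the (duplicated) index, so no alignment is attempted.
    return pd.crosstab(rows.to_numpy(), cols.to_numpy(),
                       rownames=[row_name], colnames=[col_name])


labels = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']
s1 = pd.Series(['ot', 'bx', 'bx', 'bx', 'ot', 'ot', 'med', 'med', 'bx', 'med'],
               index=labels)
s2 = pd.Series(['adulto', 'adulto', 'idoso', 'adulto', 'jovem', 'jovem',
                'adulto', 'jovem', 'jovem', 'adulto'], index=labels)

table = crosstab_by_position(s1, s2)
print(table)
```

The counts match the tables above; only the axis names differ.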
If the Series are not conveniently aligned by position, we can enumerate the occurrences of each index value with groupby cumcount, merge on the index label plus that counter to create a uniformly indexed DataFrame, and then take the crosstab:
df1 = s1.reset_index(name='s1')
df2 = s2.reset_index(name='s2')
df3 = df1.merge(df2,
                left_on=['index', df1.groupby('index').cumcount()],
                right_on=['index', df2.groupby('index').cumcount()])
df3:
  index  key_1   s1      s2
0     a      0   ot  adulto
1     b      0   bx  adulto
2     c      0   bx   idoso
3     a      1   bx  adulto
4     b      1   ot   jovem
5     c      1   ot   jovem
6     a      2  med  adulto
7     b      2  med   jovem
8     c      2   bx   jovem
9     a      3  med  adulto
Now the rows are aligned by their position within each index group, not by their absolute position in the Series:
pd.crosstab(df3['s1'], df3['s2'])
s2   adulto  idoso  jovem
s1
bx        2      1      1
med       2      0      1
ot        1      0      2
*The result is the same here because both Series list their index labels in the same order.
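The two strategies only diverge when the Series list their labels in a different order. The sketch below uses a tiny made-up example (the values x, y, p, q and the helper name crosstab_by_label_occurrence are hypothetical, chosen for illustration) to show the cumcount merge pairing the n-th occurrence of each label, while the positional version pairs rows by order. It names the counter column occurrence explicitly instead of relying on the auto-generated key_1.

```python
import pandas as pd


def crosstab_by_label_occurrence(s1, s2):
    """Pair the n-th occurrence of each index label in s1 with the n-th in s2."""
    df1 = s1.rename_axis('index').reset_index(name='s1')
    df2 = s2.rename_axis('index').reset_index(name='s2')
    # Number the repeats of each label: first 'a' -> 0, second 'a' -> 1, ...
    df1['occurrence'] = df1.groupby('index').cumcount()
    df2['occurrence'] = df2.groupby('index').cumcount()
    merged = df1.merge(df2, on=['index', 'occurrence'])
    return pd.crosstab(merged['s1'], merged['s2'])


# Made-up data: same labels, but s2 lists them in a different order.
s1 = pd.Series(['x', 'y', 'x'], index=['a', 'b', 'a'])
s2 = pd.Series(['p', 'q', 'p'], index=['b', 'a', 'a'])

by_label = crosstab_by_label_occurrence(s1, s2)
by_position = pd.crosstab(s1.to_numpy(), s2.to_numpy())

print(by_label)      # pairs: (x, q), (y, p), (x, p)
print(by_position)   # pairs: (x, p), (y, q), (x, p)
```

Which of the two is right depends on what the index labels mean in your data: if they identify records, match by label occurrence; if they are incidental, match by position.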
Upvotes: 2