João Pedro Veiga

Reputation: 11

Using the pandas crosstab method

I want to crosstab two Series (s1 and s2), but I get the following error: "cannot reindex from a duplicate axis"

import pandas as pd

s1 = pd.Series(['ot', 'bx', 'bx', 'bx', 'ot', 'ot', 'med', 'med', 'bx', 'med'], index=['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a'])

s2 = pd.Series(['adulto', 'adulto', 'idoso', 'adulto', 'jovem', 'jovem', 'adulto', 'jovem', 'jovem', 'adulto'], index=['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a'])

print(pd.crosstab(s1, s2))

I've tried changing the index, but it didn't work.

Upvotes: 1

Views: 216

Answers (1)

Henry Ecker

Reputation: 35646

This is an issue with the DataFrame construction that happens behind the scenes in the crosstab function. crosstab builds a DataFrame from the provided Series before passing it to pivot_table, and because both Series have duplicate index labels, aligning them on their shared index fails. The error can be reproduced with:

df = pd.DataFrame({'a': s1, 'b': s2}, index=s1.index.intersection(s2.index))

For reference, this happens in the crosstab source code, at the point where the DataFrame is built from the passed Series.
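As a quick diagnostic, it can help to confirm that duplicate index labels are the cause before reaching for a workaround. The sketch below is an illustration only: it reuses s1 from the question, and the variable names (has_duplicates, duplicated_labels, reindex_error) are made up for this example. The exact wording of the error message differs between pandas versions.

```python
import pandas as pd

labels = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']
s1 = pd.Series(['ot', 'bx', 'bx', 'bx', 'ot', 'ot', 'med', 'med', 'bx', 'med'],
               index=labels)

# Duplicate labels are what make label-based alignment ambiguous.
has_duplicates = not s1.index.is_unique
duplicated_labels = sorted(s1.index[s1.index.duplicated()].unique())

# Reindexing a Series whose index contains duplicates is expected to raise
# ValueError; the message text varies between pandas versions.
try:
    s1.reindex(['a', 'b', 'c'])
    reindex_error = None
except ValueError as exc:
    reindex_error = str(exc)

print(has_duplicates, duplicated_labels)
print(reindex_error)
```

If has_duplicates is True for either Series, passing them to crosstab as-is is likely to hit the same problem.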


Assuming our Series can be aligned 1-to-1 by row order, we can simply discard the index with .values or .to_numpy():

pd.crosstab(s1.values, s2.values)

col_0  adulto  idoso  jovem
row_0                      
bx          2      1      1
med         2      0      1
ot          1      0      2

Alternatively, drop the non-unique index with reset_index(drop=True):

pd.crosstab(s1.reset_index(drop=True), s2.reset_index(drop=True))

col_0  adulto  idoso  jovem
row_0                      
bx          2      1      1
med         2      0      1
ot          1      0      2
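Both positional variants above produce generic axis names (row_0, col_0) when plain arrays are passed. A minimal sketch of the same idea with a length check and explicit axis names, using crosstab's rownames and colnames parameters, might look like this. The length guard is an addition for illustration, not part of the original answer.

```python
import pandas as pd

labels = ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a']
s1 = pd.Series(['ot', 'bx', 'bx', 'bx', 'ot', 'ot', 'med', 'med', 'bx', 'med'],
               index=labels)
s2 = pd.Series(['adulto', 'adulto', 'idoso', 'adulto', 'jovem', 'jovem',
                'adulto', 'jovem', 'jovem', 'adulto'], index=labels)

# Positional pairing only makes sense when both Series have the same length.
if len(s1) != len(s2):
    raise ValueError("s1 and s2 must have the same length for positional pairing")

# Passing plain arrays bypasses index alignment; rownames/colnames label the axes.
table = pd.crosstab(s1.to_numpy(), s2.to_numpy(), rownames=['s1'], colnames=['s2'])
print(table)
```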

If the Series are not already aligned by position, we can number the occurrences of each index value with groupby cumcount and merge on the pair (index value, occurrence number). This gives a DataFrame with a unique index, from which we can take the crosstab:

df1 = s1.reset_index(name='s1')
df2 = s2.reset_index(name='s2')

df3 = df1.merge(df2,
                left_on=['index', df1.groupby('index').cumcount()],
                right_on=['index', df2.groupby('index').cumcount()])

df3:

  index  key_1   s1      s2
0     a      0   ot  adulto
1     b      0   bx  adulto
2     c      0   bx   idoso
3     a      1   bx  adulto
4     b      1   ot   jovem
5     c      1   ot   jovem
6     a      2  med  adulto
7     b      2  med   jovem
8     c      2   bx   jovem
9     a      3  med  adulto

Now rows are aligned by their position within each index group, not by their absolute position in the Series:

pd.crosstab(df3['s1'], df3['s2'])

s2   adulto  idoso  jovem
s1                       
bx        2      1      1
med       2      0      1
ot        1      0      2

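*The result is the same here because, in this example, both Series list their index labels in the same order, so pairing by position and pairing by occurrence within each label match up the same rows.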
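To see when the two approaches diverge, here is a small sketch with made-up data in which the second Series lists its labels in a different order. The helper crosstab_by_occurrence and the column names label, left, right and occurrence are hypothetical names chosen for this illustration; it stores the cumcount in a column rather than passing it directly to merge, which is a variation on the code above rather than the same call.

```python
import pandas as pd

def crosstab_by_occurrence(left, right):
    """Hypothetical helper: pair rows by (label, n-th occurrence of that label)."""
    df_left = left.rename_axis('label').reset_index(name='left')
    df_right = right.rename_axis('label').reset_index(name='right')
    # Number each repeat of a label: 0 for its first appearance, 1 for the next, ...
    df_left['occurrence'] = df_left.groupby('label').cumcount()
    df_right['occurrence'] = df_right.groupby('label').cumcount()
    # Inner merge: occurrences present on only one side are dropped.
    merged = df_left.merge(df_right, on=['label', 'occurrence'])
    return pd.crosstab(merged['left'], merged['right'])

# Made-up data: same labels, but s2 lists them in a different order.
s1 = pd.Series(['x', 'y', 'x', 'y'], index=['a', 'b', 'a', 'b'])
s2 = pd.Series(['p', 'q', 'r', 's'], index=['b', 'a', 'b', 'a'])

by_position = pd.crosstab(s1.to_numpy(), s2.to_numpy())
by_occurrence = crosstab_by_occurrence(s1, s2)

print(by_position)
print(by_occurrence)
```

Here the positional table pairs x with p and r, while the occurrence-based table pairs x with q and s, so which approach is right depends on whether the index labels or the row order carry the meaning in your data.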

Upvotes: 2
