Interesting results with duplicate columns in pandas.DataFrame

Question

Can anyone help explain why some operations raise errors and others don't when a pandas.DataFrame has a duplicate column?

Minimal, Reproducible Example

import pandas as pd
df = pd.DataFrame(columns=['a', 'b', 'b'])
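Before looking at the errors, it can help to confirm which labels are duplicated. This is a small sketch using the standard `Index.is_unique` and `Index.duplicated()` attributes on the same frame as above; it only inspects the columns and does not change anything.

```python
import pandas as pd

df = pd.DataFrame(columns=['a', 'b', 'b'])

# is_unique is False as soon as any column label repeats
print(df.columns.is_unique)  # False

# duplicated() flags every repeat after the first occurrence of a label
print(df.columns.duplicated().tolist())  # [False, False, True]

# Pull out the offending labels themselves
print(df.columns[df.columns.duplicated()].tolist())  # ['b']
```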

If I try to insert a list into column 'a', I get an error about a dimension mismatch:

df.loc[:, 'a'] = list(range(5))

Traceback (most recent call last):
...
ValueError: cannot copy sequence with size 5 to array axis with dimension 0

The same happens with 'b':

df.loc[:, 'b'] = list(range(5))

Traceback (most recent call last):
...
ValueError: could not broadcast input array from shape (5) into shape (0,2)
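The target shape (0,2) in this message suggests that the label 'b' refers to two columns at once, and the 0 appears to match the fact that the frame has no rows yet. A quick way to check this (an observation, not a full account of pandas internals) is to select each label on its own and compare what comes back:

```python
import pandas as pd

df = pd.DataFrame(columns=['a', 'b', 'b'])

# A unique label selects a single column, returned as a Series
col_a = df['a']
print(type(col_a).__name__, col_a.shape)  # Series (0,)

# A duplicated label selects every matching column, returned as a DataFrame
col_b = df['b']
print(type(col_b).__name__, col_b.shape)  # DataFrame (0, 2)
```

Because 'b' resolves to a two-column block while the assigned value is a flat list of 5 items, the shapes cannot line up.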

However, if I insert into an entirely new column, I don't get an error. After that, though, inserting into 'a' or 'b' still fails, now with a different error:

df.loc[:, 'c'] = list(range(5))
print(df)

     a    b    b  c
0  NaN  NaN  NaN  0
1  NaN  NaN  NaN  1
2  NaN  NaN  NaN  2
3  NaN  NaN  NaN  3
4  NaN  NaN  NaN  4

df.loc[:, 'a'] = list(range(5))

Traceback (most recent call last):
...
ValueError: Buffer has wrong number of dimensions (expected 1, got 0)

All of these errors disappear if I remove the duplicate column 'b'.
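One common way to remove the duplicate is to keep only the first occurrence of each label with a boolean mask from `Index.duplicated()`. The sketch below assumes that dropping the second 'b' is acceptable. It then uses plain `df['a'] = ...` column assignment rather than the `.loc` form from the question, so that it does not rely on how a particular pandas version handles `.loc` on an empty frame.

```python
import pandas as pd

df = pd.DataFrame(columns=['a', 'b', 'b'])

# Keep the first occurrence of each column label; .copy() gives an
# independent frame so later assignments don't target a slice of the original
df = df.loc[:, ~df.columns.duplicated()].copy()
print(df.columns.tolist())  # ['a', 'b']

# With unique labels, assigning a list to a column of an empty frame
# takes the new row index from the list; the other column is filled with NaN
df['a'] = list(range(5))
print(df)
```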

Additional information

pandas==1.0.2
