McGoushie

Reputation: 19

Polars read_excel not equal to Pandas read_excel for columns with "mixed" types

I'm reading some Excel data with pl.read_excel(), and for columns with mixed data the result is not identical to what pd.read_excel() returns.

Here's an example to illustrate:

import pandas as pd
import polars as pl

# Create sample data and save it to Excel.
test = pd.DataFrame(
    {
    'nums':  [1, 2, 3],
    'mixed': [1, 4, '6A'],
    'factor': ['A', 'B', 'C']
    }
)
test.to_excel('test.xlsx', index = False)

# read data using Pandas and Polars. Convert polars version to pandas.
test_pd = pd.read_excel('test.xlsx', engine='openpyxl')

test_pl = pl.read_excel('test.xlsx')
test_pl = test_pl.to_pandas()

# compare the two
print(test_pd)
print(test_pl)
print(test_pd == test_pl)

The output of print(test_pd) and print(test_pl) suggests the data is identical. However, print(test_pd == test_pl) returns the following:

   nums  mixed  factor
0  True  False    True
1  True  False    True
2  True   True    True

What is causing the data to differ? Is this a Polars (or Arrow) limitation when dealing with object columns? I want pl.read_excel() followed by a conversion to pandas to ultimately yield a DataFrame identical to the one from pd.read_excel().

Thanks!

Upvotes: 0

Views: 1183

Answers (2)

Dean MacGregor

Reputation: 18556

Polars and Arrow rely on strict data types, so ultimately, yes, it's a limitation. A column can never be sometimes Utf8 and sometimes Floatxx, so every value in a column has to end up as one type.

Pandas, on the other hand, is happy to have a column of mixed data types, because an object column is basically just a Python list of arbitrary objects.
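To make that difference concrete, here is a minimal sketch using only pandas. It rebuilds the "mixed" column from the question and then casts it to text, which stands in for what a strictly typed column has to do; it is an illustration of the idea, not a reproduction of what Polars does internally.

```python
import pandas as pd

# Same "mixed" column as in the question: two ints and one string.
mixed = pd.Series([1, 4, "6A"], name="mixed")

# pandas stores it as an object column, so each cell keeps its own Python type.
print(mixed.dtype)
print([type(v).__name__ for v in mixed])

# A strictly typed column needs one type for every cell.
# Text is the only type here that can also hold "6A".
as_text = mixed.astype(str)
print(as_text.tolist())

# Cell-by-cell comparison: the int 1 is not equal to the string "1".
matches = [a == b for a, b in zip(mixed, as_text)]
print(matches)
```

The last line gives False for the two numeric cells and True for "6A", which is the same pattern as the "mixed" column in the question's comparison.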

Upvotes: 1

Zbiggi

Reputation: 26

Polars converted some of your numbers to strings. Look here:

test_pl.iloc[0,1]
'1'

Pandas, by contrast, kept integers where possible. The same cell in pandas:

test_pd.iloc[0,1]
1

If you cast both tables to string, all cells are equal:

test_pd.astype('string') == test_pl.astype('string')

   nums  mixed  factor
0  True   True    True
1  True   True    True
2  True   True    True
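If the goal is a frame that matches pd.read_excel() rather than two string frames, one option is to go the other way and turn the numeric-looking strings back into integers after the conversion to pandas. The sketch below is only that, a sketch: to_int_if_possible is a hypothetical helper, and the two frames are built by hand as stand-ins for test_pd and test_pl, assuming the "mixed" column arrives from Polars as strings as shown above.

```python
import pandas as pd


def to_int_if_possible(value):
    """Hypothetical helper: return int(value) for whole-number text, else the value unchanged."""
    if isinstance(value, str):
        try:
            return int(value)
        except ValueError:
            return value
    return value


# Hand-built stand-ins for the two frames in the question (no Excel file needed).
test_pd = pd.DataFrame(
    {"nums": [1, 2, 3], "mixed": [1, 4, "6A"], "factor": ["A", "B", "C"]}
)
test_pl = pd.DataFrame(
    {"nums": [1, 2, 3], "mixed": ["1", "4", "6A"], "factor": ["A", "B", "C"]}
)

# Convert only the affected column; cast to object first so it can hold mixed types.
restored = test_pl.copy()
restored["mixed"] = restored["mixed"].astype(object).map(to_int_if_possible)

# Compare the column cell by cell against the pandas version.
matches = [a == b for a, b in zip(test_pd["mixed"], restored["mixed"])]
print(matches)
```

Note that this guesses at the original types: text such as "007" would also become an integer, and values like "1.5" stay as strings, so it only fits data where whole numbers and real text are the two cases.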

Upvotes: 1
