detly

Reputation: 30332

Why does plotting combined Pandas data in Seaborn give "TypeError: -0.8 is not a string"?

I am working through some statistics examples using Scikit-learn (0.20.0), and trying to plot some things as I go with Seaborn (0.9.0). I keep encountering errors when I try to plot data sets I've combined using Pandas' concat() function.

Here is the most minimal example I could construct:

import numpy
import pandas
import seaborn

X = numpy.array([[-1, -1, "A"]])
P = numpy.array([[-0.8, -1]])

data_x = pandas.DataFrame(X, columns=('x','y','group'))
data_p = pandas.DataFrame(P, columns=('x','y'))

data_p['group'] = "B"

combined = pandas.concat([data_x, data_p], ignore_index=True, sort=True)

seaborn.scatterplot(data=combined, x='x', y='y')

This results in a traceback ending in:

TypeError: -0.8 is not a string

If I remove the 'A' and 'group' columns, there's no error. If I plot data_x or data_p separately, there's no error. But I'm using Seaborn to plot the results of supervised classification exercises, so it is very useful to have e.g. columns for the 2D data plus category columns for grouping (e.g. group is A or B, differentiated by hue) and for whether something was known or predicted (e.g. kind is known or predicted, differentiated by style).

Hence I don't want to drop category columns just to avoid the errors here.

What am I doing wrong?

Upvotes: 4

Views: 978

Answers (2)

Abhi

Reputation: 4233

When you construct a NumPy array that contains a string, NumPy coerces every other value in the array to a string as well, and pandas then stores those strings with object dtype.

X = numpy.array([[-1, -1, "A"]])

print (X)

array([['-1', '-1', 'A']], dtype='<U11') 

P = numpy.array([[-0.8, -1]])

array([[-0.8, -1. ]])          ## Remains as float.

So constructing a dataframe from array X results in a dataframe where all columns are object dtype, whereas the columns of data_p remain float.

data_x = pandas.DataFrame(X, columns=('x','y','group'))

print (data_x.dtypes)
x        object  
y        object               ## object dtypes
group    object
dtype: object

data_p = pandas.DataFrame(P, columns=('x','y'))
data_p['group'] = "B"

print (data_p.dtypes)

x        float64
y        float64            ## Here x and y remains as float.
group     object            
dtype: object

Now, when you concat the two dataframes, the x and y columns are object in one and float in the other, so they default to object dtype in combined.

combined = pandas.concat([data_x, data_p], ignore_index=True, sort=True)

print (combined.dtypes)

group    object
x        object
y        object
dtype: object
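To see what "object" hides here, it can help to look at the Python type of each value in one of the columns. The following is a minimal sketch that rebuilds the frames from the question; the variable name x_types is only illustrative. It should show that the x column holds a string from data_x next to a float from data_p, which is the mix that the plotting code cannot handle.

```python
import numpy
import pandas

# Rebuild the two frames from the question.
data_x = pandas.DataFrame(numpy.array([[-1, -1, "A"]]), columns=("x", "y", "group"))
data_p = pandas.DataFrame(numpy.array([[-0.8, -1]]), columns=("x", "y"))
data_p["group"] = "B"

combined = pandas.concat([data_x, data_p], ignore_index=True, sort=True)

# Collect the Python type name of every value in the 'x' column
# (x_types is a hypothetical helper name, not part of any library).
x_types = [type(value).__name__ for value in combined["x"]]
print(x_types)
```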

So the TypeError arises because the x and y columns in combined have object dtype and contain a mix of strings and floats. A scatter plot needs numeric columns, so convert them first:

combined = combined.apply(pandas.to_numeric, errors='ignore')  ## Convert to numeric

group     object
x        float64
y        float64
dtype: object

seaborn.scatterplot(data=combined, x='x', y='y')

Plot
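An alternative is to convert only the coordinate columns, which avoids relying on errors='ignore' (recent pandas releases have, as far as I know, deprecated that option). The sketch below uses a small stand-in for the combined frame rather than the real concat result, and numeric_cols is just an illustrative name. With the default errors='raise', a value that cannot be parsed produces an error instead of silently leaving the column as object.

```python
import pandas

# Stand-in for the concatenated frame: 'x' and 'y' mix strings and floats.
combined = pandas.DataFrame({
    "group": ["A", "B"],
    "x": ["-1", -0.8],
    "y": ["-1", -1.0],
})

# Convert only the coordinate columns and leave 'group' alone.
# to_numeric raises by default if a value cannot be parsed.
numeric_cols = ["x", "y"]
combined[numeric_cols] = combined[numeric_cols].apply(pandas.to_numeric)

print(combined.dtypes)
```

After this step the frame can be passed to seaborn.scatterplot as in the line above.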

Upvotes: 3

Martyna

Reputation: 212

When you create your data like that, every element of the X array is stored as a string, so the resulting columns have object dtype. You can see this by calling data_x.info().

To avoid it you can make sure that x and y in your primary DataFrames are of numerical type while generating data (I assume here you just have an example). This solution is recommended.
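One way to do that, sketched here under the assumption that the coordinates and the labels can be kept apart when the data is generated, is to build the DataFrame from a purely numeric array and attach the group column afterwards. The names coords and groups are hypothetical.

```python
import numpy
import pandas

# Keep the coordinates in a purely numeric array ...
coords = numpy.array([[-1, -1]])
# ... and the labels in a separate sequence, one label per row.
groups = ["A"]

data_x = pandas.DataFrame(coords, columns=("x", "y"))
data_x["group"] = groups

data_p = pandas.DataFrame(numpy.array([[-0.8, -1]]), columns=("x", "y"))
data_p["group"] = "B"

# Both frames now have numeric x and y, so the result stays numeric.
combined = pandas.concat([data_x, data_p], ignore_index=True, sort=True)
print(combined.dtypes)
```

Because NumPy never sees a string next to the numbers, nothing is coerced and no conversion is needed after the concat.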

If that is impossible for any reason, you can convert the columns afterwards. Use float rather than int here, since int would truncate -0.8 to 0, e.g.

combined['x'] = combined['x'].astype('float')
combined['y'] = combined['y'].astype('float')

Upvotes: 1
