Reputation: 600
I would like to replace Pandas with Polars but I was not able to find out how to use Polars with Plotly without converting to Pandas. I wonder if there is a way to completely cut Pandas out of the process.
Consider the following test data:
import polars as pl
import numpy as np
import plotly.express as px
df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)
fig = px.bar(df, x='names', y='random')
fig.show()
I would like this code to show the bar chart in a Jupyter notebook, but instead it fails with this message:
/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/polars/internals/frame.py:1483: UserWarning: accessing series as Attribute of a DataFrame is deprecated
warnings.warn("accessing series as Attribute of a DataFrame is deprecated")
It is possible to convert the Polars data frame to a Pandas data frame with df = df.to_pandas(). Then it works. However, is there a simpler, more elegant solution?
Upvotes: 12
Views: 11991
Reputation: 881
FYI: plotly-express has just merged generic DataFrame support (via narwhals), meaning that Polars will be natively supported, so no more transforms to Pandas under the hood (and, as you might suspect, this comes with a nice plotting performance boost when using a Polars frame).
Upvotes: 3
Reputation: 21
I am currently making the switch from pandas to pola.rs. From my research, your [] indexing will work, but it is considered an anti-pattern in Polars. The author of the article linked below suggests using the .to_series() method instead:
import plotly.express as px

px.pie(df,  # Polars DataFrame
       names=df.select('Model').to_series(),
       values=df.select('Sales').to_series(),
       hover_name=df.select('Model').to_series(),
       color_discrete_sequence=px.colors.sequential.Plasma_r)
https://towardsdatascience.com/visualizing-polars-dataframes-using-plotly-express-8da4357d2ee0
When it comes to visualizing a Polars dataframe, though, it seems you can't completely get rid of the conversion to a Pandas dataframe.
Hope this helps.
Upvotes: 2
Reputation: 9810
Yes, there is no need to convert to a Pandas dataframe. Someone (sa-) has requested better support here and included a workaround for it.
"The workaround that I use right now is px.line(x=df["a"], y=df["b"]), but it gets unwieldy if the name of the data frame is too big"
For the OP's code example, the approach of specifying the dataframe columns explicitly works.
I find that, in addition to specifying the dataframe columns with px.bar(x=df["names"], y=df["random"]) or px.bar(df, x=df["names"], y=df["random"]), casting to a list also works:
import polars as pl
import numpy as np
import plotly.express as px
df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", None],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)
px.bar(df, x=list(df["names"]), y=list(df["random"]))
If you know Polars better, you may see other options once you understand the idea behind the workaround.
The example posted there is simpler: instead of px.line(df, x="a", y="b"), as you could use with a Pandas dataframe, you use px.line(x=df["a"], y=df["b"]). With Polars, that is:
import polars as pl
import plotly.express as px
df = pl.DataFrame({"a":[1,2,3,4,5], "b":[1,4,9,16,25]})
px.line(x=df["a"], y=df["b"])
(Note that using plotly.express requires Pandas to be installed, see here and here. I used plotly.express in my answer because it was closer to the OP's code. The code could be adapted to use plotly.graph_objects if you want to avoid having Pandas installed or involved at all.)
Upvotes: 12