Reputation: 79
A dumb question: how do I manipulate columns in Polars?
Specifically, I have a table with 3 columns: N, Survivors, Deaths.
I want to replace Deaths with Deaths * N and Survivors with Survivors * N.
The following code does not work:
table["SURVIVORS"] = table["SURVIVORS"]*table["N"]
I get this error:
TypeError: 'DataFrame' object does not support 'Series' assignment by index. Use 'DataFrame.with_columns'
Thank you.
Upvotes: 2
Views: 3381
Reputation: 18446
Polars isn't pandas.
You can't assign to part of a DataFrame. Put another way, the left side of the equals sign has to be a full DataFrame, so forget about the table["SURVIVORS"]= syntax.
You'll mainly use the with_columns and select methods. The first adds columns to your existing df based on the expressions you feed it (replacing any existing column that has the same name), whereas select returns only what you ask for.
In your case, since you want to overwrite SURVIVORS and DEATHS while keeping N, you'd do:
table = table.with_columns([
    pl.col('SURVIVORS') * pl.col('N'),
    pl.col('DEATHS') * pl.col('N')
])
If you wanted to rename the columns, then you might think to do this:
table = table.with_columns([
    (pl.col('SURVIVORS') * pl.col('N')).alias('SURVIVORS_N'),
    (pl.col('DEATHS') * pl.col('N')).alias('DEATHS_N')
])
In this case, since with_columns just adds columns, you'll still have the original SURVIVORS and DEATHS columns.
This brings us back to select: if you want explicit control of what is returned, including the order, then use select:
table = table.select([
    'N',
    (pl.col('SURVIVORS') * pl.col('N')).alias('SURVIVORS_N'),
    (pl.col('DEATHS') * pl.col('N')).alias('DEATHS_N')
])
One note: you can refer to a column by just giving its name, like 'N' in the previous example, as long as you don't want to do anything to it. If you want to do something with it (math, rename, anything), then you have to wrap it in pl.col('column_name') so that it becomes an Expression.
Upvotes: 9