Soufiane Fadili

Reputation: 79

Manipulating data in Polars

A basic question: how do I manipulate columns in Polars?

Specifically, I have a table with three columns: N, SURVIVORS, and DEATHS.

I want to replace DEATHS with DEATHS * N and SURVIVORS with SURVIVORS * N.

The following code does not work:

table["SURVIVORS"] = table["SURVIVORS"]*table["N"]

I get this error:

TypeError: 'DataFrame' object does not support 'Series' assignment by index. Use 'DataFrame.with_columns'

Thank you.

Upvotes: 2

Views: 3381

Answers (1)

Dean MacGregor

Reputation: 18446

Polars isn't pandas.

You can't assign to part of a df. To put that another way, the left side of the equals sign has to be a full df, so forget about the table["SURVIVORS"] = ... syntax.

You'll mainly use the with_columns and select methods. with_columns returns your existing df plus the columns produced by the expressions you feed it (a result with the same name as an existing column replaces that column), whereas select returns only what you ask for.

In your case, since you want to overwrite SURVIVORS and DEATHS while keeping N you'd do:

table = table.with_columns([
    pl.col('SURVIVORS') * pl.col('N'),
    pl.col('DEATHS') * pl.col('N'),
])

If you wanted to give the results new names, you might think to do this:

table = table.with_columns([
    (pl.col('SURVIVORS') * pl.col('N')).alias('SURVIVORS_N'),
    (pl.col('DEATHS') * pl.col('N')).alias('DEATHS_N'),
])

In this case, since with_columns only adds columns, you'll still have the original SURVIVORS and DEATHS columns.

This brings us back to select. If you want explicit control over what is returned, including the column order, use select:

table = table.select([
    'N',
    (pl.col('SURVIVORS') * pl.col('N')).alias('SURVIVORS_N'),
    (pl.col('DEATHS') * pl.col('N')).alias('DEATHS_N'),
])

One note: you can refer to a column by just giving its name, like 'N' in the previous example, as long as you don't want to do anything to it. If you want to do something with it (math, rename, anything), you have to wrap it in pl.col('column_name') so that it becomes an expression.

Upvotes: 9
