drhagen

Reputation: 9532

Make a constant column in Polars

In Polars 0.13.14, I could create a DataFrame with an all-constant column like this:

import polars as pl

pl.DataFrame(dict(x=pl.repeat(1, 3)))

# shape: (3, 1)
# ┌─────┐
# │ x   │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1   │
# │ 1   │
# │ 1   │
# └─────┘

But in Polars 0.13.15, the same code raises an error:

ValueError: Series constructor not called properly.

How do I fill a column with a constant value in Polars?

Upvotes: 10

Views: 10912

Answers (4)

drhagen

Reputation: 9532

Starting in Polars 0.13.15, pl.repeat returns a lazy expression by default, and lazy expressions are not evaluated in the DataFrame constructor. You can get the eager behavior back with the eager=True flag:

import polars as pl

pl.DataFrame(dict(x=pl.repeat(1, 3, eager=True)))

Or you can use pl.select():

import polars as pl

pl.select(x=pl.repeat(1, 3))

Upvotes: 5

Moriarty Snarly

Reputation: 676

You might be looking for pl.lit(...), which creates a literal expression that is broadcast to the height of the DataFrame:

import polars as pl
df = pl.DataFrame({"a": [1,2,3], "b": [4, 5, 6]})
print(df)
print(df.with_columns(pl.lit(1).alias("constant_column")))

This will give you the following output:

shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 4   │
│ 2   ┆ 5   │
│ 3   ┆ 6   │
└─────┴─────┘
shape: (3, 3)
┌─────┬─────┬─────────────────┐
│ a   ┆ b   ┆ constant_column │
│ --- ┆ --- ┆ ---             │
│ i64 ┆ i64 ┆ i32             │
╞═════╪═════╪═════════════════╡
│ 1   ┆ 4   ┆ 1               │
│ 2   ┆ 5   ┆ 1               │
│ 3   ┆ 6   ┆ 1               │
└─────┴─────┴─────────────────┘

Upvotes: 16

okayama-taro

Reputation: 65

A Python list can be repeated with the * operator, so a list whose elements all have the same value is easy to build and pass to the DataFrame constructor.

# [1] * 3 evaluates to [1, 1, 1]
# For a running sequence from 1 to 3 instead, use
# list(range(1, 4)), which evaluates to [1, 2, 3]

import polars as pl

df = pl.DataFrame({'X': [1] * 3})

Result:

shape: (3, 1)
┌─────┐
│ X   │
│ --- │
│ i64 │
╞═════╡
│ 1   │
│ 1   │
│ 1   │
└─────┘

Upvotes: 2

altunyurt

Reputation: 2936

The signature of pl.repeat (Polars v0.20.0) is as follows:

pl.repeat(
    value: 'IntoExpr | None',
    n: 'int | Expr',
    *,
    dtype: 'PolarsDataType | None' = None,
    eager: 'bool' = False,
) -> 'Expr | Series'

By default, it returns a lazy expression. To have it evaluate eagerly and return a Series, call it with eager=True:

pl.repeat(1, 2, eager=True)

as @drhagen already mentioned.

Expressions can also be evaluated with pl.select and the result converted to a Series:

In[265]: pl.select(pl.repeat(1, 2)).to_series()
Out[265]:
shape: (2,)
Series: 'repeat' [i32]
[
        1
        1
]

pl.select runs the expressions in parallel, so you can simplify and speed up the process by building several columns at once:

pl.DataFrame(pl.select(a=pl.repeat(1, 100), b=pl.repeat(1, 100)).to_dict())

which might be handy when you have a lot of expressions to evaluate. The timings below demonstrate it:

In [277]: %timeit pl.select(a=pl.repeat(1, 100, eager=True))
150 µs ± 6.2 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [278]: %timeit pl.select(a=pl.repeat(1, 100, eager=True), b=pl.repeat(1, 100, eager=True))
331 µs ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

In [279]: %timeit pl.select(a=pl.repeat(1, 100), b=pl.repeat(1, 100))
128 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [280]: %timeit pl.select(a=pl.repeat(1, 100))
104 µs ± 8.09 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [281]: %timeit pl.repeat(1, 100, eager=True)
99 µs ± 5.18 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)

In [282]: %timeit pl.repeat(1, 100, eager=True); pl.repeat(1, 100, eager=True)
208 µs ± 14.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

Upvotes: 1
