Reputation: 9532
In Polars 0.13.14, I could create a DataFrame with an all-constant column like this:
import polars as pl
pl.DataFrame(dict(x=pl.repeat(1, 3)))
# shape: (3, 1)
# ┌─────┐
# │ x │
# │ --- │
# │ i64 │
# ╞═════╡
# │ 1 │
# │ 1 │
# │ 1 │
# └─────┘
But in Polars 0.13.15, this raises an error:
ValueError: Series constructor not called properly.
How do I fill a column with a value in polars?
Upvotes: 10
Views: 10912
Reputation: 9532
Starting in Polars 0.13.15, repeat became a lazy function by default, and lazy functions are not evaluated in the DataFrame constructor. You can get the eager behavior back with the eager=True flag:
import polars as pl
pl.DataFrame(dict(x=pl.repeat(1, 3, eager=True)))
Or you can use pl.select():
import polars as pl
pl.select(x=pl.repeat(1, 3))
Upvotes: 5
Reputation: 676
You might be looking for pl.lit(..):
import polars as pl
df = pl.DataFrame({"a": [1,2,3], "b": [4, 5, 6]})
print(df)
print(df.with_columns(pl.lit(1).alias("constant_column")))
This will give you the following output:
shape: (3, 2)
┌─────┬─────┐
│ a ┆ b │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1 ┆ 4 │
│ 2 ┆ 5 │
│ 3 ┆ 6 │
└─────┴─────┘
shape: (3, 3)
┌─────┬─────┬─────────────────┐
│ a ┆ b ┆ constant_column │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i32 │
╞═════╪═════╪═════════════════╡
│ 1 ┆ 4 ┆ 1 │
│ 2 ┆ 5 ┆ 1 │
│ 3 ┆ 6 ┆ 1 │
└─────┴─────┴─────────────────┘
Upvotes: 16
Reputation: 65
Python lists can add elements dynamically. If you want to create a list where all elements have the same value, you can use the * operator.
# [1] * 3 evaluates to [1, 1, 1]
# For a sequence from 1 to 3 instead, use list(range(1, 4)),
# which evaluates to [1, 2, 3]
import polars as pl
df = pl.DataFrame({'X': [1] * 3})
Result:
shape: (3, 1)
┌─────┐
│ X │
│ --- │
│ i64 │
╞═════╡
│ 1 │
│ 1 │
│ 1 │
└─────┘
Upvotes: 2
Reputation: 2936
The signature of pl.repeat (polars v0.20.0) is as follows:
pl.repeat(
value: 'IntoExpr | None',
n: 'int | Expr',
*,
dtype: 'PolarsDataType | None' = None,
eager: 'bool' = False,
) -> 'Expr | Series'
By default, it returns a lazy expression. To have it eagerly evaluate and return a series, you'll need to use it as
pl.repeat(1, 2, eager=True)
as @drhagen already mentioned.
Expressions can also be run using pl.select, and converted to series as:
In [265]: pl.select(pl.repeat(1, 2)).to_series()
Out[265]:
shape: (2,)
Series: 'repeat' [i32]
[
1
1
]
pl.select runs the expressions in parallel, so you can simplify and speed up the process as
pl.select(a=pl.repeat(1, 100), b=pl.repeat(1, 100))
which might be handy when you have a lot of expression evaluation going on. To demonstrate it:
In [277]: %timeit pl.select(a=pl.repeat(1, 100, eager=True))
150 µs ± 6.2 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [278]: %timeit pl.select(a=pl.repeat(1, 100, eager=True), b=pl.repeat(1, 100, eager=True))
331 µs ± 14.1 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
In [279]: %timeit pl.select(a=pl.repeat(1, 100), b=pl.repeat(1, 100))
128 µs ± 4.18 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [280]: %timeit pl.select(a=pl.repeat(1, 100))
104 µs ± 8.09 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [281]: %timeit pl.repeat(1, 100, eager=True)
99 µs ± 5.18 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [282]: %timeit pl.repeat(1, 100, eager=True); pl.repeat(1, 100, eager=True)
208 µs ± 14.4 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
Upvotes: 1