Dean MacGregor
Dean MacGregor

Reputation: 18446

Can we/how to conditionally select columns?

Let's say I have a list of dataframes list this:

Ldfs=[
     pl.DataFrame({'a':[1.0,2.0,3.1], 'b':[2,3,4]}), 
     pl.DataFrame({'b':[1,2,3], 'c':[2,3,4]}), 
     pl.DataFrame({'a':[1,2,3], 'c':[2,3,4]})
    ]

I can't do pl.concat(Ldfs) because they don't all have the same columns and even the ones that have a in common don't have the same data type.

What I'd like to do is concat them together but just add a column of Nones whenever a column isn't there and to cast columns to a fixed datatype.

For instance, just taking the first element of the list I'd like to have something like this work:

Ldfs[0].select(pl.when(pl.col('c')).then(pl.col('c')).otherwise(None).cast(pl.Float64).alias('c'))

of course, this results in ColumnNotFoundError: c

Upvotes: 2

Views: 687

Answers (1)

user20557510
user20557510

Reputation:

Would an approach like this work for you. (I'll convert your DataFrames to LazyFrames for added fun.)

Ldfs = [
    pl.DataFrame({"a": [1.0, 2.0, 3.1], "b": [2, 3, 4]}).lazy(),
    pl.DataFrame({"b": [1, 2, 3], "c": [2, 3, 4]}).lazy(),
    pl.DataFrame({"a": [1, 2, 3], "c": [2, 3, 4]}).lazy(),
]

my_schema = {
    "a": pl.Float64,
    "b": pl.Int64,
    "c": pl.UInt32,
}


def fix_schema(ldf: pl.LazyFrame) -> pl.LazyFrame:
    ldf = (
        ldf.with_columns(
            pl.col(col_nm).cast(col_type)
            for col_nm, col_type in my_schema.items()
            if col_nm in ldf.columns
        )
        .with_columns(
            pl.lit(None, dtype=col_type).alias(col_nm)
            for col_nm, col_type in my_schema.items()
            if col_nm not in ldf.columns
        )
        .select(my_schema.keys())
    )
    return ldf


pl.concat([fix_schema(next_frame)
          for next_frame in Ldfs], how="vertical").collect()
shape: (9, 3)
┌──────┬──────┬──────┐
│ a    ┆ b    ┆ c    │
│ ---  ┆ ---  ┆ ---  │
│ f64  ┆ i64  ┆ u32  │
╞══════╪══════╪══════╡
│ 1.0  ┆ 2    ┆ null │
│ 2.0  ┆ 3    ┆ null │
│ 3.1  ┆ 4    ┆ null │
│ null ┆ 1    ┆ 2    │
│ null ┆ 2    ┆ 3    │
│ null ┆ 3    ┆ 4    │
│ 1.0  ┆ null ┆ 2    │
│ 2.0  ┆ null ┆ 3    │
│ 3.0  ┆ null ┆ 4    │
└──────┴──────┴──────┘

Upvotes: 1

Related Questions