DataPsycho

Reputation: 988

Cannot read CSV into a Polars DataFrame in Rust with LazyCsvReader

I am trying the Rust version of Polars for the first time. I set up a project and added polars to the Cargo.toml file, which looks as follows:

[package]
name = "polar_test"
version = "0.1.0"
edition = "2021"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
polars = "0.22.6"

Then I wrote the following code in my main.rs file. The code is taken directly from the Polars website, but the compiler complains about LazyCsvReader and other items such as col and sort. It looks like importing polars::prelude::* has no effect. Here is the code in main.rs:

use polars::prelude::*;

fn example() -> Result<DataFrame> {
    LazyCsvReader::new("wine.data".into()).collect()
}

fn main() {
    println!("Hello, world!");
}

Here is the error log:

datapsycho@dataops:~/.../PolarTest$ cargo build
   Compiling polar_test v0.1.0 (/home/datapsycho/IdeaProjects/PolarTest)
error[E0433]: failed to resolve: use of undeclared type `LazyCsvReader`
 --> src/main.rs:4:5
  |
4 |     LazyCsvReader::new("foo.csv".into()).collect()
  |     ^^^^^^^^^^^^^ use of undeclared type `LazyCsvReader`

For more information about this error, try `rustc --explain E0433`.
error: could not compile `polar_test` due to previous error

My understanding is that using prelude::* does not bring items like col, groupby, and LazyCsvReader into scope. Can someone give me an example of how to read a CSV file with Polars and perform some operations on it? Here is the corresponding Python version of the code, written with pandas:

from pathlib import Path
import pandas as pd


def read_data(path: Path) -> pd.DataFrame:
    columns = [
        "Class label", "Alcohol",
        "Malic acid", "Ash",
        "Alcalinity of ash", "Magnesium",
        "Total phenols", "Flavanoids",
        "Nonflavanoid phenols",
        "Proanthocyanins",
        "Color intensity", "Hue",
        "OD280/OD315 of diluted wines",
        "Proline"
    ]
    _df = pd.read_csv(path, names=columns)
    return _df


def count_classes(df: pd.DataFrame) -> pd.DataFrame:
    _df = df.groupby("Class label").agg(total=("Class label", "count")).reset_index()
    _df.to_csv(Path("datastore").joinpath("data_count.csv"), index=False)
    return _df


def main():
    file_path = Path("datastore").joinpath("wine.data")
    main_df = read_data(file_path)
    class_stat_df = count_classes(main_df)
    print(class_stat_df)


if __name__ == "__main__":
    main()

The data can be downloaded with the following command:

wget https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data

Can someone help me write the same transformation pipeline in Rust with Polars? This is the example given on the Polars front page, which might need some modification:

use polars::prelude::*;

fn example() -> Result<DataFrame> {
    LazyCsvReader::new("foo.csv".into())
        .finish()
        .filter(col("bar").gt(lit(100)))
        .groupby(vec![col("ham")])
        .agg(vec![col("spam").sum(), col("ham").sort(false).first()])
        .collect()
}

Upvotes: 1

Views: 2379

Answers (2)

DataPsycho

Reputation: 988

I finally found the solution after reading the suggested docs. As a beginner Rust user, I could not understand the first suggested solution right away, but after some study things are much clearer to me. My understanding is that, to reduce compilation overhead, certain features are disabled by default and the user needs to enable them explicitly by updating the Cargo.toml file. For example, to enable the describe and lazy features, the following dependency line can be used:

polars = {version = "0.22.8", features = ["describe", "lazy"]}
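To illustrate why a glob import of the prelude can compile and still leave LazyCsvReader undeclared, here is a minimal sketch of how a feature-gated export behaves. It is a miniature stand-in, not Polars source code: the module layout and the exported_names helper are hypothetical, and only the feature name "lazy" is taken from the dependency line above.

```rust
// Miniature stand-in for a library with an opt-in feature.
// All items here are hypothetical; this is not Polars source code.
mod prelude {
    /// Always exported, regardless of enabled features.
    pub struct DataFrame;

    /// Compiled (and therefore exported) only when the crate is built
    /// with the `lazy` feature enabled.
    #[cfg(feature = "lazy")]
    #[allow(dead_code)]
    pub struct LazyCsvReader;

    /// Hypothetical helper: lists the names this build actually exports.
    pub fn exported_names() -> Vec<&'static str> {
        let mut names = vec!["DataFrame"];
        if cfg!(feature = "lazy") {
            names.push("LazyCsvReader");
        }
        names
    }
}

// The glob import succeeds either way; it simply brings in fewer items
// when the feature is off.
use prelude::*;

fn main() {
    let _df = DataFrame;
    // Built without the feature, this prints ["DataFrame"].
    println!("{:?}", exported_names());
}
```

Built without the feature, any reference to LazyCsvReader in this sketch would fail with the same E0433 "use of undeclared type" error shown in the question, because the gated item is never compiled.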

Upvotes: 3

ritchie46

Reputation: 14630

You need to activate the lazy feature. See the docs for all features:

https://docs.rs/polars/0.22.8/polars/#compile-times-and-opt-in-features

Upvotes: 5
