daviewales

Reputation: 2719

Polars: Specify dtypes for all columns at once in read_csv

In Polars, how can one specify a single dtype for all columns in read_csv?

According to the docs, the schema_overrides argument to read_csv can take either a mapping (dict) in the form of {'column_name': dtype}, or a list of dtypes, one for each column. However, it is not clear how to specify "I want all columns to be a single dtype".

If you wanted all columns to be String, for example, and you knew the total number of columns, you could do:

pl.read_csv('sample.csv', schema_overrides=[pl.String]*number_of_columns)

However, this doesn't work if you don't know the total number of columns. In Pandas, you could do something like:

pd.read_csv('sample.csv', dtype=str)

But this doesn't work in Polars.

Upvotes: 15

Views: 26049

Answers (2)

Cornelius Roemer

Reputation: 8286

If you want to read all columns as str (pl.String in Polars), set infer_schema=False. Polars then skips type inference and uses String as the default type for every column when reading CSVs.

pl.read_csv('sample.csv', infer_schema=False)

This is the TL;DR of ritchie46's more detailed answer. I broke it out into a separate answer because his code snippet solves the general case for any data type rather than the special but common case of reading everything as strings.

Upvotes: 7

ritchie46

Reputation: 14730

Reading all data in a CSV as any type other than pl.String is likely to fail or to leave you with a lot of null values, because some values will not parse as that type. We can use expressions to declare how we want to deal with those values.

If you read a CSV with infer_schema_length=0, Polars does not infer the schema and reads all columns as pl.String, since that is a supertype of all Polars types.

Once the columns are read as String, we can use expressions to cast all of them. With strict=False, values that cannot be converted become null instead of raising an error.

(pl.read_csv("test.csv", infer_schema_length=0)
   .with_columns(pl.all().cast(pl.Int32, strict=False)))

Update: infer_schema=False was added in 1.2.0 as a more user-friendly name for this feature.

pl.read_csv("test.csv", infer_schema=False) # read all as pl.String

Upvotes: 21
