8mrsteel

Reputation: 57

How can I use "\s+" as a separator in polars?

So I have a large data file that can be up to 11 columns wide. It looks something like this:

1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1

When I read it in with pandas, I used: pd.read_csv(file_dir, skiprows=1, sep='\s+'). Pandas creates a 3x8 dataframe (which is correct). With Polars I used: pl.read_csv(file_dir, skip_rows=1, orient='col'). Polars creates a 3x1 dataframe. I think it's due to the separator, but I'm not sure. I tried using "\s+" in Polars, but it rejects it because it's longer than one byte. The delimiter in these files is 7 whitespace characters.

Upvotes: 3

Views: 2657

Answers (1)

user18559875

Reputation:

From the documentation for read_csv and scan_csv, the delimiter must be a single-byte character. As such, you'll need to preprocess your file to convert the 7-whitespace delimiter to a single-byte delimiter before reading it with Polars.
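
As a rough sketch of that preprocessing step (not the only way to do it), the snippet below collapses each run of spaces or tabs into a single comma and hands the result to Polars. The helper names normalize_whitespace and read_whitespace_delimited are made up for illustration. It also assumes a recent Polars release where the keyword is separator (older releases called it sep), and it reads the whole file into memory, which may not suit very large files.

```python
import io
import re

# One or more spaces or tabs; newlines are handled separately by splitlines().
_WHITESPACE_RUN = re.compile(r"[ \t]+")


def normalize_whitespace(text: str, delimiter: str = ",") -> str:
    """Replace every run of spaces/tabs inside each line with `delimiter`.

    Leading and trailing whitespace on a line is dropped first so it does
    not turn into empty columns.
    """
    lines = (_WHITESPACE_RUN.sub(delimiter, line.strip()) for line in text.splitlines())
    return "\n".join(lines)


def read_whitespace_delimited(path, skip_rows: int = 0, delimiter: str = ","):
    """Hypothetical helper: preprocess the file, then let Polars parse it."""
    import polars as pl  # imported here so the pure-Python helper works without it

    with open(path, encoding="utf-8") as fh:
        cleaned = normalize_whitespace(fh.read(), delimiter)

    # Assumption: `separator` is the keyword in recent Polars versions.
    return pl.read_csv(
        io.BytesIO(cleaned.encode("utf-8")),
        separator=delimiter,
        skip_rows=skip_rows,
    )
```

With the file from the question, something like read_whitespace_delimited(file_dir, skip_rows=1) should then split the rows into separate columns. If the data itself can contain commas, pass a different single-byte delimiter.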

The single-byte limitation exists for performance reasons. For some further background, you can look at this discussion on GitHub. The discussion includes some suggestions for picking single-byte delimiters that are not likely to be found in your input file.
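
One simple way to check whether a candidate delimiter is safe is to confirm that it never appears in the data. The sketch below does that; the function name pick_unused_delimiter and the candidate list are illustrative assumptions, not taken from the GitHub discussion.

```python
def pick_unused_delimiter(text: str, candidates=("\x1f", "|", ";", ",")) -> str:
    """Return the first single-byte candidate that does not occur in `text`.

    The default candidates are only examples: "\x1f" is the ASCII unit
    separator, which rarely shows up in ordinary text files.
    """
    for candidate in candidates:
        # Polars requires the delimiter to fit in one byte.
        if len(candidate.encode("utf-8")) != 1:
            raise ValueError(f"{candidate!r} is not a single-byte character")
        if candidate not in text:
            return candidate
    raise ValueError("every candidate delimiter already appears in the data")
```

The returned character can then be used both as the replacement for the whitespace runs during preprocessing and as the delimiter passed to Polars.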

Upvotes: 3
