Reputation: 57
So I have a large data file that can be up to 11 columns wide. It looks something like this:
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
When I read it in using pandas, I used this code:
pd.read_csv(file_dir, skiprows = 1, sep = '\s+')
When pandas reads it, it creates a dataframe that's 3x8 (which is correct).
Using Polars:
pl.read_csv(file_dir, skip_rows=1, orient='col')
When Polars reads it, it creates a dataframe that's 3x1.
I think it's due to the separator, but I'm not sure. I tried using "\s+" in Polars, but it doesn't accept it because it's longer than one byte. The delimiter in these files is 7 spaces.
Upvotes: 3
Views: 2657
Reputation:
According to the documentation for read_csv and scan_csv, the delimiter must be a single-byte character. As such, you'll need to preprocess your file to convert the 7-whitespace delimiter to a single-byte delimiter before reading it with Polars.
The single-byte limitation exists for performance reasons. For some further background, you can look at this discussion on GitHub. The discussion includes some suggestions for picking single-byte delimiters that are not likely to be found in your input file.
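For illustration, here is a minimal sketch of that preprocessing step. It assumes the file fits in memory and that the data never contains the chosen delimiter (a comma here). The helper names collapse_whitespace and read_whitespace_delimited are hypothetical, not part of Polars. Recent Polars releases name the keyword separator; older ones called it sep, so adjust for your version.

```python
import io
import re


def collapse_whitespace(text: str, delimiter: str = ",") -> str:
    """Replace each run of spaces/tabs inside a line with a single delimiter.

    Leading and trailing whitespace on each line is dropped. Blank lines are
    kept so that line-based options such as skip_rows still count correctly.
    """
    cleaned_lines = [
        re.sub(r"[ \t]+", delimiter, line.strip())
        for line in text.splitlines()
    ]
    return "\n".join(cleaned_lines) + "\n"


def read_whitespace_delimited(path: str, skip_rows: int = 0, delimiter: str = ","):
    """Hypothetical helper: clean the file in memory, then parse it with Polars."""
    import polars as pl  # imported lazily so the cleaning step works without Polars

    with open(path, encoding="utf-8") as fh:
        cleaned = collapse_whitespace(fh.read(), delimiter)

    # read_csv accepts a file-like object, so no temporary file is needed.
    return pl.read_csv(
        io.BytesIO(cleaned.encode("utf-8")),
        separator=delimiter,  # `sep` in older Polars versions
        skip_rows=skip_rows,
    )
```

With the file from the question, read_whitespace_delimited(file_dir, skip_rows=1) would mirror the pandas call. If any skipped or header line contains spaces inside a single field, it would be split as well, so check the first few lines of the cleaned text before relying on it.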
Upvotes: 3