I am trying to read a large compressed CSV file from AWS S3 into a pandas DataFrame in SageMaker. Is there a direct, clean way to do this?
You can do this easily with the AWS Data Wrangler library (awswrangler).
It supports GZIP-compressed files and reads the CSV from S3 directly into a pandas DataFrame.
(Install it with: pip install awswrangler)
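import awswrangler as wr

# Extra keyword arguments are passed through to pandas.read_csv. Compression is
# normally inferred from the file extension (".gz"), and ".gzip" is not one pandas
# recognizes, so compression="gzip" is set explicitly here.
df = wr.s3.read_csv(path="s3://bucket/path/to/my.csv.gzip", compression="gzip")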
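If you would rather not add awswrangler as a dependency, here is a minimal sketch that uses boto3 and pandas directly. It assumes that the whole compressed object fits in memory and that credentials come from the environment, such as the SageMaker execution role. The function names `gzipped_csv_to_dataframe` and `read_gzipped_csv_from_s3` are hypothetical helpers, and the bucket and key are placeholders. The parsing step is kept separate so you can try it locally without S3.

```python
import gzip
import io
from typing import Optional

import pandas as pd


def gzipped_csv_to_dataframe(raw: bytes, chunksize: Optional[int] = None) -> pd.DataFrame:
    """Parse gzip-compressed CSV bytes into a DataFrame (hypothetical helper)."""
    buffer = io.BytesIO(raw)
    if chunksize is None:
        # pandas decompresses the gzip stream itself while parsing
        return pd.read_csv(buffer, compression="gzip")
    # With chunksize, read_csv yields DataFrames of up to `chunksize` rows each;
    # concatenate them back into one frame with a fresh 0..n-1 index.
    chunks = pd.read_csv(buffer, compression="gzip", chunksize=chunksize)
    return pd.concat(chunks, ignore_index=True)


def read_gzipped_csv_from_s3(bucket: str, key: str) -> pd.DataFrame:
    """Download an S3 object and parse it as gzip-compressed CSV (hypothetical helper)."""
    # Imported lazily so the local demo below runs without boto3 installed;
    # requires AWS credentials (e.g. the SageMaker execution role).
    import boto3

    response = boto3.client("s3").get_object(Bucket=bucket, Key=key)
    return gzipped_csv_to_dataframe(response["Body"].read())


if __name__ == "__main__":
    # Local demo with in-memory data; no S3 access needed.
    sample = gzip.compress(b"a,b\n1,2\n3,4\n")
    print(gzipped_csv_to_dataframe(sample, chunksize=1))
```

On S3 you would call something like `read_gzipped_csv_from_s3("bucket", "path/to/my.csv.gzip")`. Passing `compression="gzip"` explicitly avoids relying on extension-based inference.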