tuxian

Reputation: 178

Read ranges of lines from an S3 file

I have a CSV file in an AWS S3 bucket. I want to read only ranges of lines from the file instead of the whole file.

My goal is to show the file's content in a UI application as a paginated table.

I read from this link that a Range option is available in s3.getObject.

But I do not need byte ranges; I need line ranges, e.g. lines 1-100 (similar to pagination).

Is there a way to do this using Node.js?

Upvotes: 2

Views: 4467

Answers (4)

Krittika Pandey

Reputation: 11

S3 select_object_content does not support the row_number() function. If your CSV has a unique/range column, use that column in the select query to fetch the rows for a particular range. But be sure to cast the range column to an integer in your query, otherwise it will be compared as a string.

import boto3

S3_BUCKET = 'bucket-name'

s3 = boto3.client('s3')

# Filter rows 1-100 server-side; cast the range column so the
# comparison is numeric rather than lexicographic.
r = s3.select_object_content(
    Bucket=S3_BUCKET,
    Key='filename.csv',
    ExpressionType='SQL',
    Expression="select * from s3object s where CAST(row_index as INT) >= 1 and CAST(row_index as INT) <= 100",
    InputSerialization={'CSV': {"FileHeaderInfo": "Use"}},
    OutputSerialization={'CSV': {}},
)

# The response payload is an event stream; Records events carry the CSV bytes.
for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
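
Since the question asks for Node.js, here is a minimal sketch of the same call using s3.selectObjectContent from the AWS SDK for JavaScript (v2); the bucket, key, and row_index column are the same placeholders as above:

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

s3.selectObjectContent({
    Bucket: 'bucket-name',
    Key: 'filename.csv',
    ExpressionType: 'SQL',
    Expression: 'select * from s3object s where CAST(row_index as INT) >= 1 and CAST(row_index as INT) <= 100',
    InputSerialization: { CSV: { FileHeaderInfo: 'USE' } },
    OutputSerialization: { CSV: {} },
}, (err, data) => {
    if (err) throw err;
    // data.Payload is an event stream; Records events carry the result bytes.
    data.Payload.on('data', (event) => {
        if (event.Records) {
            process.stdout.write(event.Records.Payload.toString('utf-8'));
        }
    });
});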

Upvotes: 1

swarnim gupta

Reputation: 231

No, S3 does not currently support this.

You will have to fetch the whole file and then filter it using the fast-csv module.

You can use the skipRows and maxRows options of this module to form a range (see the sketch below the link).

https://c2fo.io/fast-csv/docs/parsing/options
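
A minimal sketch of that approach with the AWS SDK for JavaScript (v2), streaming the object into fast-csv; the bucket/key names and page numbers are placeholders:

const AWS = require('aws-sdk');
const csv = require('fast-csv');

const s3 = new AWS.S3();
const page = 1;        // 1-based page number
const pageSize = 100;  // rows per page

s3.getObject({ Bucket: 'bucket-name', Key: 'filename.csv' })
    .createReadStream()
    .pipe(csv.parse({
        headers: true,
        skipRows: (page - 1) * pageSize,  // skip rows before the requested page
        maxRows: pageSize,                // stop after one page of rows
    }))
    .on('data', (row) => console.log(row))
    .on('end', (rowCount) => console.log(`parsed ${rowCount} rows`));

Note that the whole object is still streamed from S3; skipRows and maxRows only limit which rows are parsed and emitted.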

Upvotes: 0

swarnim gupta

Reputation: 231

You can convert your line ranges into their corresponding byte ranges. There is an npm module that converts lines to bytes.
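
This only works if line numbers can be mapped to byte offsets, for example with fixed-width records. A minimal sketch under that assumption (the record length here is hypothetical):

const AWS = require('aws-sdk');
const s3 = new AWS.S3();

const LINE_BYTES = 120;  // assumed fixed record length, including the newline
const firstLine = 0;     // zero-based index of the first line wanted
const lineCount = 100;   // number of lines per page

const start = firstLine * LINE_BYTES;
const end = (firstLine + lineCount) * LINE_BYTES - 1;  // Range is inclusive

s3.getObject({
    Bucket: 'bucket-name',
    Key: 'filename.csv',
    Range: `bytes=${start}-${end}`,
}, (err, data) => {
    if (err) throw err;
    console.log(data.Body.toString('utf-8'));
});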

Upvotes: -1

deepanmurugan

Reputation: 2113

You can either use S3 Select (adding a new row_index column so you can limit the number of rows) or Athena to query the CSV files.

S3 Select program to print 1,000 rows from a CSV file:

import boto3
S3_BUCKET = 'bucket-name'

s3 = boto3.client('s3')

# Filter the first 1,000 rows server-side; cast row_index so the
# comparison is numeric rather than lexicographic.
r = s3.select_object_content(
    Bucket=S3_BUCKET,
    Key='filename.csv',
    ExpressionType='SQL',
    Expression='select "column" from s3object s where CAST(row_index as INT) >= 1 and CAST(row_index as INT) <= 1000',
    InputSerialization={'CSV': {"FileHeaderInfo": "Use"}},
    OutputSerialization={'CSV': {}},
)

# The response payload is an event stream; Records events carry the CSV bytes.
for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)

Using Athena

You can simply load the CSV file into Athena; it's pretty straightforward. Then you can query a limited set of rows with a SELECT query.

SELECT * FROM table limit 1000;

Upvotes: 3
