Reputation: 178
I have a CSV file in an AWS S3 bucket. I want to read only a range of lines from the file instead of the whole file.
My goal is to display the file's contents as a paginated table in the UI application.
I read that a Range
option is available in s3.getObject.
But I do not need byte ranges; I need line ranges, e.g. lines 1-100, similar to pagination.
Is there a way to do this using Node.js?
Upvotes: 2
Views: 4467
Reputation: 11
S3's select_object_content does not support the row_number() function. If your CSV has a unique/range column, use that column in the select query to fetch the rows for a particular range. Don't forget to cast the range column to an integer in your query; otherwise it will be compared as a string.
import boto3

S3_BUCKET = 'bucket-name'
s3 = boto3.client('s3')

r = s3.select_object_content(
    Bucket=S3_BUCKET,
    Key='filename.csv',
    ExpressionType='SQL',
    # Cast the range column so the comparison is numeric, not lexicographic.
    Expression="select * from s3object s where CAST(row_index as INT) >= 1 and CAST(row_index as INT) <= 100",
    InputSerialization={'CSV': {"FileHeaderInfo": "Use"}},
    OutputSerialization={'CSV': {}},
)

# The response payload is an event stream; 'Records' events carry the row data.
for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
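Since the question asks for Node.js: below is a minimal sketch of the same query using the AWS SDK for JavaScript v3 (@aws-sdk/client-s3). The bucket name, key, and row_index column are placeholders carried over from the Python example above, not values from the question.

import { S3Client, SelectObjectContentCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

async function selectRows(): Promise<void> {
  // Same query as the Python example; bucket and key are placeholders.
  const response = await s3.send(new SelectObjectContentCommand({
    Bucket: "bucket-name",
    Key: "filename.csv",
    ExpressionType: "SQL",
    Expression: "select * from s3object s where CAST(row_index as INT) >= 1 and CAST(row_index as INT) <= 100",
    InputSerialization: { CSV: { FileHeaderInfo: "USE" } },
    OutputSerialization: { CSV: {} },
  }));

  // Payload is an async iterable of events; Records events carry the data.
  for await (const event of response.Payload ?? []) {
    if (event.Records?.Payload) {
      process.stdout.write(Buffer.from(event.Records.Payload).toString("utf-8"));
    }
  }
}

selectRows().catch(console.error);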
Upvotes: 1
Reputation: 231
No, S3 does not currently support this.
You will have to fetch the whole file and then filter it yourself, for example with the fast-csv module.
You can use its skipRows and maxRows parsing options to form a range; a sketch follows below.
https://c2fo.io/fast-csv/docs/parsing/options
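A minimal sketch of that approach, assuming the AWS SDK for JavaScript v3 and fast-csv's skipRows/maxRows options; the bucket, key, and page bounds are placeholders. Note the entire object is still downloaded; only the parsing is limited.

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { parse } from "fast-csv";
import { Readable } from "stream";

const s3 = new S3Client({});

async function readPage(skip: number, take: number): Promise<void> {
  // Placeholder bucket/key; the whole object is streamed, rows filtered client-side.
  const { Body } = await s3.send(new GetObjectCommand({
    Bucket: "bucket-name",
    Key: "filename.csv",
  }));

  (Body as Readable)
    .pipe(parse({ headers: true, skipRows: skip, maxRows: take }))
    .on("data", (row) => console.log(row))
    .on("error", (err) => console.error(err))
    .on("end", (rowCount: number) => console.log(`parsed ${rowCount} rows`));
}

readPage(0, 100);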
Upvotes: 0
Reputation: 231
You can convert your line ranges into their corresponding byte ranges. There is an npm module that converts to bytes; a sketch of the byte-range fetch follows below.
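For illustration, a sketch of fetching a byte range with GetObject in Node.js (AWS SDK v3). How you map line numbers to byte offsets (e.g. a precomputed line index) is up to you; the bucket, key, and range below are placeholders.

import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

async function readByteRange(): Promise<void> {
  // Placeholder range; derive these offsets from a precomputed line index.
  const { Body } = await s3.send(new GetObjectCommand({
    Bucket: "bucket-name",
    Key: "filename.csv",
    Range: "bytes=0-65535",
  }));
  const text = await Body?.transformToString("utf-8");
  // Split into lines; the last line may be cut mid-record at a range boundary.
  console.log(text?.split("\n"));
}

readByteRange().catch(console.error);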
Upvotes: -1
Reputation: 2113
You can either use S3 Select (add a new column such as row_index so you can limit the rows returned) or use Athena to query the CSV files.
An S3 Select program to print the first 1000 rows of a CSV file:
import boto3

S3_BUCKET = 'bucket-name'
s3 = boto3.client('s3')

r = s3.select_object_content(
    Bucket=S3_BUCKET,
    Key='filename.csv',
    ExpressionType='SQL',
    # Cast row_index so the range comparison is numeric, not string.
    Expression="select \"column\" from s3object s where CAST(row_index as INT) >= 1 and CAST(row_index as INT) <= 1000",
    InputSerialization={'CSV': {"FileHeaderInfo": "Use"}},
    OutputSerialization={'CSV': {}},
)

for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
Using Athena
You can simply upload a CSV file to Athena; it's pretty straightforward. Then you can query a limited set of data with a select query.
SELECT * FROM table limit 1000;
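To run that query from Node.js, here is a sketch using @aws-sdk/client-athena; the database, table, and output location are placeholders. Note that GetQueryResults is itself paginated via MaxResults/NextToken, which maps naturally onto the UI pagination goal.

import {
  AthenaClient,
  StartQueryExecutionCommand,
  GetQueryExecutionCommand,
  GetQueryResultsCommand,
} from "@aws-sdk/client-athena";

const athena = new AthenaClient({});

async function queryCsv(): Promise<void> {
  // Placeholder database/table/output location.
  const { QueryExecutionId } = await athena.send(new StartQueryExecutionCommand({
    QueryString: "SELECT * FROM table_name LIMIT 1000",
    QueryExecutionContext: { Database: "database_name" },
    ResultConfiguration: { OutputLocation: "s3://query-results-bucket/" },
  }));

  // Poll until the query finishes.
  for (;;) {
    const { QueryExecution } = await athena.send(
      new GetQueryExecutionCommand({ QueryExecutionId })
    );
    const state = QueryExecution?.Status?.State;
    if (state === "SUCCEEDED") break;
    if (state === "FAILED" || state === "CANCELLED") throw new Error(`query ${state}`);
    await new Promise((r) => setTimeout(r, 1000));
  }

  // Fetch one page of results; pass NextToken from the response for later pages.
  const page = await athena.send(new GetQueryResultsCommand({
    QueryExecutionId,
    MaxResults: 100,
  }));
  console.log(page.ResultSet?.Rows);
}

queryCsv().catch(console.error);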
Upvotes: 3