Reputation: 2981
I am reading a csv file from S3 using boto3 and want to access specific columns of that csv. I have this code where I read the csv file into an S3 object using boto3, but I am having trouble accessing specific columns out of it:
import boto3
s3 = boto3.resource('s3',aws_access_key_id = keyId, aws_secret_access_key = sKeyId)
obj = s3.Object(bucketName, srcFileName)
filedata = obj.get()["Body"].read()
print(filedata.decode('utf8'))
for row in filedata.decode('utf8'):
    print(row[1]) # Get the column at index 1
When I execute the code above, print(filedata.decode('utf8')) prints the following on my output console:
51350612,Gary Scott
10100063,Justin Smith
10100162,Annie Smith
10100175,Lisa Shaw
10100461,Ricardo Taylor
10100874,Ricky Boyd
10103593,Hyman Cordero
But the line print(row[1]) inside the for loop throws an error: IndexError: string index out of range.
How can I remove this error and access specific columns out of a csv file from S3 using boto3?
Upvotes: 1
Views: 2218
Reputation: 13166
obj.get()["Body"].read() retrieves the whole file as a single bytes object. Your code filedata.decode('utf8') only converts that bytes object into a string; no parsing happens here. Iterating over a string yields one character at a time, so row is a single character and row[1] raises the IndexError. Here is a shameless copy from another answer.
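import csv
# ...... code snipped .... insert your boto3 code here
# response is the result of obj.get()
# Decode the bytes to text first (csv needs str, not bytes, in Python 3),
# then split into lines
lines = response['Body'].read().decode('utf8').splitlines()
# DictReader treats the first line as the header row; for a file without
# a header, pass fieldnames=[...] or use csv.reader instead
for row in csv.DictReader(lines):
    # each row is a dict keyed by column name
    # do whatever you want with each line here
    print(row)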
If you just have a simple CSV file (no quoted fields or embedded commas), a quick and dirty fix will do:
for row in filedata.decode('utf8').splitlines():
    items = row.split(',')
    print(items[0], items[1])
Source: How do I read a csv stored in S3 with csv.DictReader?
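One caveat with splitting on commas by hand: it breaks as soon as a field contains a quoted comma. A minimal sketch of the difference, using made-up sample data (the second row is hypothetical and not from the file in the question):

```python
import csv

# Hypothetical sample: the second row has a quoted field containing a comma.
text = '51350612,Gary Scott\n10100063,"Smith, Justin"\n'

# Naive split: the comma inside the quotes is treated as a separator.
naive_rows = [line.split(',') for line in text.splitlines()]

# csv.reader understands quoting and keeps the field intact.
parsed_rows = list(csv.reader(text.splitlines()))

print(naive_rows[1])   # three pieces, quotes still attached
print(parsed_rows[1])  # two fields, as intended
```

For a file like the one in the question, both approaches give the same result, so the split version is fine as long as the data stays that simple.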
Upvotes: 2
Reputation: 56
To read from the CSV properly, import Python's csv module and use one of its readers (csv.reader or csv.DictReader).
Documentation: https://docs.python.org/2/library/csv.html
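As a minimal sketch of that suggestion, assuming the file is UTF-8 encoded and has no header row (as the sample output in the question suggests), csv.reader yields each row as a list that can be indexed by column. The helper name get_column is made up for illustration, and the bytes literal stands in for what obj.get()["Body"].read() returns:

```python
import csv


def get_column(data: bytes, index: int, encoding: str = "utf8") -> list:
    """Return the values of one column from raw CSV bytes (hypothetical helper)."""
    # Decode to text first; csv works on strings, not bytes, in Python 3.
    lines = data.decode(encoding).splitlines()
    # Skip rows that are too short to contain the requested column.
    return [row[index] for row in csv.reader(lines) if len(row) > index]


# Stand-in for: filedata = obj.get()["Body"].read()
filedata = b"51350612,Gary Scott\n10100063,Justin Smith\n10100162,Annie Smith\n"

names = get_column(filedata, 1)
print(names)
```

If the file does have a header row, csv.DictReader may be the more convenient reader, since columns can then be accessed by name instead of by position.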
Upvotes: 0