user2966197

Reputation: 2981

Issue accessing specific columns of a CSV file read as an S3 object with boto3

I am reading a CSV file from S3 using boto3 and want to access specific columns of it. The code below reads the file through an S3 object, but I am having trouble accessing individual columns:

import boto3

s3 = boto3.resource('s3',aws_access_key_id = keyId, aws_secret_access_key = sKeyId)

obj = s3.Object(bucketName, srcFileName)

filedata = obj.get()["Body"].read()
print(filedata.decode('utf8'))

for row in filedata.decode('utf8'):
    print(row[1]) # Get the column at index 1

When I run this, print(filedata.decode('utf8')) prints the following to my console:

51350612,Gary Scott
10100063,Justin Smith
10100162,Annie Smith
10100175,Lisa Shaw
10100461,Ricardo Taylor
10100874,Ricky Boyd
10103593,Hyman Cordero

But the line print(row[1]) inside the for loop raises IndexError: string index out of range.

How can I fix this error and access specific columns of a CSV file from S3 using boto3?

Upvotes: 1

Views: 2218

Answers (2)

mootmoot

Reputation: 13166

obj.get()["Body"].read() retrieves the whole file as a single bytes object. Your call to filedata.decode('utf8') only converts that bytes object into a str; no parsing happens, so the for loop iterates over individual characters rather than rows. The following is adapted from another answer.

import csv
# ... insert your boto3 code here (it should define obj) ...

# Decode the bytes first: in Python 3 the csv module expects str lines
lines = obj.get()['Body'].read().decode('utf-8').splitlines()
# Now iterate over the parsed rows
for row in csv.DictReader(lines):
    # Each row is a dict keyed by the field names taken from the first line
    # do whatever you want with each row here
    print(row)
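
The sample data in the question has no header row, so csv.DictReader as used above would treat the first record as the field names. A minimal sketch of one way around that, assuming hypothetical column labels "id" and "name" (they are not in the original file) and using an in-memory bytes value as a stand-in for the S3 body:

```python
import csv

# Stand-in for obj.get()["Body"].read(); the bytes mimic the question's sample.
filedata = b"51350612,Gary Scott\n10100063,Justin Smith\n10100162,Annie Smith\n"

# "id" and "name" are hypothetical labels; the file itself has no header row.
fieldnames = ["id", "name"]

lines = filedata.decode("utf-8").splitlines()
rows = list(csv.DictReader(lines, fieldnames=fieldnames))

for row in rows:
    # Columns are now addressable by label instead of position.
    print(row["name"])
```

Passing fieldnames explicitly keeps the first record as data; without it, that record would be used up as the header.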

If you just have a simple CSV file, a quick and dirty fix will do:

for row in filedata.decode('utf8').splitlines():
    items = row.split(',')
    print(items[0], items[1])
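
The quick fix only holds while no field contains a comma. As a rough illustration, the sketch below compares a plain split with csv.reader on a made-up record (the quoted name is hypothetical and does not come from the question's file):

```python
import csv
import io

# Hypothetical record whose second field contains a comma inside quotes.
text = '10100063,"Smith, Justin"\n10100162,Annie Smith\n'

# Naive approach: split every line on commas, ignoring quoting.
naive = [line.split(",") for line in text.splitlines()]

# csv.reader understands quoting and keeps the quoted field in one piece.
parsed = list(csv.reader(io.StringIO(text)))

print(naive[0])   # three pieces, quotes still attached
print(parsed[0])  # two fields, as intended
```

If the data may ever contain quoted fields, csv.reader is probably the safer choice.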

Source: How do I read a csv stored in S3 with csv.DictReader?

Upvotes: 2

Kyle Finley

Reputation: 56

To read the CSV properly, import Python's csv module and use one of its readers.
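
For context, iterating over the decoded string yields one character at a time, which is why row[1] raised IndexError in the question. A minimal sketch of the reader-based approach follows; read_column is a hypothetical helper name, and the bytes literal stands in for obj.get()["Body"].read():

```python
import csv
import io


def read_column(data, index, encoding="utf-8"):
    """Return the value at position `index` from every non-empty CSV row."""
    # newline="" lets the csv module handle line endings itself.
    reader = csv.reader(io.StringIO(data.decode(encoding), newline=""))
    return [row[index] for row in reader if row]


# Stand-in for the S3 body; includes a blank line to show it is skipped.
filedata = b"51350612,Gary Scott\r\n10100063,Justin Smith\r\n\r\n10100162,Annie Smith\r\n"

names = read_column(filedata, 1)
print(names)
```

This loads the whole object into memory, which should be fine for small files; very large objects would likely need a streaming approach instead.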

Documentation: https://docs.python.org/2/library/csv.html

Upvotes: 0
