khinester

Reputation: 3520

How to read CSV file from S3

I have the following code:

package main

import (
    "encoding/csv"
    "fmt"
    "io/ioutil"
    "path"

    "github.com/aws/aws-lambda-go/events"
    "github.com/aws/aws-lambda-go/lambda"
    "github.com/aws/aws-sdk-go/aws"
    "github.com/aws/aws-sdk-go/aws/session"
    "github.com/aws/aws-sdk-go/service/s3"
    "github.com/aws/aws-sdk-go/service/s3/s3iface"
)

var (
    // TOKEN = os.Getenv("TOKEN")
    svc s3iface.S3API
)

func main() {    
    // Panics if there is an error in creating session
    svc = s3iface.S3API(s3.New(session.Must(session.NewSession())))

    lambda.Start(Handler)
}

func Handler(evt events.S3Event) error {

    for _, rec := range evt.Records {

        key := rec.S3.Object.Key

        dir, file := path.Split(key)
        // Download the file from S3
        obj, err := svc.GetObject(&s3.GetObjectInput{
            Bucket: aws.String(rec.S3.Bucket.Name),
            Key:    aws.String(key),
        })
        if err != nil {
            return fmt.Errorf("error in downloading %s from S3: %s\n", key, err)
        }

        body, err := ioutil.ReadAll(obj.Body)
        if err != nil {
            return fmt.Errorf("error in reading file %s: %s\n", key, err)
        }

        reader := csv.NewReader(body)
        record, err := reader.ReadAll()
        if err != nil {
            fmt.Println("Error", err)
        }

        for value := range record { // for i:=0; i<len(record)
            fmt.Println("", record[value])
        }
    }
    return nil
}

I am trying to parse the CSV file from S3 and do something with each row, but I am getting this compile error:

cannot use body (type []byte) as type io.Reader in argument to csv.NewReader:
    []byte does not implement io.Reader (missing Read method)

Any advice is much appreciated.

Upvotes: 2

Views: 8441

Answers (1)

Himanshu

Reputation: 12675

As the error says:

cannot use body (type []byte) as type io.Reader in argument to csv.NewReader: []byte does not implement io.Reader (missing Read method)

This happens because body is a []byte (the result of ioutil.ReadAll), while csv.NewReader takes an io.Reader. A []byte has no Read method, so it does not satisfy that interface.

Wrap the bytes in a type that implements io.Reader, such as a bytes.Buffer (this needs the "bytes" import). Try changing your code to:

reader := csv.NewReader(bytes.NewBuffer(body))
record, err := reader.ReadAll()
if err != nil {
    fmt.Println("Error", err)
}
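
To see the fix in isolation, here is a minimal, self-contained sketch that runs without S3. The sample data and the helper name parseCSVBytes are made up for illustration; body stands in for whatever ioutil.ReadAll returned. It uses bytes.NewReader, which works the same way as bytes.NewBuffer for this purpose.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"fmt"
)

// parseCSVBytes wraps an in-memory byte slice in a reader and parses it as CSV.
// The name is hypothetical; body stands in for the bytes read from obj.Body.
func parseCSVBytes(body []byte) ([][]string, error) {
	// bytes.NewReader returns a *bytes.Reader, which implements io.Reader.
	reader := csv.NewReader(bytes.NewReader(body))
	return reader.ReadAll()
}

func main() {
	// Sample data made up for illustration.
	body := []byte("id,name\n1,alice\n2,bob\n")

	records, err := parseCSVBytes(body)
	if err != nil {
		fmt.Println("Error", err)
		return
	}
	// Each record is a []string holding the fields of one row.
	for _, row := range records {
		fmt.Println(row)
	}
}
```

Note that ReadAll returns every row, including the header line, so skip records[0] if your file has one.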

Also, GetObject returns a pointer to a GetObjectOutput struct:

func (c *S3) GetObject(input *GetObjectInput) (*GetObjectOutput, error)

whose Body field is an io.ReadCloser and therefore already implements io.Reader:

type GetObjectOutput struct {
    ....
    // Object data.
    Body io.ReadCloser `type:"blob"`
    ....
}

So you can pass obj.Body directly to csv.NewReader and skip ioutil.ReadAll entirely. Remember to close the body once you are done with it.

Another option is the download manager in the s3manager package.

The s3manager package's Downloader downloads objects from S3 concurrently and writes the content to an io.WriterAt. Once a Downloader instance is created, you can safely call Download from multiple goroutines.

func (d Downloader) Download(w io.WriterAt, input *s3.GetObjectInput, options ...func(*Downloader)) (n int64, err error)

Download fetches an object from S3 and writes the payload into w using concurrent GET requests.

// downloadToFile fetches one S3 object and writes it to a local file.
// It needs the "os" and "github.com/aws/aws-sdk-go/service/s3/s3manager" imports.
func downloadToFile(bucket, key, filename string) error {
    // The session the S3 Downloader will use
    sess := session.Must(session.NewSession())

    // Create a downloader with the session and default options
    downloader := s3manager.NewDownloader(sess)

    // Create a file to write the S3 Object contents to.
    f, err := os.Create(filename)
    if err != nil {
        return fmt.Errorf("failed to create file %q, %v", filename, err)
    }
    defer f.Close()

    // Write the contents of S3 Object to the file
    n, err := downloader.Download(f, &s3.GetObjectInput{
        Bucket: aws.String(bucket),
        Key:    aws.String(key),
    })
    if err != nil {
        return fmt.Errorf("failed to download file, %v", err)
    }
    fmt.Printf("file downloaded, %d bytes\n", n)
    return nil
}
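
Download takes an io.WriterAt rather than an io.Writer because the concurrent GET requests can finish out of order, so each chunk is written at its own offset. If you would rather not touch the file system, an in-memory io.WriterAt works too. I believe the SDK ships one as aws.WriteAtBuffer; the sketch below uses a hypothetical stand-in, memWriterAt, so it runs with the standard library only, and the chunk data is made up.

```go
package main

import (
	"bytes"
	"encoding/csv"
	"errors"
	"fmt"
	"sync"
)

// memWriterAt is a hypothetical in-memory io.WriterAt, written for illustration.
type memWriterAt struct {
	mu  sync.Mutex
	buf []byte
}

// WriteAt copies p into the buffer at offset off, growing the buffer if needed.
func (m *memWriterAt) WriteAt(p []byte, off int64) (int, error) {
	if off < 0 {
		return 0, errors.New("negative offset")
	}
	m.mu.Lock()
	defer m.mu.Unlock()

	end := int(off) + len(p)
	if end > len(m.buf) {
		grown := make([]byte, end)
		copy(grown, m.buf)
		m.buf = grown
	}
	copy(m.buf[off:], p)
	return len(p), nil
}

// Bytes returns the accumulated contents.
func (m *memWriterAt) Bytes() []byte {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.buf
}

func main() {
	w := &memWriterAt{}

	// Simulate two chunks arriving out of order, as concurrent range GETs might.
	w.WriteAt([]byte("1,alice\n"), 8)
	w.WriteAt([]byte("id,name\n"), 0)

	// Once the download is complete, parse the buffer as CSV.
	records, err := csv.NewReader(bytes.NewReader(w.Bytes())).ReadAll()
	if err != nil {
		fmt.Println("Error", err)
		return
	}
	for _, row := range records {
		fmt.Println(row)
	}
}
```

For a small CSV inside a Lambda, passing obj.Body straight to csv.NewReader is probably simpler; the downloader mainly pays off for large objects.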

Upvotes: 6
