Gogo78

Reputation: 387

Skipping header in CSV file

I'm adding data from a CSV file to DynamoDB using a Lambda function. The data is added, but there's an error in my table: in DynamoDB I see my CSV headers as a row in the table as well. Here's my code:

import boto3
s3=boto3.client("s3")

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('maysales')

def lambda_handler(event, context):
    bucketna=event['Records'][0]['s3']['bucket']['name']
    s3_name=event['Records'][0]['s3']['object']['key']
    response=s3.get_object(Bucket=bucketna,Key=s3_name)
    data=response['Body'].read().decode("utf-8")
    salesnbs=data.split("\n")
    for ko in salesnbs:
        kos=ko.split(",")
        table.put_item(
            Item = { 
            "Date": kos[0],
            "name": kos[1],
            "fam": kos[2],
            "locati": kos[3],
            "adress": kos[4],
            "country": kos[5],
            "city": kos[6]
        })

My table already contains the header labels as an item:

Upvotes: 0

Views: 839

Answers (4)

Sridhar Rajaram

Reputation: 91

With the modified code below, based on @E.J. Brennan's answer, you can skip the header while pushing a CSV file from S3 into DynamoDB. Replace the body of your Lambda function with this:

import boto3
s3_client = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('yourdynamodbtablename')
def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    s3_file_name = event['Records'][0]['s3']['object']['key']
    response = s3_client.get_object(Bucket=bucket,Key=s3_file_name)
    fileData = response['Body'].read().decode("utf-8")
    print(fileData)
    modelData = fileData.split("\n")
    header = 0
    for row in modelData:
        print(row)
        if header == 0:
            header = header+1
            continue
        row_data = row.split(",")
        try:
            table.put_item(
                Item = {
                    'ID': row_data[0],
                    'NAME': row_data[1],
                    'SUBJECT': row_data[2]
                }
            )
        except Exception as e:
            # A short or empty trailing line will raise here
            print('End of file')
    return 'new rows were inserted successfully, without the header, into the db'

Upvotes: 0

alexis-donoghue

Reputation: 3387

It's not entirely clear from the description what the problem is, but I suggest using Python's built-in csv module to handle CSV data. That way you won't need to worry about headers or about splitting the file into columns, since the module provides tools for both.

import csv
...

# Here you can also specify a delimiter if need be.
# The S3 body is a byte stream, so decode it into text lines first.
reader = csv.DictReader(response['Body'].read().decode("utf-8").splitlines())
for row in reader:
    table.put_item(
        Item={
            "Date": row["Date"],
            "name": row["name"],
            "fam": row["fam"],
            ...
        })

The module uses the first row of the file for the column names, so the header never reaches your table.
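To illustrate that behavior with made-up sample data (not the asker's actual file), DictReader consumes the header line itself and yields only the data rows as dicts keyed by those headers:

```python
import csv
import io

# Hypothetical sample standing in for the decoded S3 object body
sample = "Date,name,fam\n2021-05-01,Alice,Smith\n2021-05-02,Bob,Jones\n"

reader = csv.DictReader(io.StringIO(sample))
rows = list(reader)

# The header line became the dict keys; only the two data rows remain
print(len(rows))        # 2
print(rows[0]["name"])  # Alice
```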

Upvotes: 1

Tanmay Singhal

Reputation: 31

Use the boto3 client instead of the resource, and install dynamodb-json:

from dynamodb_json import json_util as dynamo_json
import json
import boto3
s3=boto3.client("s3")

dynamodb = boto3.client('dynamodb')

def lambda_handler(event, context):
    bucketna=event['Records'][0]['s3']['bucket']['name']
    s3_name=event['Records'][0]['s3']['object']['key']
    response=s3.get_object(Bucket=bucketna,Key=s3_name)
    data=response['Body'].read().decode("utf-8")
    salesnbs=data.split("\n")
    for ko in salesnbs:
        kos=ko.split(",")
        data = { 
            "Date": kos[0],
            "name": kos[1],
            "fam": kos[2],
            "locati": kos[3],
            "adress": kos[4],
            "country": kos[5],
            "city": kos[6]
        }
        dynamodb.put_item(TableName='maysales',Item=json.loads(dynamo_json.dumps(data)))

Upvotes: -1

E.J. Brennan

Reputation: 46859

The first row of most CSV files contains the header labels. If you don't want to add that row to your DynamoDB table, you need to skip past it before you start doing your insertions, i.e.:

row = 0
for ko in salesnbs:
    row = row + 1
    if row == 1:
        continue  # don't process the header line

    kos = ko.split(",")
    table.put_item(
        Item={
            "Date": kos[0],
            "name": kos[1],
            "fam": kos[2],
            "locati": kos[3],
            "adress": kos[4],
            "country": kos[5],
            "city": kos[6]
        })

(syntax might not be 100% correct, but that is the idea)
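An equivalent, slightly more idiomatic way to do the same thing (a sketch, using a made-up list in place of the asker's salesnbs) is to advance an iterator past the header once with next() instead of keeping a row counter:

```python
# Hypothetical stand-in for the lines read from the S3 object
salesnbs = ["Date,name,fam", "2021-05-01,Alice,Smith", "2021-05-02,Bob,Jones"]

lines = iter(salesnbs)
next(lines, None)  # consume the header line; None guards against an empty file

processed = []
for ko in lines:
    kos = ko.split(",")
    processed.append(kos)  # here you would call table.put_item(...) instead

print(len(processed))   # 2
print(processed[0][1])  # Alice
```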

Upvotes: 1
