sloth
sloth

Reputation: 87

extracting row from CSV file with Python / Django

hey I'm trying to extract certain row from a CSV file with content in this form:

POS,Transaction id,Product,Quantity,Customer,Date
1,E100,TV,1,Test Customer,2022-09-19
2,E100,Laptop,3,Test Customer,2022-09-20
3,E200,TV,1,Test Customer,2022-09-21
4,E300,Smartphone,2,Test Customer,2022-09-22
5,E300,Laptop,5,New Customer,2022-09-23
6,E300,TV,1,New Customer,2022-09-23
7,E400,TV,2,ABC,2022-09-24
8,E500,Smartwatch,4,ABC,2022-09-25

the code I wrote is the following

def csv_upload_view(request):
    print('file is being uploaded')

    if request.method == 'POST':
        csv_file = request.FILES.get('file')
        obj = CSV.objects.create(file_name=csv_file)

        with open(obj.file_name.path, 'r') as f:
            reader = csv.reader(f)
            reader.__next__()
            for  row in reader:
                data = "".join(row)
                data = data.split(";")
                #data.pop()
                print(data[0], type(data))
                transaction_id = data[0]
                product = data[1]
                quantity = int(data[2])
                customer = data[3]
                date = parse_date(data[4])

In the console then I get the following output:

Quit the server with CONTROL-C.
[22/Sep/2022 15:16:28] "GET /reports/from-file/ HTTP/1.1" 200 11719
file is being uploaded
1E100TV1Test Customer2022-09-19 <class 'list'>

So that I get the correct row put everything concatenated. If instead I put in a space in the " ".join.row I get the entire row separated with empty spaces - what I would like to do is access this row with

transaction_id = data[0]
                product = data[1]
                quantity = int(data[2])
                customer = data[3]
                date = parse_date(data[4])

but I always get an

IndexError: list index out of range

I also tried with data.replace(" ",";") but this gives me another error and the data type becomes a string instead of a list:

ValueError: invalid literal for int() with base 10: 'E'

Can someone please show me what I'm missing here?

Upvotes: 0

Views: 309

Answers (2)

AirSquid
AirSquid

Reputation: 11903

I'm not sure why you are joining/splitting the row up. And you realize your split is using a semicolon?

I would expect something like this:

import csv
from collections import namedtuple

Transaction = namedtuple('Transaction', ['id', 'product', 'qty', 'customer', 'date'])

f_name = 'data.csv'
transactions = []  # to hold the result
with open(f_name, 'r') as src:
    src.readline()  # burn the header row
    reader = csv.reader(src)   # if you want to use csv reader
    for data in reader:
        #print(data)  <-- to see what the csv reader gives you...
        t = Transaction(data[1], data[2], int(data[3]), data[4], data[5])
        transactions.append(t)

for t in transactions:
    print(t)

The above "catches" results with a namedtuple, which is obviously optional. You could put them in lists, etc.

Also csv.reader will do the splitting (by comma) by default. I edited my previous answer.

As far as your question goes... You mention extracting a "certain row" but you gave no indication how you would find such row. If you know the row index/number, you could burn lines with readline or such, or just keep a counter while you read. If you are looking for keyword in the data, just pop a conditional statement in either before or after splitting up the line.

Upvotes: 2

Raphael Frei
Raphael Frei

Reputation: 406

This way you can split the rows (and find which row you want based on some provided value)

with open('data.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter = ',')

    line_count = 0

    for row in csv_reader:
        # Line 0 is the header

        if line_count == 0:
            print(f'Column names are {", ".join(row)}')
            line_count += 1
        
        else:
            line_count += 1
            # Here you can check if the row value is equal what you're finding
            # row[0] = POS
            # row[1] = Transaction id
            # row[2] = Product
            # row[3] = Quantity
            # row[4] = Customer
            # row[5] = Date

            if row[2] = "TV":
                #If you want to add all variables into a single string:
                data = ",".join(row) 

                # Make each row into a single variable:
                transaction_id = row[0]
                product = row[1]
                quantity = row[2]
                customer = row[3]
                date = row[4]

Upvotes: 1

Related Questions