Reputation: 87
hey I'm trying to extract certain row from a CSV file with content in this form:
POS,Transaction id,Product,Quantity,Customer,Date
1,E100,TV,1,Test Customer,2022-09-19
2,E100,Laptop,3,Test Customer,2022-09-20
3,E200,TV,1,Test Customer,2022-09-21
4,E300,Smartphone,2,Test Customer,2022-09-22
5,E300,Laptop,5,New Customer,2022-09-23
6,E300,TV,1,New Customer,2022-09-23
7,E400,TV,2,ABC,2022-09-24
8,E500,Smartwatch,4,ABC,2022-09-25
the code I wrote is the following
def csv_upload_view(request):
print('file is being uploaded')
if request.method == 'POST':
csv_file = request.FILES.get('file')
obj = CSV.objects.create(file_name=csv_file)
with open(obj.file_name.path, 'r') as f:
reader = csv.reader(f)
reader.__next__()
for row in reader:
data = "".join(row)
data = data.split(";")
#data.pop()
print(data[0], type(data))
transaction_id = data[0]
product = data[1]
quantity = int(data[2])
customer = data[3]
date = parse_date(data[4])
In the console then I get the following output:
Quit the server with CONTROL-C.
[22/Sep/2022 15:16:28] "GET /reports/from-file/ HTTP/1.1" 200 11719
file is being uploaded
1E100TV1Test Customer2022-09-19 <class 'list'>
So that I get the correct row put everything concatenated. If instead I put in a space in the " ".join.row I get the entire row separated with empty spaces - what I would like to do is access this row with
transaction_id = data[0]
product = data[1]
quantity = int(data[2])
customer = data[3]
date = parse_date(data[4])
but I always get an
IndexError: list index out of range
I also tried with data.replace(" ",";") but this gives me another error and the data type becomes a string instead of a list:
ValueError: invalid literal for int() with base 10: 'E'
Can someone please show me what I'm missing here?
Upvotes: 0
Views: 309
Reputation: 11903
I'm not sure why you are joining/splitting the row up. And you realize your split is using a semicolon?
I would expect something like this:
import csv
from collections import namedtuple
Transaction = namedtuple('Transaction', ['id', 'product', 'qty', 'customer', 'date'])
f_name = 'data.csv'
transactions = [] # to hold the result
with open(f_name, 'r') as src:
src.readline() # burn the header row
reader = csv.reader(src) # if you want to use csv reader
for data in reader:
#print(data) <-- to see what the csv reader gives you...
t = Transaction(data[1], data[2], int(data[3]), data[4], data[5])
transactions.append(t)
for t in transactions:
print(t)
The above "catches" results with a namedtuple
, which is obviously optional. You could put them in lists, etc.
Also csv.reader
will do the splitting (by comma) by default. I edited my previous answer.
As far as your question goes... You mention extracting a "certain row" but you gave no indication how you would find such row. If you know the row index/number, you could burn lines with readline
or such, or just keep a counter while you read. If you are looking for keyword in the data, just pop a conditional statement in either before or after splitting up the line.
Upvotes: 2
Reputation: 406
This way you can split the rows (and find which row you want based on some provided value)
with open('data.csv') as csv_file:
csv_reader = csv.reader(csv_file, delimiter = ',')
line_count = 0
for row in csv_reader:
# Line 0 is the header
if line_count == 0:
print(f'Column names are {", ".join(row)}')
line_count += 1
else:
line_count += 1
# Here you can check if the row value is equal what you're finding
# row[0] = POS
# row[1] = Transaction id
# row[2] = Product
# row[3] = Quantity
# row[4] = Customer
# row[5] = Date
if row[2] = "TV":
#If you want to add all variables into a single string:
data = ",".join(row)
# Make each row into a single variable:
transaction_id = row[0]
product = row[1]
quantity = row[2]
customer = row[3]
date = row[4]
Upvotes: 1