Reputation: 73
I have searched like crazy trying to find specifically how to read a row in a csv file.
I need to read a random row out of 1000, each of which has 3 columns. The first column has an email. I need to put in a random email, and get columns 2 and 3 out. (Python 2.7, csv file)
Example:
Name Date Color
Ray May Gray
Alex Apr Green
Ann Jun Blue
Kev Mar Gold
Rob May Black
Instead of column 1 row 3, I need [Ann], her whole row. This is a CSV file, with over 1000 names. I have to put in her name and output her whole row.
What I have tried
from collections import namedtuple
Entry = namedtuple('Entry', 'Name, Date, Color')
file_location = "C:/Users/abriman/Desktop/Book.csv"
ss_dict = {}
spreadsheet = file_location = "C:/Users/abriman/Desktop/Book.csv"
for row in spreadsheet:
    entry = Entry(*tuple(row))
ss_dict['Ann']
And my error reads
Traceback (most recent call last):
File "<stdin>", line 2, in <module>
TypeError: __new__() takes exactly 4 arguments (2 given)
I have tried other ways too and got little to no result. I'm a beginner at python.
Upvotes: 1
Views: 6553
Reputation: 3088
A simple dictionary comprehension could solve your problem:
>>> from collections import namedtuple
>>> Entry = namedtuple('Entry', 'Name, Date, Color')
>>> [l for l in open('t.tsv', 'r')]
<<<
['Name Date Color\n',
'Ray May Gray\n',
'Alex Apr Green\n',
'Ann Jun Blue\n',
'Kev Mar Gold\n',
'Rob May Black\n']
>>> [l.split() for l in open('t.tsv', 'r')]
<<<
[['Name', 'Date', 'Color'],
['Ray', 'May', 'Gray'],
['Alex', 'Apr', 'Green'],
['Ann', 'Jun', 'Blue'],
['Kev', 'Mar', 'Gold'],
['Rob', 'May', 'Black']]
>>> [Entry(*l.split()) for l in open('t.tsv', 'r')]
<<<
[Entry(Name='Name', Date='Date', Color='Color'),
Entry(Name='Ray', Date='May', Color='Gray'),
Entry(Name='Alex', Date='Apr', Color='Green'),
Entry(Name='Ann', Date='Jun', Color='Blue'),
Entry(Name='Kev', Date='Mar', Color='Gold'),
Entry(Name='Rob', Date='May', Color='Black')]
>>> {e.Name:e for e in list(Entry(*l.split()) for l in open('t.tsv', 'r'))}
<<<
{'Alex': Entry(Name='Alex', Date='Apr', Color='Green'),
'Ann': Entry(Name='Ann', Date='Jun', Color='Blue'),
'Kev': Entry(Name='Kev', Date='Mar', Color='Gold'),
'Name': Entry(Name='Name', Date='Date', Color='Color'),
'Ray': Entry(Name='Ray', Date='May', Color='Gray'),
'Rob': Entry(Name='Rob', Date='May', Color='Black')}
I think you want to treat the first row as header names. Python has DictReader for that: https://docs.python.org/2/library/csv.html#csv.DictReader
>>> import csv
>>> for line in csv.DictReader(open('t.tsv')): print line # don't forget to make your file comma-separated.
{'Date': 'May', 'Color': 'Gray', 'Name': 'Ray'}
{'Date': 'Apr', 'Color': 'Green', 'Name': 'Alex'}
{'Date': 'Jun', 'Color': 'Blue', 'Name': 'Ann'}
{'Date': 'Mar', 'Color': 'Gold', 'Name': 'Kev'}
{'Date': 'May', 'Color': 'Black', 'Name': 'Rob'}
or with dictionary comprehension:
>>> { line['Name']: line for line in csv.DictReader(open('t.tsv')) }
<<<
{'Alex': {'Color': 'Green', 'Date': 'Apr', 'Name': 'Alex'},
'Ann': {'Color': 'Blue', 'Date': 'Jun', 'Name': 'Ann'},
'Kev': {'Color': 'Gold', 'Date': 'Mar', 'Name': 'Kev'},
'Ray': {'Color': 'Gray', 'Date': 'May', 'Name': 'Ray'},
'Rob': {'Color': 'Black', 'Date': 'May', 'Name': 'Rob'}}
>>> rows_by_name = { line['Name']: line for line in csv.DictReader(open('t.tsv')) }
>>> rows_by_name['Ann']
<<< {'Color': 'Blue', 'Date': 'Jun', 'Name': 'Ann'}
If you want random samples, I suggest first reading the rows into a list and then selecting from it with the random module. Let's do it with Entry:
>>> rows = list(Entry(*l.split()) for l in open('t.tsv', 'r'))
>>> import random
>>> random.sample(rows, 1)
<<< [Entry(Name='Ray', Date='May', Color='Gray')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Name', Date='Date', Color='Color')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 3)
<<<
[Entry(Name='Ray', Date='May', Color='Gray'),
Entry(Name='Kev', Date='Mar', Color='Gold'),
Entry(Name='Ann', Date='Jun', Color='Blue')]
>>> random.sample(rows, 3)
<<<
[Entry(Name='Ann', Date='Jun', Color='Blue'),
Entry(Name='Rob', Date='May', Color='Black'),
Entry(Name='Name', Date='Date', Color='Color')]
>>> random.sample(rows, 3)
<<<
[Entry(Name='Rob', Date='May', Color='Black'),
Entry(Name='Ann', Date='Jun', Color='Blue'),
Entry(Name='Kev', Date='Mar', Color='Gold')]
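Beware, though, that reading every row into a list can use a lot of memory for a large file. Note also that rows still contains the header line, which is why it shows up in some of the samples above.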
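If memory is a concern, one alternative (not what the answer above does) is to pick a line while streaming through the file, keeping only the current candidate. This is a minimal sketch of that idea, usually called reservoir sampling. The name pick_random_line and the sample lines are made up for illustration. With a real file you would pass the open file object instead of the list, after skipping the header line.

```python
import random

def pick_random_line(lines, rng=random):
    """Return one item chosen uniformly at random from an iterable, or None if it is empty."""
    chosen = None
    for count, line in enumerate(lines, 1):
        # Keep the new line with probability 1/count; only one line is held at a time.
        if rng.randrange(count) == 0:
            chosen = line
    return chosen

# Stand-in for the data lines of the file (header already skipped).
data_lines = ["Ray May Gray\n", "Alex Apr Green\n", "Ann Jun Blue\n"]
picked = pick_random_line(data_lines)
print(picked.split())
```

The trade-off is that this reads the whole file once per pick, so random.sample on a list is still the simpler choice when the file fits in memory.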
Upvotes: 1
Reputation: 10360
You're on the right track. First issue: you're never opening the file located at file_location. Thus, when you iterate for row in spreadsheet:, you're iterating over the characters of spreadsheet, which are the characters of file_location, which are the characters of "C:/Users/...". So the first thing you want to do is actually open the file:
spreadsheet = open(file_location, 'r')
You still have another issue in your loop. When you iterate over a file in a for loop, you get back the lines of the file. So, at each iteration, row will be a line, e.g. "Ray May Gray". When you call tuple() on that, you're going to get a tuple that looks like ('R', 'a', 'y', ' ', ' ', 'M', ...). What you need to do is construct your tuple by splitting on whitespace:
entry = Entry(*row.split())
Then, you need to add your entry to the dictionary ss_dict:
ss_dict[entry.Name] = entry
Finally, you can read out the value of ss_dict['Ann'], but this should be outside your loop: if you do it inside your loop, you may be trying to read the value of ss_dict['Ann'] before it has been set. All in all, your code should look like this:
from collections import namedtuple
Entry = namedtuple('Entry', 'Name, Date, Color')
file_location = "C:/Users/abriman/Desktop/Book.csv"
ss_dict = {}
spreadsheet = open(file_location, 'r') # <--
for row in spreadsheet:
    entry = Entry(*row.split()) # <--
    ss_dict[entry.Name] = entry # <--
print ss_dict['Ann']
Incidentally, the reason you're getting your error message there is that when you do for row in spreadsheet: with spreadsheet being a string, row is just a character, as I mentioned, and so tuple(row) is just a tuple containing one character, and hence is of length 1, so that you're only passing one argument rather than three when you do *tuple(row). (The message says "4" and "2" instead of "3" and "1" because it also counts the implicit first argument of __new__.)
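To see that failure in isolation, here is a small sketch that reproduces it without touching any file. The path string is the one from the question and the sample line is taken from the example data. It is written so it runs on Python 3 as well; the exact wording of the TypeError differs between versions, so only the exception type is checked.

```python
from collections import namedtuple

Entry = namedtuple('Entry', 'Name, Date, Color')

path = "C:/Users/abriman/Desktop/Book.csv"

# Iterating over a string yields single characters, not lines of a file.
first = next(iter(path))      # the first character of the path
as_tuple = tuple(first)       # a 1-tuple, but Entry needs three fields

try:
    Entry(*as_tuple)
    failed = False
except TypeError:
    failed = True

# Splitting an actual line on whitespace yields the three fields.
entry = Entry(*"Ann Jun Blue\n".split())
print(failed, entry)
```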
All that said, you might want to consider looking at the csv module, which is part of the standard library, and is precisely designed for reading csv files. It will probably make your life easier in the long run.
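As a rough sketch of what the csv-module route could look like, assuming the file is really comma-separated and has a header row: csv.reader splits each line into fields, and the same Entry namedtuple can hold them. The helper name load_entries and the in-memory sample are assumptions for illustration. With a real file you would pass the result of open(file_location) instead (on Python 2.7 the csv docs recommend opening it in 'rb' mode).

```python
import csv
import io
from collections import namedtuple

Entry = namedtuple('Entry', 'Name, Date, Color')

def load_entries(fileobj):
    """Return a dict mapping each name to its Entry, skipping the header row."""
    reader = csv.reader(fileobj)
    next(reader)  # skip the header row (Name,Date,Color)
    # csv.reader yields an empty list for blank lines, so skip those.
    return {row[0]: Entry(*row) for row in reader if row}

# Stand-in for the real file; the contents are taken from the question's example.
sample = io.StringIO(
    "Name,Date,Color\n"
    "Ray,May,Gray\n"
    "Alex,Apr,Green\n"
    "Ann,Jun,Blue\n"
)

ss_dict = load_entries(sample)
ann = ss_dict['Ann']
print(ann.Date, ann.Color)
```

Unlike str.split(), csv.reader also copes with quoted fields that contain commas or spaces.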
Upvotes: 4
Reputation: 359
I think what you need is enumerate:
def read_csv_line(line_number, filename):
    # line_number is 1-based; returns None if the file has fewer lines
    with open(filename) as fileobj:
        for i, line in enumerate(fileobj):
            if i == (line_number - 1):
                return line
    return None
Then you can feed your random number and filename to get a random line.
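A possible way to wire that up, as a sketch: count the lines first, draw a line number with random.randint, and pass it to read_csv_line. The count_lines helper and the temporary comma-separated file are assumptions added so the example is self-contained. Line 1 is treated as the header and excluded from the draw.

```python
import os
import random
import tempfile

def read_csv_line(line_number, filename):
    """Return the line at 1-based line_number, or None if the file is shorter."""
    with open(filename) as fileobj:
        for i, line in enumerate(fileobj):
            if i == line_number - 1:
                return line
    return None

def count_lines(filename):
    """Count lines without keeping them in memory."""
    with open(filename) as fileobj:
        return sum(1 for _ in fileobj)

# Build a small throwaway file standing in for the real CSV.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w") as f:
    f.write("Name,Date,Color\nRay,May,Gray\nAlex,Apr,Green\nAnn,Jun,Blue\n")

total = count_lines(path)
line_number = random.randint(2, total)  # randint includes both ends; 1 is the header
random_line = read_csv_line(line_number, path)
missing_line = read_csv_line(total + 1, path)  # past the end of the file
print(random_line.strip())

os.remove(path)
```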
Upvotes: 3