Reputation: 73

How to read a specific row of a csv file in python?

I have searched like crazy trying to find specifically how to read a row in a csv file.

I need to read a random row out of 1000, each of which has 3 columns. The first column has an email. I need to put in a random email, and get columns 2 and 3 out. (Python 2.7, csv file)

Example:

Name Date  Color
Ray  May   Gray
Alex Apr   Green
Ann  Jun   Blue
Kev  Mar   Gold
Rob  May   Black

Instead of column 1 row 3, I need [Ann], her whole row. This is a CSV file, with over 1000 names. I have to put in her name and output her whole row.

What I have tried

from collections import namedtuple
Entry = namedtuple('Entry', 'Name, Date, Color')
file_location = "C:/Users/abriman/Desktop/Book.csv"
ss_dict = {}
spreadsheet = file_location = "C:/Users/abriman/Desktop/Book.csv"
for row in spreadsheet:
    entry = Entry(*tuple(row))
    ss_dict['Ann']

And my error reads

Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
TypeError: __new__() takes exactly 4 arguments (2 given)

I have tried other ways too and got little to no result. I'm a beginner at python.

Upvotes: 1

Answers (3)

nefo_x

Reputation: 3088

A simple dictionary comprehension could solve your problem:

>>> from collections import namedtuple
>>> Entry = namedtuple('Entry', 'Name, Date, Color')
>>> [l for l in open('t.tsv', 'r')]
<<<
['Name Date  Color\n',
 'Ray  May   Gray\n',
 'Alex Apr   Green\n',
 'Ann  Jun   Blue\n',
 'Kev  Mar   Gold\n',
 'Rob  May   Black\n']
>>> [l.split() for l in open('t.tsv', 'r')]
<<<
[['Name', 'Date', 'Color'],
 ['Ray', 'May', 'Gray'],
 ['Alex', 'Apr', 'Green'],
 ['Ann', 'Jun', 'Blue'],
 ['Kev', 'Mar', 'Gold'],
 ['Rob', 'May', 'Black']]
>>> [Entry(*l.split()) for l in open('t.tsv', 'r')]
<<<
[Entry(Name='Name', Date='Date', Color='Color'),
 Entry(Name='Ray', Date='May', Color='Gray'),
 Entry(Name='Alex', Date='Apr', Color='Green'),
 Entry(Name='Ann', Date='Jun', Color='Blue'),
 Entry(Name='Kev', Date='Mar', Color='Gold'),
 Entry(Name='Rob', Date='May', Color='Black')]
>>> {e.Name:e for e in list(Entry(*l.split()) for l in open('t.tsv', 'r'))}
<<<
{'Alex': Entry(Name='Alex', Date='Apr', Color='Green'),
 'Ann': Entry(Name='Ann', Date='Jun', Color='Blue'),
 'Kev': Entry(Name='Kev', Date='Mar', Color='Gold'),
 'Name': Entry(Name='Name', Date='Date', Color='Color'),
 'Ray': Entry(Name='Ray', Date='May', Color='Gray'),
 'Rob': Entry(Name='Rob', Date='May', Color='Black')}

I think you want to treat the first row as header names. Python has DictReader for that: https://docs.python.org/2/library/csv.html#csv.DictReader

>>> import csv
>>> for line in csv.DictReader(open('t.tsv')): print line # don't forget to make your file comma-separated.
{'Date': 'May', 'Color': 'Gray', 'Name': 'Ray'}
{'Date': 'Apr', 'Color': 'Green', 'Name': 'Alex'}
{'Date': 'Jun', 'Color': 'Blue', 'Name': 'Ann'}
{'Date': 'Mar', 'Color': 'Gold', 'Name': 'Kev'}
{'Date': 'May', 'Color': 'Black', 'Name': 'Rob'}

Or, with a dictionary comprehension:

>>> { line['Name']: line for line in csv.DictReader(open('t.tsv')) }
<<<
{'Alex': {'Color': 'Green', 'Date': 'Apr', 'Name': 'Alex'},
 'Ann': {'Color': 'Blue', 'Date': 'Jun', 'Name': 'Ann'},
 'Kev': {'Color': 'Gold', 'Date': 'Mar', 'Name': 'Kev'},
 'Ray': {'Color': 'Gray', 'Date': 'May', 'Name': 'Ray'},
 'Rob': {'Color': 'Black', 'Date': 'May', 'Name': 'Rob'}}
>>> rows_by_name = { line['Name']: line for line in csv.DictReader(open('t.tsv')) }
>>> rows_by_name['Ann']
<<< {'Color': 'Blue', 'Date': 'Jun', 'Name': 'Ann'}

If you want random samples, I suggest first reading the rows into a list and then selecting from it with the random module. Or... let's do it with Entry:

>>> rows = list(Entry(*l.split()) for l in open('t.tsv', 'r'))
>>> import random
>>> random.sample(rows, 1)
<<< [Entry(Name='Ray', Date='May', Color='Gray')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Name', Date='Date', Color='Color')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 1)
<<< [Entry(Name='Alex', Date='Apr', Color='Green')]
>>> random.sample(rows, 3)
<<<
[Entry(Name='Ray', Date='May', Color='Gray'),
 Entry(Name='Kev', Date='Mar', Color='Gold'),
 Entry(Name='Ann', Date='Jun', Color='Blue')]
>>> random.sample(rows, 3)
<<<
[Entry(Name='Ann', Date='Jun', Color='Blue'),
 Entry(Name='Rob', Date='May', Color='Black'),
 Entry(Name='Name', Date='Date', Color='Color')]
>>> random.sample(rows, 3)
<<<
[Entry(Name='Rob', Date='May', Color='Black'),
 Entry(Name='Ann', Date='Jun', Color='Blue'),
 Entry(Name='Kev', Date='Mar', Color='Gold')]

But beware: this reads every row into memory at once, which can be too much for a large file.
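If memory is a concern, one alternative is reservoir sampling, which picks a uniformly random row in a single pass while keeping only one row at a time. This is a minimal sketch, not part of the approach above; the name pick_random_row is hypothetical, and it takes any iterable of lines (an open file works too).

```python
import random

def pick_random_row(lines, rng=random):
    """Return one uniformly random item from an iterable, reading it once.

    Returns None if the iterable is empty.
    """
    chosen = None
    for count, line in enumerate(lines, 1):
        # Keep the new line with probability 1/count; after the loop every
        # line has had the same overall chance of being the one kept.
        if rng.randrange(count) == 0:
            chosen = line
    return chosen

# Small in-memory demo; the header row is left out on purpose.
sample_rows = ["Ray  May   Gray", "Alex Apr   Green", "Ann  Jun   Blue"]
picked = pick_random_row(sample_rows)
```

With a real file you would probably skip the header first, e.g. call next(f) on the open file before passing it in.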

Upvotes: 1

senshin

Reputation: 10360

You're on the right track. First issue: you're never opening the file located at file_location. Thus, when you iterate for row in spreadsheet:, you're iterating over the characters of spreadsheet, which are the characters of file_location, which are the characters of "C:/Users/...". So the first thing you want to do is actually open the file:

spreadsheet = open(file_location, 'r')

You still have another issue in your loop. When you iterate over a file in a for loop, you get back the lines of the file. So, at each iteration, row will be a line, e.g. "Ray May Gray". When you call tuple() on that, you're going to get a tuple that looks like ('R', 'a', 'y', ' ', ' ', 'M', ...). What you need to do is construct your tuple by splitting on whitespace:

entry = Entry(*row.split())

Then, you need to add your entry to the dictionary ss_dict:

ss_dict[entry.Name] = entry

Finally, you can read out the value of ss_dict['Ann'], but this should be outside your loop - if you do it inside your loop, you may be trying to read the value of ss_dict['Ann'] before it has been set. All in all, your code should look like this:

from collections import namedtuple
Entry = namedtuple('Entry', 'Name, Date, Color')
file_location = "C:/Users/abriman/Desktop/Book.csv"
ss_dict = {}
spreadsheet = open(file_location, 'r') # <--
for row in spreadsheet:
    entry = Entry(*row.split()) # <--
    ss_dict[entry.Name] = entry # <--
print ss_dict['Ann']

Incidentally, the reason for your error message is this: when spreadsheet is a string, each row in for row in spreadsheet: is a single character, as I mentioned. So tuple(row) is a tuple of length 1, and *tuple(row) passes only one argument instead of three. The counts in the message (4 expected, 2 given) are each one higher because they include the implicit class argument of __new__.
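To see the difference for yourself, here is a small sketch that compares iterating over a path string with splitting a line of text. The shortened path and the sample line are just illustrative values.

```python
# Iterating over a string yields its characters, not lines of a file.
path = "C:/Users"
char_tuples = [tuple(ch) for ch in path]  # each element is a 1-tuple

# Splitting a real line on whitespace yields the three fields instead.
line = "Ann  Jun   Blue\n"
fields = line.split()  # split() with no argument also drops the newline
```

Each element of char_tuples has length 1, which is why the namedtuple constructor complained about missing arguments, while fields has the three values that Entry expects.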

All that said, you might want to consider looking at the csv module, which is part of the standard library, and is precisely designed for reading csv files. It will probably make your life easier in the long run.
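For what it's worth, a rough sketch with the csv module might look like the following. It assumes the file is actually comma-separated (unlike the whitespace-aligned example in the question), and load_rows_by_name is a hypothetical helper name. It accepts any iterable of lines, so the demo uses a list instead of a real file.

```python
import csv

def load_rows_by_name(lines):
    """Map the first column of each data row to the full row (a list)."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header row, if there is one
    rows = {}
    for row in reader:
        if row:  # ignore blank lines
            rows[row[0]] = row
    return rows

# In-memory stand-in for a comma-separated file.
sample = ["Name,Date,Color", "Ray,May,Gray", "Ann,Jun,Blue"]
rows_by_name = load_rows_by_name(sample)
ann_row = rows_by_name["Ann"]
```

With a real file you would pass the open file object instead of the list. The csv docs suggest opening it in 'rb' mode on Python 2 and with newline='' on Python 3.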

Upvotes: 4

Slick

Reputation: 359

I think what you need is enumerate:

def read_csv_line(line_number, filename):
    with open(filename) as fileobj:
        for i, line in enumerate(fileobj):
            if i == (line_number - 1):
                return line
    return None

Then you can pass in a random line number and the filename to get a random line.
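As a sketch of that last step: count the lines once, draw a line number with random.randint, and read that line. The helpers read_line and count_lines are hypothetical names, the demo writes a throwaway temporary file, and it assumes line 1 is a header that should not be picked.

```python
import os
import random
import tempfile

def read_line(line_number, filename):
    """Return the 1-based line `line_number` of `filename`, or None."""
    with open(filename) as fileobj:
        for i, line in enumerate(fileobj):
            if i == line_number - 1:
                return line
    return None

def count_lines(filename):
    """Count the lines in `filename` without loading them all at once."""
    with open(filename) as fileobj:
        return sum(1 for _ in fileobj)

# Build a small throwaway file for the demo.
handle, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(handle, "w") as f:
    f.write("Name,Date,Color\nRay,May,Gray\nAnn,Jun,Blue\n")

total = count_lines(path)
# Line 1 is the header, so draw from 2..total (randint is inclusive).
random_line = read_line(random.randint(2, total), path)
missing = read_line(total + 1, path)  # past the end of the file
os.remove(path)
```

This reads the file twice (once to count, once to fetch), which is usually fine for about 1000 rows.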

Upvotes: 3
