I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don't want Python to take the top row into account. How can I make sure Python ignores the first line?
This is the code so far:
import csv
with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1
    datatype = float
    data = (datatype(column) for row in incsv)
    least_value = min(data)
print least_value
Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.
Upvotes: 146
Views: 298016
Reputation: 517
A generator can be used to apply customized skip logic.
For example, consider a CSV file that has several lines of general information:
Client name: George P. Burdell
Client number: 1234567
Client preferred languages: English, Spanish
Date,Description,Amount
2024-09-10,Deposit,$25.00
...
Rather than skipping a specific number of lines, a generator could search for the header line:
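import csv

def open_file_and_skip_to(path, search_term):
    found = False
    with open(path, newline='') as f:
        for line in f:
            # Start yielding at the first line that contains the search term.
            if search_term in line:
                found = True
            if found:
                yield line

path = 'statement.csv'  # hypothetical file name for the sample data above
reader = csv.DictReader(open_file_and_skip_to(path, "Date,Description"))
for row in reader:
    print(row)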
To skip exactly N lines instead, use:
def open_file_and_skip_lines(path, num_lines=1):
    with open(path) as f:
        for line in f:
            if num_lines > 0:
                num_lines -= 1
                continue
            yield line
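For the client file shown earlier, the three lines of general information can be skipped by count and the rest handed to csv.DictReader. A sketch: the generator is repeated so the block runs on its own, and read_transactions and the file name statement.csv are hypothetical.

```python
import csv


def open_file_and_skip_lines(path, num_lines=1):
    """Yield the lines of the file at `path`, skipping the first `num_lines`."""
    with open(path, newline='') as f:
        for line in f:
            if num_lines > 0:
                num_lines -= 1
                continue
            yield line


def read_transactions(path):
    """Skip the 3 client-info lines, then read the rest as dicts keyed by header."""
    return list(csv.DictReader(open_file_and_skip_lines(path, num_lines=3)))


# Usage with a hypothetical file holding the sample data above:
# for row in read_transactions('statement.csv'):
#     print(row['Date'], row['Amount'])
```

Counting lines is simpler than searching, but it breaks if the number of preamble lines ever changes.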
Upvotes: 0
Reputation: 187
A simple solution is to use csv.DictReader(), which reads the first row as the column names:
import csv

def read_csv(path):
    with open(path, newline='') as file:
        reader = csv.DictReader(file)
        for row in reader:
            print(row["column_name"])  # Replace with the name of your column header.
Upvotes: 0
Reputation: 78
This might be a very old question, but with pandas there is a very easy solution:
import pandas as pd
data = pd.read_csv('all16.csv', skiprows=1, header=None)
data[1].min()
With skiprows=1 we skip the first row, and header=None stops pandas from using the next row as the header instead, so the columns are numbered 0, 1, 2, ... Then data[1].min() finds the least value in column 1.
Upvotes: 4
Reputation: 13727
Python 2.x
Return the next row of the reader’s iterable object as a list, parsed according to the current dialect.
import csv
with open('sample.csv', 'rb') as f:
    csv_data = csv.reader(f)
    csv_data.next()  # skip first row
    for row in csv_data:
        print(row)  # prints every row from the second one on
Python 3.x
Return the next row of the reader’s iterable object as a list (if the object was returned from reader()) or a dict (if it is a DictReader instance), parsed according to the current dialect. Usually you should call this as next(reader).
import csv
with open('sample.csv', newline='') as f:
    csv_data = csv.reader(f)
    next(csv_data)  # skip first row
    for row in csv_data:
        print(row)  # prints every row from the second one on
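Python 3's built-in next() also takes an optional default, which it returns instead of raising StopIteration when the iterator is already exhausted; that keeps the header skip from crashing on an empty file. A minimal sketch: data_rows is a hypothetical name, and the lists stand in for file lines.

```python
import csv


def data_rows(lines):
    """Return every CSV row after the first; empty input gives an empty list."""
    reader = csv.reader(lines)
    next(reader, None)  # skip the header; returns None instead of raising if there is none
    return list(reader)


print(data_rows(["a,b", "1,2", "3,4"]))  # [['1', '2'], ['3', '4']]
print(data_rows([]))  # []
```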
Upvotes: 6
Reputation: 302
I would convert the reader to a list, then pop the first element:
import csv
with open(fileName, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    data = list(csvreader)  # Convert to list
data.pop(0)  # Removes the first row
for row in data:
    print(row)
Upvotes: 1
Reputation: 123463
You could use an instance of the csv module's Sniffer class to deduce the format of a CSV file and detect whether a header row is present, along with the built-in next() function to skip over the first row only when necessary:
import csv

with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)

print(least_value)
Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:
data = (float(row[1]) for row in reader)
Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:
with open('all16.csv', 'rb') as file:
Upvotes: 120
Reputation: 3950
The documentation for the Python 3 CSV module provides this example:
with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...
The Sniffer will try to auto-detect many things about the CSV file. To determine whether the file has a header line, you need to explicitly call its has_header() method, passing it a sample of the file. If it does, skip the first row when iterating over the CSV rows. You can do it like this:
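import csv
with open('example.csv', newline='') as csvfile:
    sample = csvfile.read(1024)
    csvfile.seek(0)
    reader = csv.reader(csvfile, csv.Sniffer().sniff(sample))
    if csv.Sniffer().has_header(sample):
        for header_row in reader:
            break  # consume only the header row
    for data_row in reader:
        print(data_row)  # do something with the row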
Upvotes: 3
Reputation: 33
Because this is related to something I was doing, I'll share here.
What if you're not sure whether there's a header, and you don't feel like using the Sniffer and other things?
If your task is basic, such as printing or appending to a list or array, you could just use an if statement:
import csv

array = []
# Let's say there are 4 columns
with open('file.csv', newline='') as csvfile:
    csvreader = csv.reader(csvfile)
    # Read the first line
    first_line = next(csvreader)
    # My headers were just text. Keep the first line only if it looks like data;
    # you can use any suitable conditional here.
    if len(first_line) == 4:
        array.append(first_line)
    # Now iterate over everything else as usual:
    for row in csvreader:
        array.append(row)
Upvotes: 2
Reputation: 681
Borrowed from the Python Cookbook, a more concise template might look like this:
import csv
with open('stocks.csv', newline='') as f:
    f_csv = csv.reader(f)
    headers = next(f_csv)  # the first row holds the column names
    for row in f_csv:
        ...  # Process row
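Keeping headers around also lets you choose the column by name rather than by position. A sketch, assuming the header row contains the name you ask for (here "1", since the question's top row holds column numbers); min_by_header is a hypothetical helper, and the list stands in for the file's lines.

```python
import csv


def min_by_header(lines, name):
    """Hypothetical helper: smallest float in the column whose header is `name`."""
    f_csv = csv.reader(lines)
    headers = next(f_csv)          # first row: the column names
    index = headers.index(name)    # raises ValueError if `name` is not a header
    return min(float(row[index]) for row in f_csv)


print(min_by_header(["0,1", "7,2.5", "8,0.5"], "1"))  # 0.5
```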
Upvotes: 44
Reputation: 81
For me the easiest way to go is to use range.
import csv
with open('files/filename.csv') as I:
    reader = csv.reader(I)
    fulllist = list(reader)

# Start from index 1 to skip the header
for item in range(1, len(fulllist)):
    # Print each row using "item" as the index value
    print(fulllist[item])
Upvotes: 1
Reputation: 16999
Python 3.X
Handles UTF8 BOM + HEADER
It was quite frustrating that the csv module could not easily get the header, and there is also a bug with the UTF-8 BOM (the first character in the file).
This works for me using only the csv module:
import csv

def read_csv(csv_path, delimiter):
    with open(csv_path, newline='', encoding='utf-8') as f:
        txt = f.read()
    # Remove the UTF-8 BOM, if present.
    # https://bugs.python.org/issue7185
    if txt.startswith('\ufeff'):
        txt = txt[1:]
    lines = txt.splitlines()
    # Separate the header line from the data lines.
    header = lines[:1]
    # Convert the remaining lines to a list of rows.
    csv_rows = list(csv.reader(lines[1:], delimiter=delimiter))
    for row in csv_rows:
        value = row[INDEX_HERE]  # INDEX_HERE: the column index you want
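As an alternative to stripping the BOM by hand, Python's built-in 'utf-8-sig' codec drops a leading BOM when one is present and otherwise decodes like 'utf-8', so the csv reader can work on the file directly. A sketch of that swapped-in approach; read_rows is a hypothetical name.

```python
import csv


def read_rows(csv_path, delimiter=','):
    """Return (header, data_rows); 'utf-8-sig' drops a leading BOM if present."""
    with open(csv_path, newline='', encoding='utf-8-sig') as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, [])   # first row, or [] for an empty file
        return header, list(reader)


# Usage (hypothetical file name):
# header, rows = read_rows('data.csv')
```

Reading through csv.reader instead of splitlines() also keeps quoted fields that contain newlines intact.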
Upvotes: 0
Reputation: 519
Just add [1:] to slice off the first row, for example:
data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)[1:]
That works for me in IPython.
Upvotes: 0
Reputation: 305
I would use tail to get rid of the unwanted first line:
tail -n +2 $INFIL | whatever_script.py
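With tail removing the header, the Python side only has to read rows from standard input. A sketch of what whatever_script.py might contain, assuming as in the question that the values are in column 1; min_of_column and main are hypothetical names.

```python
import csv
import sys


def min_of_column(stream, column=1):
    """Smallest float in `column` of the CSV text read from `stream`."""
    return min(float(row[column]) for row in csv.reader(stream))


def main():
    # tail -n +2 already dropped the header, so every row on stdin is data.
    print(min_of_column(sys.stdin))


# In the real script, finish with:
# if __name__ == '__main__':
#     main()
```

It could then be run as, for example, tail -n +2 "$INFIL" | python whatever_script.py.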
Upvotes: 0
Reputation: 1541
In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely: advance the file object past the unwanted line first, then pass the file to csv.DictReader.
import csv
with open('all16.csv', newline='') as tmp:
    # Skip first line (if any)
    next(tmp, None)
    # {row_index: row}, counting data rows from 0
    data = dict(enumerate(csv.DictReader(tmp)))
Upvotes: 25
Reputation: 4592
Well, my mini wrapper library would do the job as well.
>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])
Meanwhile, if you know the header name of column index 1, for example "Column 1", you can do this instead:
>>> min(data.column["Column 1"])
Upvotes: 1
Reputation: 6726
The 'pandas' package might be more relevant than 'csv'. The code below reads a CSV file, by default interpreting the first line as the column headers, and finds the minimum of each column.
import pandas as pd
data = pd.read_csv('all16.csv')
data.min()
Upvotes: 2
Reputation: 414215
To skip the first line just call:
next(inf)
Files in Python are iterators over lines.
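Applied to the question's code, a Python 3 sketch might look like this. Note that the value has to come from row[column]; float(column) would just convert the number 1 on every row. least_in_column is a hypothetical name.

```python
import csv


def least_in_column(path, column=1):
    """Smallest value in `column`, ignoring the file's first line."""
    with open(path, newline='') as inf:
        next(inf)                  # consume the header line; csv.reader never sees it
        incsv = csv.reader(inf)
        return min(float(row[column]) for row in incsv)


# print(least_in_column('all16.csv'))
```

min() runs inside the with block on purpose: the generator reads lazily, so the file must still be open while it is consumed.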
Upvotes: 91
Reputation: 23364
Use csv.DictReader instead of csv.reader. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. You would then be able to access field values using row["1"], etc.
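A sketch of that approach for the question, assuming the header row contains a field named "1"; least_by_name is a hypothetical helper, and the list stands in for the file's lines.

```python
import csv


def least_by_name(lines, name='1'):
    """Smallest float in the column labelled `name` in the header row."""
    reader = csv.DictReader(lines)  # header row becomes the dict keys
    return min(float(row[name]) for row in reader)


print(least_by_name(["0,1", "3,9.5", "4,0.75"]))  # 0.75
```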
Upvotes: 8
Reputation: 142156
You would normally use next(incsv), which advances the iterator one row, so you skip the header. The other option (say you wanted to skip 30 rows) would be:
from itertools import islice
for row in islice(incsv, 30, None):
    ...  # process
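With a start index of 1, the same islice call skips only the header, so one helper covers both cases. A sketch; rows_after is a hypothetical name, and the lists stand in for file lines.

```python
import csv
from itertools import islice


def rows_after(lines, skip=1):
    """Return the CSV rows that follow the first `skip` rows."""
    return list(islice(csv.reader(lines), skip, None))


print(rows_after(["h1,h2", "1,2", "3,4"]))       # [['1', '2'], ['3', '4']]
print(rows_after(["a", "b", "c", "d"], skip=3))  # [['d']]
```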
Upvotes: 19