Reputation: 435
When reading a CSV with pd.read_csv, how do I skip lines that contain a specific value? For example, if the 1st column has the value 100 in the 50th and 55th rows, I want to skip those lines when I read the CSV file. How can I put that into a read call like pd.read_csv('read.csv')? The total length of the value is 300.
Upvotes: 2
Views: 5283
Reputation: 8927
The only way is to pre-parse the file. Use a generator to read the file and yield only
the lines that you want. You can then write those lines into a StringIO object and pass
that object in place of the file path to read_csv.
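import io  # on Python 3, StringIO lives in the io module
import pandas as pd

def read_file(file_name):
    # Yield only the lines whose first field is not '100'.
    with open(file_name, 'r') as fh:
        for line in fh:
            parts = line.split(',')
            if parts[0] != '100':
                yield line

# Collect the kept lines in an in-memory buffer and rewind it before parsing.
stream = io.StringIO()
stream.writelines(read_file('foo.txt'))
stream.seek(0)
df = pd.read_csv(stream)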
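To try the pre-parse idea without a file on disk, here is a minimal sketch that runs the same filter over an in-memory string. The sample data, the column names, and the helper name keep_lines are made up for illustration. It also assumes the first field is never quoted and never contains a comma.

```python
import io
import pandas as pd

# Hypothetical sample data standing in for the CSV file on disk.
raw = "col1,col2\n1,a\n100,b\n2,c\n100,d\n"

def keep_lines(lines, skip_value="100"):
    """Yield lines whose first comma-separated field differs from skip_value."""
    for line in lines:
        first_field = line.split(",", 1)[0].strip()
        if first_field != skip_value:
            yield line

# Join the surviving lines into a new buffer and let pandas parse it.
filtered = io.StringIO("".join(keep_lines(io.StringIO(raw))))
df = pd.read_csv(filtered)
print(df)
```

The header line survives because its first field is the column name, not 100.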
Upvotes: 3
Reputation: 8683
What is the difference between dropping them later, and not reading them at all? You might simply do:
pd.read_csv('file.csv').query('col1 != 100')
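As a rough illustration of dropping the rows after reading, here is a small sketch on made-up in-memory data. The column name col1 is the placeholder from the line above, and the sample values are hypothetical. It also shows a boolean mask on the first column by position, for the case where the column name is not known.

```python
import io
import pandas as pd

# Hypothetical sample data; 'col1' is a placeholder column name.
raw = "col1,col2\n1,a\n100,b\n2,c\n100,d\n"

# Read everything, then drop the rows where col1 equals 100.
by_query = pd.read_csv(io.StringIO(raw)).query("col1 != 100")

# Equivalent filter that selects the first column by position instead of name.
full = pd.read_csv(io.StringIO(raw))
by_mask = full[full.iloc[:, 0] != 100]

# The original row labels are kept; reset them if a clean 0..n-1 index is wanted.
result = by_query.reset_index(drop=True)
print(result)
```

Both filters read the whole file first, so they suit files that fit comfortably in memory.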
Upvotes: 6