MMM

Reputation: 435

Skip lines that contain a certain value when reading a CSV into a pandas DataFrame

When reading a CSV with pd.read_csv, how do I skip the lines that contain a specific value? For example, if the 1st column has the value 100 in the 50th and 55th rows, I want to skip those lines when I read the CSV file. How can I put that into the read call, like pd.read_csv('read.csv')? The data has 300 rows in total.

Upvotes: 2

Views: 5283

Answers (2)

Batman

Reputation: 8927

The only way to skip them at read time is to pre-parse the file. Use a generator to read the file and yield only the lines that you want. You can then write those lines into a StringIO object and pass that object in place of the file path to read_csv.

import io
import pandas as pd

def read_file(file_name):
    # Yield only the lines whose first comma-separated field is not '100'.
    with open(file_name, 'r') as fh:
        for line in fh:
            parts = line.split(',')
            if parts[0] != '100':
                yield line

# Collect the kept lines in an in-memory text buffer.
stream = io.StringIO()
stream.writelines(read_file('foo.txt'))
stream.seek(0)

df = pd.read_csv(stream)
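
To try the pre-parse idea without a file on disk, here is a minimal sketch on an in-memory sample. The sample data, the column names, and the helper name keep_lines are made up for illustration. It also strips whitespace around the first field, since a plain string comparison would treat ' 100 ' and '100' as different values.

```python
import io
import pandas as pd

# Hypothetical sample data; the column names are assumptions.
SAMPLE = "col1,col2\n1,a\n100,b\n2,c\n 100 ,d\n3,e\n"

def keep_lines(lines, skip_value="100"):
    """Yield only the lines whose first comma-separated field is not skip_value."""
    for line in lines:
        # Compare the first field as text, ignoring surrounding whitespace.
        first = line.split(",", 1)[0].strip()
        if first != skip_value:
            yield line

# Join the kept lines into a new in-memory buffer and parse it.
filtered = io.StringIO("".join(keep_lines(io.StringIO(SAMPLE))))
df = pd.read_csv(filtered)
print(df)
```

The header line passes through because its first field is the column name, not the skipped value.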

Upvotes: 3

Kartik

Reputation: 8683

What is the difference between dropping them later and not reading them at all? You could simply do this, where col1 is the name of the first column:

pd.read_csv('file.csv').query('col1 != 100')
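
As a rough sketch of the same filter-after-reading idea, the example below selects the first column by position, so no column name is needed, and repeats the filter chunk by chunk for files too large to hold comfortably in memory. The sample data, the column names, and the chunk size of 2 are assumptions chosen only to keep the example small.

```python
import io
import pandas as pd

# Hypothetical sample data; the column names are assumptions.
SAMPLE = "col1,col2\n1,a\n100,b\n2,c\n100,d\n3,e\n"

# Read everything, then keep rows whose first column is not 100.
full = pd.read_csv(io.StringIO(SAMPLE))
kept = full[full.iloc[:, 0] != 100]

# Same filter applied per chunk, so only the kept rows are accumulated.
chunks = pd.read_csv(io.StringIO(SAMPLE), chunksize=2)
kept_chunked = pd.concat([chunk[chunk.iloc[:, 0] != 100] for chunk in chunks])

print(kept)
print(kept_chunked)
```

Both results keep the original row labels; call reset_index(drop=True) if consecutive labels are wanted.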

Upvotes: 6
