BenjiBoy

Reputation: 305

Pandas read_csv: ignore lines that contain a specific string

I have a dataframe that lists dataloggers and their passwords. If a datalogger's password field is blank, my script generates a generic password for it. If the datalogger does not use the generic password, I put its specific password in that field. Third case: for some dataloggers, I put a specific string ('NON') in that field to tell my script to skip the line, because the datalogger must be ignored. That gives something like this:

datalog, pwd
A001, 
A002, 123
A003, 
A004,
A005, NON
A006, 456
A007,
A008, NON
A009, 789
A010,

So:

Dataloggers 1, 3, 4, 7, 10 have a generic password.

Dataloggers 2, 6, 9 have a specific password.

Dataloggers 5, 8 must be ignored.

How can I call pd.read_csv so that it ignores lines that contain 'NON' in the second column?

Upvotes: 1

Views: 84

Answers (1)

user459872

Reputation: 24827

The pandas read_csv API does not let you skip rows based on their values.

Upon investigation I found the feature request "Skiprows" by a condition or set of conditions, which suggests an interesting approach (by David Krych):

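import pandas as pd
# skipinitialspace=True strips the blank after each comma in the sample file, so the column is named "pwd"
gen = pd.read_csv('your.csv', chunksize=10000000, skipinitialspace=True)
df = pd.concat((x.query('pwd != "NON"') for x in gen), ignore_index=True)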

This reads the data in batches and filters each batch before the concat operation. That way you can still rely on the C parser and do not have to go through the rows twice.
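As a minimal sketch of the same idea on the sample data from the question, the snippet below reads from an in-memory string instead of a file. The names `SAMPLE` and `read_without_ignored` are made up for illustration. `skipinitialspace=True` assumes the real file has a space after each comma, as in the sample, and `dtype={"pwd": str}` is an added assumption so that every chunk keeps passwords as text (otherwise a chunk holding only blanks and digits could be inferred as numeric).

```python
import io

import pandas as pd

# Sample data copied from the question (stand-in for the real file).
SAMPLE = """datalog, pwd
A001, 
A002, 123
A003, 
A004,
A005, NON
A006, 456
A007,
A008, NON
A009, 789
A010,
"""


def read_without_ignored(source, marker="NON", chunksize=4):
    """Read the CSV in chunks and drop rows whose pwd equals marker."""
    chunks = pd.read_csv(
        source,
        chunksize=chunksize,
        skipinitialspace=True,  # strip the blank after each comma
        dtype={"pwd": str},     # keep passwords as text in every chunk
    )
    # fillna("") is applied only for the comparison, so rows with a
    # blank password are kept in the result.
    kept = (chunk[chunk["pwd"].fillna("") != marker] for chunk in chunks)
    return pd.concat(kept, ignore_index=True)


df = read_without_ignored(io.StringIO(SAMPLE))
print(df)
```

The small `chunksize` here is only to show that filtering works across several batches; for a real file a much larger value, or a plain read followed by one boolean filter, would likely do just as well.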

Upvotes: 1
