Reputation: 305
I have a dataframe that lists datalogger names and their passwords. If a datalogger has a blank password field, my script generates the password (the generic case). If a datalogger does not use the generic password, I put its specific password in this field. Third case: for some dataloggers, I set a specific string ('NON') in that field to tell my script not to consider this line; the datalogger must be ignored. That gives something like this:
datalog, pwd
A001,
A002, 123
A003,
A004,
A005, NON
A006, 456
A007,
A008, NON
A009, 789
A010,
So :
Dataloggers 1, 3, 4, 7, 10 have a generic password.
Dataloggers 2, 6, 9 have a specific password.
Dataloggers 5, 8 must be ignored.
How can I make a pd.read_csv call that ignores lines containing 'NON' in the second column?
Upvotes: 1
Views: 84
Reputation: 24827
The pandas read_csv API does not allow you to skip rows based on their values.
Upon investigation, I found this feature request, "Skiprows" by a condition or set of conditions, which suggests an interesting approach (by David Krych):
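import pandas as pd
# Read the file in chunks and drop the 'NON' rows from each chunk before concatenating.
gen = pd.read_csv('your.csv', chunksize=10000000)
df = pd.concat((x.query('pwd != "NON"') for x in gen), ignore_index=True)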
This reads the data in batches and applies the filter to each batch before the concat operation. That way, you can still rely on the C parser and do not have to go through the rows twice.
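For a version you can run as-is, here is a minimal sketch of the same chunked-filter idea applied to the sample data from the question. A few things in it are my own assumptions and not part of the original suggestion: the helper name read_without_ignored is hypothetical, io.StringIO stands in for the real file, skipinitialspace=True is there because the sample has a space after each comma (otherwise the column would be named ' pwd'), dtype=str keeps passwords such as 123 as text, and a plain boolean mask replaces query.

```python
import io

import pandas as pd

# Sample data copied from the question; a real run would pass a file path instead.
CSV_TEXT = """datalog, pwd
A001,
A002, 123
A003,
A004,
A005, NON
A006, 456
A007,
A008, NON
A009, 789
A010,
"""


def read_without_ignored(source, marker="NON", chunksize=4):
    """Hypothetical helper: read a CSV in chunks, dropping rows whose pwd equals marker."""
    chunks = pd.read_csv(
        source,
        chunksize=chunksize,    # tiny here only so the sample spans several chunks
        skipinitialspace=True,  # assumption: fields are written as ", value"
        dtype=str,              # keep passwords as text, not numbers
    )
    # Blank passwords are read as NaN; NaN != marker is True, so those rows are kept.
    kept = (chunk[chunk["pwd"] != marker] for chunk in chunks)
    return pd.concat(kept, ignore_index=True)


df = read_without_ignored(io.StringIO(CSV_TEXT))
print(df)
```

The rows for A005 and A008 should be gone, and the blank passwords come back as NaN, so the script can still tell the generic case apart from a specific password.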
Upvotes: 1