Reputation: 7916
I've got a relatively small (<100K) numerical CSV dataset that I want to process and graph with some numpy and pylab utilities, and it occurred to me that there's probably a better way of processing the data than ridiculous custom if-ladders for siphoning out the relevant experimental scenarios and comparisons.
If this data were in a DB rather than a CSV this wouldn't be a problem, but throwing together a 'real' DB instance for the sake of this seems like overkill. Is there a Pythonic solution to what I'm looking for?
TL;DR Want to query CSV files like a DB / move CSVs into a mini-DB.
Upvotes: 7
Views: 8935
Reputation: 7530
Without knowing any specific details (at all) of your case, I expect that you'll eventually find one of the following 'ladders' to be the dominant one for your case:
- ridiculous custom if-ladders
Obviously, any of the 'ladders' sketched above will possess its specific pros and cons, depending on the actual case. Thus a really careful mix of them may eventually yield the best 'overall' result.
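As a hedged illustration of one such alternative rung (not code from this answer), a vectorised numpy boolean mask can replace a chain of ifs when siphoning out rows. The sketch below assumes a hypothetical results.csv whose columns are scenario, trials, and score:

```python
import numpy as np

# Load the numerical CSV into a 2-D array (header row skipped).
data = np.genfromtxt("results.csv", delimiter=",", skip_header=1)

# Hypothetical column layout.
SCENARIO, TRIALS, SCORE = 0, 1, 2

# One vectorised boolean mask instead of nested ifs:
# keep rows from scenario 3 with at least 10 trials.
mask = (data[:, SCENARIO] == 3) & (data[:, TRIALS] >= 10)
subset = data[mask]

print(subset[:, SCORE].mean())  # aggregate over the selection
```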
Upvotes: 7
Reputation: 809
Perhaps pandas could help you out, in particular its query function.
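A minimal sketch of DataFrame.query, again assuming a hypothetical results.csv with columns scenario, trials, and score:

```python
import pandas as pd

# Load the CSV once, then filter with SQL-WHERE-like expressions.
df = pd.read_csv("results.csv")

# Select one experimental scenario without any if-ladders.
subset = df.query("scenario == 3 and trials >= 10")

# Group/aggregate for comparison plots, e.g. mean score per scenario.
print(df.groupby("scenario")["score"].mean())
```

query takes a boolean expression over column names, so each experimental selection becomes one readable line.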
Pandas can also do joins, but at that point I would switch to SQL. A tiny database wrapper for that is dataset.
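And a minimal sketch of the dataset wrapper, loading the same hypothetical CSV into an in-memory SQLite database so it can be queried with plain SQL:

```python
import csv
import dataset

# In-memory SQLite database; the table is created on first insert.
db = dataset.connect("sqlite:///:memory:")
table = db["results"]

with open("results.csv", newline="") as f:
    for row in csv.DictReader(f):
        # The dataset is numerical, so cast the CSV's string fields.
        table.insert({k: float(v) for k, v in row.items()})

# Plain SQL, including joins, once the data is in.
for row in db.query("SELECT scenario, AVG(score) AS mean_score "
                    "FROM results GROUP BY scenario"):
    print(row["scenario"], row["mean_score"])
```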
Upvotes: 1
Reputation: 143865
I once started to write a library of utilities called wavemol. One subpackage I developed was wavemol.fileaccess, which contains a CSV parsing class that lets you access the file in a more practical way. Check here for the methods it provides.
You may need to install wavemol.core first. I am not actively developing this code anymore, but if you are interested and this stuff does the trick for you, I may find some time to refocus on it a bit and put it back on track (of course help is welcome, but it isn't necessary to make it a little better). I sort of lost interest in it because I changed jobs and didn't need this stuff anymore.
Upvotes: 0