Reputation: 31
I have a CSV (comma separated values) file that contains student information. The column headers look like StudentId, StudentFirstName, StudentLastName, StudentZipCode, StudentHeight, StudentCommuteMethod, etc. and the subsequent rows contain the information for individual students. Now, I would like to write a python 2.5 script that takes a filtering condition as a command line parameter and return the set of students (rows) that match this filter condition. For example, the filter condition could be something like below (using pseudo code format):
"StudentCommuteMethod = Bus AND StudentZipCode = 12345"
and the python script could be invoked:
MyPythonScript.py -filter "<above string>" -i input.csv
This should return the list of all students (rows) who live in an area with zip code 12345 and who commute by bus. The filter could also be arbitrarily complex and may include any number of AND, OR operators.
QUESTIONS:
What is the best format in which this program could have the user specify the filter condition (as a command line parameter). The format should be simple for simple expressions and must be powerful enough to express all types of conditions.
I would like to have a UI for expressing the filter condition in a visual manner. Perhaps something that allows entering a simple two-operand condition per row and some inutive way to combine them using ANDs and ORs. It should be able to emit a filter expression in the format decided for (1) above. Is there some open source project I could reuse for it?
If you think that there is a better way to solve this problem than passing a command line expression + UI, feel free to mention it. In the end, the user (an electrical engineer who doesn't know a lot about programming) should be able to enter the filter expression easily.
Thanks!
NOTE: I don't have control over the input or output format (both csv files).
Upvotes: 2
Views: 746
Reputation: 8751
This is a slight variation on Danilo's suggestion. You can avoid the exec
to bind variables for each row by passing in a locals dictionary to eval
, and the dicts returned by csv.DictReader work very well for this:
import csv, optparse
infile = open('datafile.csv')
reader = csv.DictReader(infile)
parser = optparse.OptionParser()
parser.add_option('--filter', type='string', dest='filter')
options, args = parser.parse_args()
for row in reader:
if eval(options.filter, row):
print row
This assumes the first line of the input file has column headers, and any headers you want to use in an expression have to be valid Python identifiers.
Upvotes: 1
Reputation: 7844
You are definitely trying to reimplement SQL in Python. I believe it would be better to use a relational database and just run SQL queries.
However, regarding question 1, you can easily let the user enter Python expressions and eval()
them on each row of data.
This is a working example, which uses exec
to bind column values to local variables (a nasty hack, I admit). CVS parsing omitted for brevity.
import optparse, sys
# Assume your CSV data is read into a list of dictionaries
sheet = [
{'StudentId': 1, 'StudentFirstName': 'John', 'StudentLastName': 'Doe', 'StudentZipCode': '12345', 'StudentCommuteMethod': 'Bus'},
{'StudentId': 2, 'StudentFirstName': 'Bob', 'StudentLastName': 'Chen', 'StudentZipCode': '12345', 'StudentCommuteMethod': 'Bus'},
{'StudentId': 3, 'StudentFirstName': 'Jane', 'StudentLastName': 'Smith', 'StudentZipCode': '12345', 'StudentCommuteMethod': 'Train'},
{'StudentId': 4, 'StudentFirstName': 'Dave', 'StudentLastName': 'Burns', 'StudentZipCode': '45467', 'StudentCommuteMethod': 'Bus'},
]
# Options parsing
parser = optparse.OptionParser()
parser.add_option('--filter', type='string', dest='filter')
options, args = parser.parse_args()
# Filter option is required
if options.filter is None:
print >> sys.stderr, 'error: no filter expression given'
sys.exit(1)
# Process rows and build result set
result = []
for row in sheet:
# Bind each column to a local variable (StudentId, StudentFirstName, etc.);
# this allows evaluating Python expressions on a row, for example:
# 'StudentCommuteMethod = "Bus" and StudentZipCode = "12345"'
for col, val in row.iteritems():
exec '%s = %s' % (col, repr(val))
# Apply filter to the row
if eval(options.filter):
result.append(row)
# Print out result set
for row in result:
print row
I tested it using the following filter expressions:
./MyPythonScript.py --filter 'StudentCommuteMethod == "Bus" and StudentZipCode == "12345"'
./MyPythonScript.py --filter 'StudentCommuteMethod == "Bus" or StudentZipCode == "12345"'
(Beware of shell quoting rules when running the program from the command line.)
Upvotes: 1