pyparsing: extracting strings containing specific text

Question

I am trying to learn pyparsing. It sounds promising and like something that would be fun to use for text processing. Anyhow, here is my question:

I have a list of course names. For example,

courselist = ["Project Based CALC",
           "CALCULUS I",
           "Calculus II",
           "Intermediate MICRO",
           "Intermediate CALCULUS advance",
           "UNIVERSITY PHYSICS"]

I want to extract the courses from a list such as the one above that have to do with calculus. These are courses whose names contain either the full word CALCULUS or the abbreviation CALC. First, suppose that these words appear only in uppercase (there is one with lowercase in the above example; let us ignore that for the moment).

I have written the following code:

import pyparsing as pp

calc = pp.Literal("CALC")
for entry in courselist:
    if len(calc.searchString(entry)) >= 1:
        print(entry)
    else:
        pass

My first question is whether there is a better way of doing this using pyparsing.

Now the above misses Calculus II. I know I can catch that by defining calc as:

calc = pp.Literal("CALC") | pp.Literal("Calc")

But this will miss cAlc. Is there a way to specify the grammar so that CALC is matched regardless of the case of each letter?

Thank you for your help.

jfs · Accepted Answer

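calc = pp.CaselessLiteral('calc')  # matches calc, CALC, cAlc, ...
for entry in courselist:
    # the second argument (maxMatches=1) stops the search at the first hit
    if calc.searchString(entry, 1):
        print(entry)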

The effect is similar to:

for entry in courselist:
    if 'calc' in entry.lower():
        print(entry)
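
If the matching courses are needed as a list rather than printed, the plain-Python check can be wrapped in a small helper. This is only a sketch: the function name filter_courses and its keyword parameter are made up for illustration, and it does a simple case-insensitive substring test, not a pyparsing search.

```python
courselist = ["Project Based CALC",
              "CALCULUS I",
              "Calculus II",
              "Intermediate MICRO",
              "Intermediate CALCULUS advance",
              "UNIVERSITY PHYSICS"]


def filter_courses(courses, keyword="calc"):
    """Return the courses whose name contains keyword, ignoring case.

    Hypothetical helper, not part of pyparsing.
    """
    keyword = keyword.lower()
    # Lowercase both sides so CALC, Calc and cAlc all compare equal.
    return [course for course in courses if keyword in course.lower()]


matches = filter_courses(courselist)
for course in matches:
    print(course)
```

Like the loop above, this matches any occurrence of the letters "calc", so it would also pick up a name such as "Recalculation Lab".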
