Reputation: 5314

Parsing a string as a Python argument list

Summary

I would like to parse a string that represents a Python argument list into a form that I can forward to a function call.

Detailed version

I am building an application in which I would like to be able to parse out argument lists from a text string that would then be converted into the *args,**kwargs pattern to forward to an actual method. For example, if my text string is:

"hello",42,helper="Larry, the \"wise\""

the parsed result would be something comparable to:

args=['hello',42]
kwargs={'helper': 'Larry, the "wise"'}

I am aware of Python's ast module, but it only seems to provide a mechanism for parsing entire statements. I can sort of fake this by manufacturing a statement around it, e.g.

ast.parse(r'f("hello",42,helper="Larry, the \"wise\"")')

and then pull the relevant fields out of the Call node, but this seems like an awful lot of roundabout work.

Is there any way to parse just one known node type from a Python AST, or is there an easier approach for getting this functionality?

If it helps, I only need to be able to support numeric and string arguments, although strings need to support embedded commas and escaped-out quotes and the like.

If there is an existing module for building lexers and parsers in Python, I am fine with defining my own AST as well, but obviously I would prefer to use functionality that already exists and has already been tested.

Note: Many of the answers focus on how to store the parsed results, but that's not what I care about; it's the parsing itself that I'm trying to solve, ideally without writing an entire parser engine myself.

Also, my application already uses Jinja, which has a parser for Python-ish expressions in its own template parser, although it isn't clear to me how to use it to parse just one subexpression like this. (Unfortunately, this is not going into a template but into a custom Markdown filter, where I'd like the syntax to match the corresponding Jinja template function as closely as possible.)

Upvotes: 5

Answers (5)

Dmitriy Serebryanskiy

Reputation: 21

I've adjusted the solution proposed by Aran-Fey to parse in 'eval' mode, so the Call node is available directly as tree.body. This works as expected:

import ast

def parse_function_arguments(arg_str):
    # Wrap the argument string in a dummy function call to make it valid Python syntax
    wrapped_arg_str = f"dummy_func({arg_str})"
    # Parse only; nothing is evaluated here. 'eval' mode parses a single expression
    tree = ast.parse(wrapped_arg_str, mode="eval")
    # In 'eval' mode, tree.body is the expression itself, i.e. the Call node
    funccall = tree.body
    # Positional arguments: convert each AST node with the safe ast.literal_eval
    args = tuple(ast.literal_eval(arg) for arg in funccall.args)
    # Keyword arguments: convert each value node the same way
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in funccall.keywords}

    return args, kwargs
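The difference between mode="eval" used here and the default mode used in Aran-Fey's answer can be seen directly. A small sketch (the sample string is arbitrary) comparing the two trees:

```python
import ast

src = 'f("hello", 42, helper="x")'

# Default 'exec' mode: a Module holding a list of statements; the call is
# wrapped in an Expr statement, hence tree.body[0].value.
module = ast.parse(src)
call_from_exec = module.body[0].value

# 'eval' mode: an Expression node whose body is the single expression itself.
expression = ast.parse(src, mode="eval")
call_from_eval = expression.body

# 'eval' mode also rejects anything that is not a single expression.
try:
    ast.parse("x = 1", mode="eval")
except SyntaxError:
    print("statements are rejected in 'eval' mode")
```

Both routes end at the same Call node; 'eval' mode just skips the statement wrapper and refuses input that is not an expression.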

Upvotes: 0

John Jiang

Reputation: 955

You can use a function with eval to help you pick apart args and kwargs:

def f(*args, **kwargs):
  return args, kwargs

import numpy as np
eval("f(1, 'a', x=np.int32)")

gives you

((1, 'a'), {'x': <class 'numpy.int32'>})
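Applied to the question's own string, the same trick might look like the sketch below (the names capture and eval_args are hypothetical). Unlike the ast.literal_eval answers, this really evaluates the text, which is how np.int32 resolves above; it also means Python itself enforces call rules such as "no positional argument after a keyword argument".

```python
def capture(*args, **kwargs):
    """Return the positional and keyword arguments exactly as Python bound them."""
    return args, kwargs

def eval_args(arg_str):
    # eval still runs arbitrary expressions: the restricted namespace below only
    # hides builtins and other names, it is NOT a security sandbox.
    namespace = {"__builtins__": {}, "capture": capture}
    return eval(f"capture({arg_str})", namespace)

args, kwargs = eval_args(r'"hello",42,helper="Larry, the \"wise\""')
```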

Upvotes: 0

Aran-Fey

Reputation: 43166

I think ast.parse is your best option.

If the parameters were separated by whitespace, we could use shlex.split:

>>> shlex.split(r'"hello" 42 helper="Larry, the \"wise\""')
['hello', '42', 'helper=Larry, the "wise"']

But unfortunately, that doesn't split on commas:

>>> shlex.split(r'"hello",42,helper="Larry, the \"wise\""')
['hello,42,helper=Larry, the "wise"']
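shlex.split itself only splits on whitespace, but a shlex.shlex instance can be told to treat commas as separators too. A rough sketch of that idea (the function name shlex_args is hypothetical); it assumes keyword arguments are written without spaces around "=", it returns numbers as strings, and it cannot tell a quoted positional such as "a=b" from a keyword argument, because the quotes are already gone by the time the token is inspected:

```python
import shlex

def shlex_args(arg_str):
    lexer = shlex.shlex(arg_str, posix=True)  # posix: strip quotes, honour \" escapes
    lexer.whitespace = ", \t\r\n"            # commas (and blanks) separate tokens
    lexer.whitespace_split = True             # a token runs until the next separator
    lexer.commenters = ""                     # don't treat '#' as a comment start
    args, kwargs = [], {}
    for token in lexer:
        key, sep, value = token.partition("=")
        if sep and key.isidentifier():        # looks like name=value
            kwargs[key] = value
        else:
            args.append(token)
    return args, kwargs
```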

I also thought about using ast.literal_eval, but that doesn't support keyword arguments:

>>> ast.literal_eval(r'"hello",42')
('hello', 42)
>>> ast.literal_eval(r'"hello",42,helper="Larry, the \"wise\""')
Traceback (most recent call last):
  File "<unknown>", line 1
    "hello",42,helper="Larry, the \"wise\""
                     ^
SyntaxError: invalid syntax

I couldn't think of any Python literal that supports both positional and keyword arguments.

For lack of better ideas, here's a solution using ast.parse:

import ast

def parse_args(args):
    args = 'f({})'.format(args)
    tree = ast.parse(args)
    funccall = tree.body[0].value

    args = [ast.literal_eval(arg) for arg in funccall.args]
    kwargs = {arg.arg: ast.literal_eval(arg.value) for arg in funccall.keywords}
    return args, kwargs

Output:

>>> parse_args(r'"hello",42,helper="Larry, the \"wise\""')
(['hello', 42], {'helper': 'Larry, the "wise"'})
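Because the argument string is pasted into f(...), input such as 1), g(2 turns the wrapper into something that is not a single call (here a tuple of two calls), and **mapping produces a keyword whose arg is None. A sketch of a stricter variant that checks the tree's shape first; parse_args_checked is a hypothetical name, and raising ValueError for every rejected input is a design choice, not part of the answer:

```python
import ast

def parse_args_checked(arg_str):
    try:
        tree = ast.parse(f"f({arg_str})", mode="eval")
    except SyntaxError as exc:
        raise ValueError(f"not a valid argument list: {arg_str!r}") from exc
    call = tree.body
    # Reject input that broke out of the wrapper, e.g. '1), g(2' -> 'f(1), g(2)'.
    if not (isinstance(call, ast.Call) and isinstance(call.func, ast.Name)
            and call.func.id == "f"):
        raise ValueError(f"not a single argument list: {arg_str!r}")
    # literal_eval raises ValueError for non-literals and for *starred nodes.
    args = [ast.literal_eval(arg) for arg in call.args]
    kwargs = {}
    for kw in call.keywords:
        if kw.arg is None:  # '**mapping' has no keyword name
            raise ValueError("'**' unpacking is not supported")
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return args, kwargs
```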

Upvotes: 10

Ajax1234

Reputation: 71451

You can use re and a simple class to keep track of the tokens:

import re

class Akwargs:
    grammar = r'"[\w\s_]+"|"[\w\s,_"]+"|\d+|[a-zA-Z0-9_]+|\='

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.args = []
        self.kwargs = {}
        self.parse()

    def parse(self):
        # Walk the token list: a token followed by '=' starts a keyword argument,
        # anything else is a positional argument (surrounding quotes stripped).
        i = 0
        while i < len(self.tokens):
            if i + 1 < len(self.tokens) and self.tokens[i + 1] == '=':
                if i + 2 >= len(self.tokens):
                    raise ValueError("Expecting kwargs value")
                self.kwargs[self.tokens[i]] = re.sub('^"|"$', '', self.tokens[i + 2])
                i += 3
            else:
                self.args.append(re.sub('^"+|"+$', '', self.tokens[i]))
                i += 1

s = '"hello",42,helper="Larry, the \"wise\""'
tokens = iter(re.findall(Akwargs.grammar, s))
params = Akwargs(tokens)
print(params.args)
print(params.kwargs)

Output:

['hello', '42']
{'helper': 'Larry, the "wise"'}

Full tests:

strings = ['23,"Bill","James"', 'name="someone",age=23,"testing",300','"hello","42"',  "hello=42", 'foo_bar=5']
new_data = [(lambda x:[getattr(x, i) for i in ['args', 'kwargs']])(Akwargs(iter(re.findall(Akwargs.grammar, d)))) for d in strings]

Output:

[[['23', 'Bill', 'James'], {}], [['testing', '300'], {'age': '23', 'name': 'someone'}], [['hello', '42'], {}], [[], {'hello': '42'}], [[], {'foo_bar': '5'}]]
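Instead of a hand-written regex, Python's own tokenize module can produce the tokens; it already understands string literals with embedded commas and escaped quotes, and ast.literal_eval can then decode each value. This is a swapped-in technique rather than the answer's code: a sketch with the hypothetical name parse_arg_tokens, assuming values are plain numbers or strings (so a comma outside a string always separates arguments). Like Akwargs, it does not reject positional arguments that follow keyword ones.

```python
import ast
import io
import tokenize

def parse_arg_tokens(arg_str):
    # NEWLINE/NL/ENDMARKER are structural tokens that carry no argument data.
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    tokens = [tok for tok in tokenize.generate_tokens(io.StringIO(arg_str).readline)
              if tok.type not in skip]

    # Group tokens into arguments at each comma token. Commas inside string
    # literals belong to a STRING token, so they never split an argument.
    groups, current = [], []
    for tok in tokens:
        if tok.type == tokenize.OP and tok.string == ",":
            groups.append(current)
            current = []
        else:
            current.append(tok)
    if current:  # the last argument (a trailing comma leaves this empty)
        groups.append(current)

    args, kwargs = [], {}
    for group in groups:
        if not group:
            raise ValueError(f"empty argument in {arg_str!r}")
        is_keyword = (len(group) >= 3 and group[0].type == tokenize.NAME
                      and group[1].string == "=")
        value_tokens = group[2:] if is_keyword else group
        # Re-join the value's tokens (e.g. '-' and '1') and let literal_eval
        # decode numbers, quotes and escapes; plain names raise ValueError.
        value = ast.literal_eval(" ".join(tok.string for tok in value_tokens))
        if is_keyword:
            kwargs[group[0].string] = value
        else:
            args.append(value)
    return args, kwargs
```

Unlike the regex version, numbers come back as int/float rather than strings.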

Upvotes: 0

BoarGules

Reputation: 16952

This is not entirely what you wanted, but it comes close.

>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--helper')
>>> kwargs,args = parser.parse_known_args(["hello",'42','--helper="Larry, the \"wise\""'])
>>> vars(kwargs)
{'helper': '"Larry, the "wise""'}
>>> args
['hello', '42']
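The stray quotes in vars(kwargs) above come from handing the shell-style quoting to argparse unchanged. One way around that, sketched here as an assumption rather than part of the answer, is to let shlex.split strip the quoting first. The input then has to be written in command-line form (space-separated, --helper=...) rather than as a Python argument list:

```python
import argparse
import shlex

parser = argparse.ArgumentParser()
parser.add_argument("--helper")

# shlex.split removes the quotes and backslash escapes, as a POSIX shell would.
argv = shlex.split(r'hello 42 --helper="Larry, the \"wise\""')

# parse_known_args returns (namespace, leftovers); with no positional arguments
# declared, 'hello' and '42' end up in the leftovers.
namespace, positional = parser.parse_known_args(argv)
print(vars(namespace), positional)
```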

Upvotes: 0
