Reputation: 5314
I would like to parse a string that represents a Python argument list into a form that I can forward to a function call.
I am building an application in which I would like to parse argument lists out of a text string and convert them into the *args, **kwargs pattern to forward to an actual method. For example, if my text string is:
"hello",42,helper="Larry, the \"wise\""
the parsed result would be something comparable to:
args=['hello',42]
kwargs={'helper': 'Larry, the "wise"'}
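For context, once the string is parsed, forwarding is plain argument unpacking. Here is a minimal sketch of that step, where `target` is a hypothetical stand-in for the actual method:

```python
def target(greeting, number, helper=None):
    # Hypothetical stand-in for the method the parsed arguments are forwarded to
    return f"{greeting} {number} ({helper})"

args = ['hello', 42]
kwargs = {'helper': 'Larry, the "wise"'}

# * spreads the list into positional parameters, ** spreads the dict into keyword parameters
result = target(*args, **kwargs)
print(result)  # hello 42 (Larry, the "wise")
```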
I am aware of Python's ast module, but it only seems to provide a mechanism for parsing entire statements. I can sort of fake this by manufacturing a statement around it, e.g.
ast.parse(r'f("hello",42,helper="Larry, the \"wise\"")')
and then pull the relevant fields out of the Call node, but this seems like an awful lot of roundabout work.
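To make the "pull the relevant fields out of the Call node" step concrete: `ast.parse` also accepts `mode="eval"`, which parses a single expression, so the Call node is available directly as `tree.body` instead of inside a statement list. A sketch showing where the arguments live (this only inspects the nodes; it does not convert them to values):

```python
import ast

source = r'f("hello",42,helper="Larry, the \"wise\"")'
# mode="eval" parses one expression: tree is an ast.Expression and tree.body is the Call
tree = ast.parse(source, mode="eval")
call = tree.body

positional_nodes = call.args     # one expression node per positional argument
keyword_nodes = call.keywords    # ast.keyword nodes, each with .arg (name) and .value (node)

print(type(call).__name__)             # Call
print(len(positional_nodes))           # 2
print([kw.arg for kw in keyword_nodes])  # ['helper']
```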
Is there any way to parse just one known node type from a Python AST, or is there an easier approach for getting this functionality?
If it helps, I only need to be able to support numeric and string arguments, although strings need to support embedded commas and escaped-out quotes and the like.
If there is an existing module for building lexers and parsers in Python, I am fine with defining my own AST as well, but obviously I would prefer to use functionality that already exists and has already been tested.
Note: Many of the answers focus on how to store the parsed results, but that's not what I care about; it's the parsing itself that I'm trying to solve, ideally without writing an entire parser engine myself.
Also, my application is already using Jinja which has a parser for Python-ish expressions in its own template parser, although it isn't clear to me how to use it to parse just one subexpression like this. (This is unfortunately not something going into a template, but into a custom Markdown filter, where I'd like the syntax to match its matching Jinja template function as closely as possible.)
Upvotes: 5
Views: 6394
Reputation: 21
I've adjusted the solution proposed by Aran-Fey. This works as expected:
import ast

def parse_function_arguments(arg_str):
    # Wrap the argument string in a dummy function call to make it valid Python syntax
    wrapped_arg_str = f"dummy_func({arg_str})"
    # Parse only, nothing is executed; 'eval' mode parses a single expression
    tree = ast.parse(wrapped_arg_str, mode="eval")
    # In 'eval' mode, tree.body is the expression itself, i.e. the Call node
    funccall = tree.body
    # Positional arguments: convert each AST node to a Python literal with literal_eval
    args = tuple(ast.literal_eval(arg) for arg in funccall.args)
    # Keyword arguments: map each keyword name to its literal value
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in funccall.keywords}
    return args, kwargs
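A usage sketch (the function is repeated in condensed form so the snippet runs on its own). Because every value goes through `ast.literal_eval`, anything that is not a plain literal, such as a function call, raises `ValueError` instead of being executed:

```python
import ast

def parse_function_arguments(arg_str):
    # Same approach as above: wrap, parse as one expression, literal_eval each argument
    call = ast.parse(f"dummy_func({arg_str})", mode="eval").body
    args = tuple(ast.literal_eval(arg) for arg in call.args)
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    return args, kwargs

args, kwargs = parse_function_arguments(r'"hello",42,helper="Larry, the \"wise\""')
print(args)    # ('hello', 42)
print(kwargs)  # {'helper': 'Larry, the "wise"'}

# A non-literal value is rejected while converting, not run
try:
    parse_function_arguments("path=__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True
print(rejected)  # True
```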
Upvotes: 0
Reputation: 955
You can use a function with eval to help you pick apart args and kwargs:
def f(*args, **kwargs):
    return args, kwargs

import numpy as np
eval("f(1, 'a', x=np.int32)")
gives you
((1, 'a'), {'x': <class 'numpy.int32'>})
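If you use this with text you do not fully control, one way to narrow what `eval` can reach is to pass explicit globals with an empty `__builtins__` and expose only the capture function. This is a sketch of that idea, not a sandbox: it reduces but does not remove the risk of running arbitrary code, so the `ast.literal_eval` answers remain the safer choice for untrusted input.

```python
def f(*args, **kwargs):
    # Capture whatever the call receives
    return args, kwargs

text = r'"hello",42,helper="Larry, the \"wise\""'

# Only `f` is visible to the evaluated code; builtins such as open/__import__ are hidden.
# This is NOT a sandbox: eval can still be abused, so only use it on trusted input.
args, kwargs = eval(f"f({text})", {"__builtins__": {}}, {"f": f})
print(args)    # ('hello', 42)
print(kwargs)  # {'helper': 'Larry, the "wise"'}
```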
Upvotes: 0
Reputation: 43166
I think ast.parse is your best option.
If the parameters were separated by whitespace, we could use shlex.split:
>>> import shlex
>>> shlex.split(r'"hello" 42 helper="Larry, the \"wise\""')
['hello', '42', 'helper=Larry, the "wise"']
But unfortunately, that doesn't split on commas:
>>> shlex.split(r'"hello",42,helper="Larry, the \"wise\""')
['hello,42,helper=Larry, the "wise"']
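For what it's worth, a `shlex.shlex` instance in POSIX mode can be configured with a custom `whitespace` set, so it can split on commas while keeping its quote and escape handling. A sketch (`split_on_commas` is a hypothetical helper name). Note the results are all strings and the quotes are gone, so `"42"` and `42` can no longer be told apart, and with this setting any spaces after the commas end up inside the tokens:

```python
import shlex

def split_on_commas(text):
    # Hypothetical helper: reuse shlex's POSIX-mode quote and escape handling,
    # but treat ',' (and only ',') as the separator between tokens.
    lexer = shlex.shlex(text, posix=True)
    lexer.whitespace = ","
    lexer.whitespace_split = True
    lexer.commenters = ""  # don't let a '#' start a comment
    return list(lexer)

print(split_on_commas(r'"hello",42,helper="Larry, the \"wise\""'))
# ['hello', '42', 'helper=Larry, the "wise"']
```

Separating `helper=...` into a keyword would still be up to you (for example with `str.partition("=")`), which is where the lost quoting becomes a problem.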
I also thought about using ast.literal_eval, but that doesn't support keyword arguments:
>>> import ast
>>> ast.literal_eval(r'"hello",42')
('hello', 42)
>>> ast.literal_eval(r'"hello",42,helper="Larry, the \"wise\""')
Traceback (most recent call last):
File "<unknown>", line 1
"hello",42,helper="Larry, the \"wise\""
^
SyntaxError: invalid syntax
I couldn't think of any Python literal that supports both positional and keyword arguments.
For lack of better ideas, here's a solution using ast.parse:
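import ast

def parse_args(args):
    # Wrap the argument list in a dummy call so it forms a complete statement
    args = 'f({})'.format(args)
    tree = ast.parse(args)
    # tree.body[0] is the Expr statement; its .value is the Call node
    funccall = tree.body[0].value

    # Convert each argument node to a Python value (literals only)
    args = [ast.literal_eval(arg) for arg in funccall.args]
    kwargs = {arg.arg: ast.literal_eval(arg.value) for arg in funccall.keywords}
    return args, kwargs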
Output:
>>> parse_args(r'"hello",42,helper="Larry, the \"wise\""')
(['hello', 42], {'helper': 'Larry, the "wise"'})
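One caveat with wrapping the input in `f(...)`: the input can close the wrapper early. For example, `parse_args('1)(2')` builds `f(1)(2)`, whose outermost call is `...(2)`, so it quietly returns `([2], {})`. Likewise `**{"a": 1}` produces a keyword node whose `arg` is `None`. A hardened sketch (`parse_args_strict` is a hypothetical name) that checks the shape of the parsed call before extracting values:

```python
import ast

def parse_args_strict(arg_str):
    """Parse 'a, b, k=v' into (args, kwargs), accepting only literal values."""
    # Wrap in a dummy call; mode="eval" makes tree.body the expression itself
    call = ast.parse(f"f({arg_str})", mode="eval").body
    # Reject inputs such as '1)(2' that close the dummy call early
    if not (isinstance(call, ast.Call)
            and isinstance(call.func, ast.Name) and call.func.id == "f"):
        raise ValueError(f"not a plain argument list: {arg_str!r}")
    args = []
    for node in call.args:
        # *iterable unpacking shows up as an ast.Starred node
        if isinstance(node, ast.Starred):
            raise ValueError("*-unpacking is not supported")
        args.append(ast.literal_eval(node))
    kwargs = {}
    for kw in call.keywords:
        # **mapping unpacking shows up as a keyword with arg=None
        if kw.arg is None:
            raise ValueError("**-unpacking is not supported")
        kwargs[kw.arg] = ast.literal_eval(kw.value)
    return args, kwargs

print(parse_args_strict(r'"hello",42,helper="Larry, the \"wise\""'))
# (['hello', 42], {'helper': 'Larry, the "wise"'})
```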
Upvotes: 10
Reputation: 71451
You can use re and a simple class to keep track of the tokens:
import re

class Akwargs:
    grammar = r'"[\w\s_]+"|"[\w\s,_"]+"|\d+|[a-zA-Z0-9_]+|\='

    def __init__(self, tokens):
        self.tokens = tokens
        self.args = []
        self.kwargs = {}
        self.parse()

    def parse(self):
        current = next(self.tokens, None)
        if current:
            check_next = next(self.tokens, None)
            if not check_next:
                # Last token: a lone positional argument
                self.args.append(re.sub('^"+|"+$', '', current))
            else:
                if check_next == '=':
                    # name = value
                    last = next(self.tokens, None)
                    if not last:
                        raise ValueError("Expecting a value after '='")
                    self.kwargs[current] = re.sub('^"|"$', '', last)
                else:
                    # Two positional arguments in a row
                    self.args.extend(list(map(lambda x:re.sub('^"+|"+$', '', x), [current, check_next])))
                self.parse()
s = '"hello",42,helper="Larry, the \"wise\""'
tokens = iter(re.findall(Akwargs.grammar, s))
params = Akwargs(tokens)
print(params.args)
print(params.kwargs)
Output:
['hello', '42']
{'helper': 'Larry, the "wise"'}
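Note that this version returns numbers as strings (`'42'`), and a positional argument directly followed by a keyword argument is misread: `'23,name="x"'` yields the args `['23', 'name', '=', 'x']` because tokens are consumed in pairs. Below is a sketch of a stricter variant of the same regex idea, assuming only quoted strings and int/float literals are allowed as values. It tokenises quoted strings with backslash escapes, groups tokens by commas, and hands each literal to `ast.literal_eval` to decode escapes and convert numbers. `TOKEN`, `iter_tokens` and `parse_arg_list` are hypothetical names.

```python
import ast
import re

# Hypothetical token pattern: a double- or single-quoted string that may contain
# backslash escapes, an int/float literal, a bare identifier, or a '=' / ',' separator.
TOKEN = re.compile(r'''\s*(?:
      (?P<str>"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')
    | (?P<num>[-+]?\d+(?:\.\d*)?(?:[eE][-+]?\d+)?)
    | (?P<name>[A-Za-z_]\w*)
    | (?P<sep>[=,])
    )''', re.VERBOSE)


def iter_tokens(text):
    """Yield (kind, token) pairs; raise ValueError on anything unrecognised."""
    text = text.strip()
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at {pos}: {text[pos:]!r}")
        pos = m.end()
        yield m.lastgroup, m.group(m.lastgroup)


def parse_arg_list(text):
    if not text.strip():
        return [], {}
    # Group tokens into comma-separated items
    items = [[]]
    for kind, tok in iter_tokens(text):
        if (kind, tok) == ("sep", ","):
            items.append([])
        else:
            items[-1].append((kind, tok))

    args, kwargs = [], {}
    for item in items:
        kinds = [kind for kind, _ in item]
        if kinds in (["str"], ["num"]):
            if kwargs:
                raise ValueError("positional argument follows keyword argument")
            # literal_eval decodes escape sequences and converts numbers
            args.append(ast.literal_eval(item[0][1]))
        elif kinds in (["name", "sep", "str"], ["name", "sep", "num"]) and item[1][1] == "=":
            kwargs[item[0][1]] = ast.literal_eval(item[2][1])
        else:
            raise ValueError(f"cannot parse argument: {item!r}")
    return args, kwargs


print(parse_arg_list(r'"hello",42,helper="Larry, the \"wise\""'))
# (['hello', 42], {'helper': 'Larry, the "wise"'})
print(parse_arg_list('23,name="x"'))
# ([23], {'name': 'x'})
```

Unlike the class above, this sketch follows Python's rule that positional arguments cannot come after keyword arguments, so `'name="someone",age=23,"testing",300'` raises `ValueError` here.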
Full tests:
strings = ['23,"Bill","James"', 'name="someone",age=23,"testing",300','"hello","42"', "hello=42", 'foo_bar=5']
new_data = [(lambda x:[getattr(x, i) for i in ['args', 'kwargs']])(Akwargs(iter(re.findall(Akwargs.grammar, d)))) for d in strings]
Output:
[[['23', 'Bill', 'James'], {}], [['testing', '300'], {'age': '23', 'name': 'someone'}], [['hello', '42'], {}], [[], {'hello': '42'}], [[], {'foo_bar': '5'}]]
Upvotes: 0
Reputation: 16952
This is not entirely what you wanted, but it comes close.
>>> import argparse
>>> parser = argparse.ArgumentParser()
>>> parser.add_argument('--helper')
>>> kwargs,args = parser.parse_known_args(["hello",'42','--helper="Larry, the \"wise\""'])
>>> vars(kwargs)
{'helper': '"Larry, the "wise""'}
>>> args
['hello', '42']
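The leftover quotes in `helper` come from handing the raw text straight to argparse. If you can accept shell-style input (spaces between arguments, `--name=value` for keywords) instead of Python's comma syntax, a sketch that lets `shlex.split` remove the quotes and resolve the escapes first. All values still arrive as strings:

```python
import argparse
import shlex

parser = argparse.ArgumentParser()
parser.add_argument("--helper")

# Assumption: the input uses shell-style syntax (spaces between arguments,
# --name=value for keywords) rather than Python's comma-separated syntax.
text = r'hello 42 --helper="Larry, the \"wise\""'

# shlex.split removes the quotes and resolves \" before argparse sees the tokens
namespace, positional = parser.parse_known_args(shlex.split(text))
kwargs = vars(namespace)
print(positional)  # ['hello', '42']
print(kwargs)      # {'helper': 'Larry, the "wise"'}
```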
Upvotes: 0