Reputation: 105
Hi boost::xpressive users,
I'm getting a stack overflow when parsing decision trees with boost::xpressive. Parsing works for trees up to a certain size but fails on 'big' trees, where 'big' means around 3000 nodes; at that point gdb shows the stack at 133979 frames deep. I think I need to optimize the regex somehow, but there's no .* anywhere, so I'm not sure where to go from here.
#include <boost/regex.hpp>
#include <boost/xpressive/xpressive.hpp>
#include <boost/xpressive/regex_actions.hpp>
using namespace boost::xpressive;
using namespace regex_constants;
sregex integral_number;
sregex floating_point_number;
sregex bid;
sregex ask;
sregex side;
sregex value_on_market_limit_ratio_gt;
sregex value_on_market_delta_ratio_gt;
sregex stdevs_from_mean_auction_time_gt;
sregex no_orders_on_opposite_side;
sregex is_pushing_price;
sregex is_desired;
sregex predicate, leaf, branch, tree;
integral_number = sregex_compiler().compile("[-+]?[0-9]+");
floating_point_number = sregex_compiler().compile("[-+]?[0-9]*\\.?[0-9]+([eE][-+]?[0-9]+)?");
stdevs_from_mean_auction_time_gt = "StdevsFromMeanAuctionTimeGT(" >> floating_point_number >> ")";
side = sregex_compiler().compile("def::BID|def::ASK");
value_on_market_limit_ratio_gt = "ValueOnMarketLimitRatioGT<" >> side >> ">(" >> floating_point_number >> ")";
value_on_market_delta_ratio_gt = "ValueOnMarketDeltaRatioGT(" >> floating_point_number >> ")";
no_orders_on_opposite_side = sregex_compiler().compile("NoOrdersOnOppositeSide");
is_pushing_price = sregex_compiler().compile("IsPushingPrice");
is_desired = sregex_compiler().compile("IsDesired");
predicate = value_on_market_limit_ratio_gt | value_on_market_delta_ratio_gt | stdevs_from_mean_auction_time_gt | no_orders_on_opposite_side | is_pushing_price | is_desired;
leaf = sregex_compiler().compile("SEARCH_TO_MAX|AMEND_TO_AVAILABLE|AMEND_TO_AVAILABLE_MINUS_RECENT_ORDER_SIZE|AMEND_TO_CURRENT_MINUS_RECENT_ORDER_SIZE|SEARCH_BY_RECENT_ORDER_SIZE|PULL|DO_NOTHING");
branch = "Branch(" >> predicate >> "," >> by_ref(tree) >> "," >> by_ref(tree) >> ")";
tree = leaf | branch;
smatch what;
regex_match(s, what, tree);
Here, s is left undefined because it is a string of about 75000 characters, too long to include in the question. How can I modify these expressions so that the match runs in less stack space?
Upvotes: 3
Views: 98
Reputation: 105
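I found a fix: change the definition of branch to
branch = "Branch(" >> keep(predicate) >> "," >> keep(by_ref(tree)) >> "," >> keep(by_ref(tree)) >> ")";
keep() makes each sub-expression an independent (atomic) group, like (?>...) in Perl syntax: once it has matched, the matcher will not backtrack into it. That limits backtracking and thereby memory usage.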
Upvotes: 4