butterwagon
butterwagon

Reputation: 499

Partially factoring an expression in Sympy

Suppose I have an expression of the form First Form. I know that I can simplify the expression like so: Second Form. However, sympy.simplify and sympy.factor both return the original expression.

To work around this, I've been operating on the expression at a low level:

factor_map = defaultdict(set)
additive_terms = expr.as_coeff_add()[-1]
for term1, term2 in combinations(additive_terms, 2):
    common_terms = (
        set(term1.as_coeff_mul()[-1])
        & set(term2.as_coeff_mul()[-1])
    )
    if common_terms:
        common_factor = sympy.Mul(*common_terms)
        factor_map[common_factor] |= {term1, term2}

factor_map now looks like so:

{
    a: {a⋅x, -a⋅y}, 
    b: {b⋅x, -b⋅y}, 
    c: {-c⋅x, c⋅y}, 
    x: {a⋅x, b⋅x, -c⋅x}, 
    y: {-a⋅y, -b⋅y, c⋅y}
}

I sort it by the number of operations represented by the terms:

factor_list = sorted(
    factor_map.items(),
    key = lambda i: (i[0].count_ops() + 1) * len(i[1])
)[::-1]

I then just rebuild the expression:

used = set()
new_expr = 0
for item in factor_list:
    factor = item[0]
    appearances = item[-1]
    terms = 0
    for instance in appearances:
        if instance not in used:
            terms += instance.as_coefficient(factor)
            used.add(instance)
    new_expr += factor * terms
for term in set(additive_terms) - used:
    new_expr += term

This gives new_expr = d + x*(a + b - c) + y*(-a - b + c). Not great, but better.

I can also improve on this by dividing each combination of additive terms by each other, checking if the result is a number, and using that information to further reduce the output to new_expr = d + (x - y)*(a + b - c).

I've also tried to apply sympy.factor to every possible combination of additive terms, but obviously that blows up very quickly for any reasonably big expression.


Edit: Here's an implementation that uses sympy.factor on all partitions of the set of additive terms (partitions function borrowed from this answer):

def partition(collection):
    if len(collection) == 1:
        yield [ collection ]
        return

    first = collection[0]
    for smaller in partition(collection[1:]):
        # insert `first` in each of the subpartition's subsets
        for n, subset in enumerate(smaller):
            yield smaller[:n] + [[ first ] + subset]  + smaller[n+1:]
        # put `first` in its own subset 
        yield [ [ first ] ] + smaller


def partial_factor(expr):
    args = list(expr.as_coeff_add()[-1])
    # Groupings is just a cache to avoid duplicating factor operations
    groupings = {}

    unique = set()

    for p in partition(args):
        new_expr = 0
        for group in p:
            add_group = sympy.Add(*group)
            new_expr += groupings.setdefault(add_group, sympy.factor(add_group))
        unique.add(new_expr)
    return sorted(unique, key=sympy.count_ops)[0]

For an expression like a*x + b*y + c*z + d + e*x + f*y + h*z, it takes 7.8 seconds to run on my computer, whereas the other method takes 378 microseconds and gives the same result. Seems like there should be a way to be more rigorous than the first method without taking 20,000 times longer to solve it.


I feel like it shouldn't be this hard to get what I want. Is there an easier way to do this?

Upvotes: 3

Views: 1174

Answers (3)

hisermann
hisermann

Reputation: 26

@butterwagon: I tested your work around and found that it fails for expressions like x + 1, where it returns just x. To solve that I'd recommend to change the line

additive_terms = expr.as_coeff_add()[-1]

to

const, additive_terms = expr.as_coeff_add()

and add const to the final result:

new_expr += const

Upvotes: 0

jlh
jlh

Reputation: 4667

This similar question has an answer involving the func parameter to collect(). It seems to work in this particular case as well, although you have to explicitly mention d:

from sympy import *
a, b, c, d, x, y = symbols('a, b, c, d, x, y')
expr = a * x + b * x - c * x - a * y - b * y + c * y + d
collect(expr, d, func=factor)
=> d + (x - y)*(a + b - c)

This helps in this specific case, however a more general and automatic solution does not exist.

Also I could not find any documentation for this func parameter anywhere, other than it existing.

Github issue tracking this problem

Upvotes: 5

user6655984
user6655984

Reputation:

It's hard to suggest a strategy of "partial factoring" that works most of the time. Here is a thing to try, devised with your example in mind (a polynomial of several variables).

Given an expression: Attempt to factor it. If unsuccessful, look at the coefficient of each symbol that it contains; the method Expr.coeff(Symbol) does that. The symbol with the smallest coefficient (measured by the number of symbols contained) is considered an obstacle to factoring and is removed from the expression. Repeat.

This logic is encoded below, and partial_factor(a*x + b*x - c*x - a*y - b*y + c*y + d) does return d + (x - y)*(a + b - c).

def partial_factor(expr):
    to_factor = expr
    non_factorable = 0
    while to_factor != 0:
        if factor(to_factor) != to_factor:
            return factor(to_factor) + non_factorable
        coeffs = {v: to_factor.coeff(v) for v in to_factor.free_symbols}
        min_size = min([len(coeffs[v].free_symbols) for v in coeffs])
        for v in coeffs:
            if len(coeffs[v].free_symbols) == min_size:
                non_factorable += v*coeffs[v]
                to_factor -= v*coeffs[v]
                break
    return expr

Upvotes: 1

Related Questions