hdatas

Reputation: 1082

System of equations solver pandas

I have this DataFrame as an example:

       Col1         Col2     ...    Col5       Price
 0     Wood         Wood            Plastic     50
 1     Iron         Wood            Wood        70
                            ...
3000   Iron         Iron            Wood        110

I would like to know whether it's possible to build a linear solver for N equations in N unknowns (in this example, to find the price of Wood, Iron, Plastic, etc.).

Many thanks !

Upvotes: 3

Views: 5112

Answers (1)

Elisha

Reputation: 23790

The frame can be converted into a system of linear equations, where each row in the frame is a constraint (one equation) and each material is a variable. We can then use NumPy's linear solver, numpy.linalg.solve, to solve the system (as Rajan Chahan mentioned in the question comments).

import numpy as np
import pandas as pd

from numpy.linalg import solve

# Create a simple frame, with two materials - Wood & Iron.
df = pd.DataFrame({'Col1': ['Iron', 'Wood'], 'Col2': ['Wood', 'Wood'], 'Price': [3,2]})

# Extract the materials and map each material to a unique integer
# For example, "Iron"=0 and "Wood"=1
materials = pd.Series(np.unique(df.to_numpy()[:, :-1])).astype('category')

# Create the coefficient matrix, where each row is a constraint
# For example "Iron + Wood" translates into "1*x0 + 1*x1"
# And "Wood + Wood" translates into "0*x0 + 2*x1"
A = np.zeros((len(df), len(materials)))

# Iterate over all constraints and material columns and fill in the coefficients
for i in range(len(df)):
    for j in range(1, df.shape[1]):
        A[i, materials.cat.categories.get_loc(df.at[i, 'Col{}'.format(j)])] += 1

# Solve the system; the solution is an array.
# Each entry in the array corresponds to a material's price.
solution = solve(A, df['Price'])  # [ 2. 1.]

# Convert to a mapping per-material
material_prices = pd.Series(solution, index=materials.cat.categories)
# Iron    2.0
# Wood    1.0
# dtype: float64
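
For a frame with thousands of rows, the nested loop above can be replaced by a count table. The following is only a sketch of one alternative, not part of the original answer: it assumes every column except Price holds a material name, and the names long and counts are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['Iron', 'Wood'], 'Col2': ['Wood', 'Wood'], 'Price': [3, 2]})

# Reshape to long form: one (row label, material) pair per cell.
# reset_index() exposes the row labels as a column named 'index'
# (assumes the frame's index is unnamed).
material_cols = df.columns.drop('Price')
long = df[material_cols].reset_index().melt(id_vars='index', value_name='material')

# Count how many times each material appears in each row.
# reindex keeps the rows in the same order as df.
counts = pd.crosstab(long['index'], long['material']).reindex(df.index, fill_value=0)

# The count table is the coefficient matrix; its columns are the materials.
A = counts.to_numpy(dtype=float)
prices = np.linalg.solve(A, df['Price'].to_numpy(dtype=float))
material_prices = pd.Series(prices, index=counts.columns)
```

Here counts has one row per equation and one column per material, so it plays the same role as the matrix A built by the loop.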

If the number of materials differs from the number of constraints, you can compute a least-squares solution instead. Replace the line solution = solve(A, df['Price']) in the code above with:

from numpy.linalg import solve, lstsq
solution = lstsq(A, df['Price'], rcond=None)[0]
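
To illustrate the least-squares case, here is a small sketch with three equations and two unknowns. The numbers are made up for illustration (the third price is deliberately slightly inconsistent with the first two), and the column order Iron, Wood is an assumption.

```python
import numpy as np

# Hypothetical system, columns ordered as (Iron, Wood):
#   Iron + Wood = 3
#   Wood + Wood = 2
#   Iron + Iron = 4.1   (slightly inconsistent on purpose)
A = np.array([[1.0, 1.0],
              [0.0, 2.0],
              [2.0, 0.0]])
b = np.array([3.0, 2.0, 4.1])

# lstsq returns the solution, the sum of squared residuals,
# the rank of A, and the singular values of A.
solution, residuals, rank, singular_values = np.linalg.lstsq(A, b, rcond=None)

# If the rank is smaller than the number of materials, the prices
# are not uniquely determined by the data.
unique_prices = rank == A.shape[1]
```

Checking rank against the number of materials is a cheap way to detect that some material prices cannot be separated, for example when two materials always appear together.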

Upvotes: 5
