Reputation: 1082
I have this DataFrame as an example:
Col1 Col2 ... Col5 Price
0 Wood Wood Plastic 50
1 Iron Wood Wood 70
...
3000 Iron Iron Wood 110
I would like to know whether it is possible to build a linear solver for N equations in N unknowns (in this example, to find the price of Wood, Iron, Plastic, etc.).
Many thanks!
Upvotes: 3
Views: 5112
Reputation: 23790
The frame can be converted into a system of linear equations, where each row in the frame is one equation (a constraint) and each material is a variable. We can then solve the system with NumPy's linear solver, numpy.linalg.solve (as Rajan Chahan mentioned in the question comments).
import numpy as np
import pandas as pd
from numpy.linalg import solve
# Create a simple frame, with two materials - Wood & Iron.
df = pd.DataFrame({'Col1': ['Iron', 'Wood'], 'Col2': ['Wood', 'Wood'], 'Price': [3,2]})
# Extract the materials and map each material to a unique integer
# For example, "Iron"=0 and "Wood"=1
# (df.as_matrix() was removed in pandas 1.0; df.to_numpy() replaces it)
materials = pd.Series(np.unique(df.to_numpy()[:, :-1])).astype('category')
# Create the coefficient matrix, where each row is a constraint
# For example "Iron + Wood" translates into "1*x0 + 1*x1"
# And "Wood + Wood" translates into "0*x0 + 2*x1"
A = np.zeros((len(df), len(materials)))
# Iterate over all constraints and materials and fill in the coefficients
# (df.get_value() was removed in pandas 1.0; df.at[] replaces it)
for i in range(len(df)):
    for j in range(1, df.shape[1]):
        A[i, materials.cat.categories.get_loc(df.at[i, 'Col{}'.format(j)])] += 1
# Solve the system; the solution is an array.
# Each entry in the array corresponds to the price of one material.
solution = solve(A, df['Price']) # [ 2. 1.]
# Convert to a mapping per-material
material_prices = pd.Series(solution, index=materials.cat.categories)
# Iron 2.0
# Wood 1.0
# dtype: float64
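The double loop can also be replaced by a reshape-and-count step. The following is only a sketch, assuming a reasonably recent pandas (1.1 or later, for the ignore_index argument of melt) and assuming that every column except Price holds a material name; the variable name long_form is illustrative and not part of the answer above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Col1': ['Iron', 'Wood'], 'Col2': ['Wood', 'Wood'], 'Price': [3, 2]})

# Assumption: every column except 'Price' holds a material name.
# melt() turns the material columns into one long 'value' column and,
# with ignore_index=False, keeps the original row label on every entry.
long_form = df.melt(id_vars='Price', ignore_index=False)

# One indicator column per material, summed per original row, gives the
# number of times each material appears in each row.
A = pd.get_dummies(long_form['value'], dtype=int).groupby(level=0).sum()
A = A.reindex(df.index)  # keep the rows in the same order as the prices

solution = np.linalg.solve(A.to_numpy(), df['Price'].to_numpy())
material_prices = pd.Series(solution, index=A.columns)
print(material_prices)
```

For this two-row frame, A has the rows [1, 1] and [0, 2] with the columns Iron and Wood, which is the same matrix the loop builds.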
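If the number of materials differs from the number of constraints, solve cannot be used, because it requires a square coefficient matrix. In that case you can compute a least-squares solution instead. Replace the line solution = solve(A, df['Price'])
from the code above with:
from numpy.linalg import solve, lstsq
# rcond=None selects the machine-precision default and avoids a FutureWarning on older NumPy versions
solution = lstsq(A, df['Price'], rcond=None)[0]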
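To illustrate the least-squares case, here is a small sketch with more rows than materials. The frame is made up for the example: the prices were chosen so that Iron = 30, Plastic = 20 and Wood = 10 fit every row exactly, and the three-column layout is an assumption rather than the asker's real data.

```python
import numpy as np
import pandas as pd

# Made-up data: four rows (equations) but only three materials (unknowns).
df = pd.DataFrame({
    'Col1': ['Wood', 'Iron', 'Iron', 'Plastic'],
    'Col2': ['Wood', 'Wood', 'Iron', 'Plastic'],
    'Col3': ['Plastic', 'Wood', 'Wood', 'Iron'],
    'Price': [40, 50, 70, 70],
})

# Assumption: every column except 'Price' holds a material name.
cells = df.drop(columns='Price').to_numpy()
materials = np.unique(cells)  # sorted unique material names

# A[i, j] = how many times material j appears in row i.
A = np.array([[(row == m).sum() for m in materials] for row in cells], dtype=float)
b = df['Price'].to_numpy(dtype=float)

# lstsq returns (solution, residuals, rank, singular values).
solution, residuals, rank, _ = np.linalg.lstsq(A, b, rcond=None)
material_prices = pd.Series(solution, index=materials)
print(material_prices)
print('rank of A:', rank)
```

If the reported rank is smaller than the number of materials, the prices are not uniquely determined by the data, and lstsq returns the minimum-norm solution among all the best fits.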
Upvotes: 5