Uri Goren

Interaction term of 2 given fields

I have two families of features, the As and the Bs, both of which are sparse (note that sparsity is a must):

X = (a_1, a_2, ..., a_n, b_1, b_2, ..., b_m)

I want to fit a logistic regression on the interactions between A and B:

y ~ c_11*a_1*b_1 + c_12*a_1*b_2 + ... + c_nm*a_n*b_m
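Written as a logistic model, the target is the following. This is a sketch that assumes the usual logit link and an intercept c_0, neither of which the formula above spells out. It has exactly n·m interaction features, one per pair (a_i, b_j):

$$\log\frac{P(y=1\mid X)}{1-P(y=1\mid X)} = c_0 + \sum_{i=1}^{n}\sum_{j=1}^{m} c_{ij}\,a_i b_j$$

For comparison, a full degree-2 interaction expansion over all n + m columns yields this many pairwise products, so the two binomial terms on the right count the unwanted within-group products:

$$\binom{n+m}{2} = nm + \binom{n}{2} + \binom{m}{2}$$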

Using PolynomialFeatures would also add terms such as a_1*a_2 and b_1*b_2, which are irrelevant.

Is there another sparse transformer I can use, or do I have to implement it myself?


Answers (1)

Uri Goren

I wrote a custom transformer that solved my problem; the code is available here:

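from collections import defaultdict
from itertools import product

from scipy import sparse
from sklearn.base import BaseEstimator, TransformerMixin


class InteractionBySplit(TransformerMixin, BaseEstimator):
    """
    Takes a sparse matrix as input and an index to split by, and returns all
    pairwise products between a column before that index (the As) and a
    column at or after it (the Bs). Within-group products are not generated.
    """

    def __init__(self, split_index):
        self.split_index = split_index

    def fit(self, X, y=None):
        # Stateless transformer: nothing to learn, but fit() is required
        # for fit_transform() and for use inside a Pipeline.
        return self

    def transform(self, X):
        X = sparse.coo_matrix(X, copy=True)
        X.sum_duplicates()  # merge repeated (row, col) entries
        n_a = self.split_index
        n_b = X.shape[1] - self.split_index
        M = sparse.dok_matrix((X.shape[0], n_a * n_b))
        pre, post = defaultdict(list), defaultdict(list)
        # Split each row's non-zeros into the A part and the B part.
        for row, col, v in zip(X.row, X.col, X.data):
            if col < n_a:
                pre[row].append((col, v))
            else:
                post[row].append((col - n_a, v))
        # Only rows with non-zeros on both sides produce interactions.
        for row in pre.keys() & post.keys():
            for (i, a), (j, b) in product(pre[row], post[row]):
                # a_i * b_j is stored in output column i + j * n_a
                M[row, i + j * n_a] = a * b
        return M.tocsr()


if __name__ == "__main__":
    X = sparse.coo_matrix([[1, 0, 0, 1, 0, 0],
                           [1, 0, 0, 0, 1, 0],
                           [0, 1, 0, 1, 0, 0]])
    Y = InteractionBySplit(3).fit_transform(X).toarray()
    print(Y)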
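The loop above does one Python-level `dok_matrix` assignment per product, which can get slow on large inputs. A vectorized variant may help. The sketch below uses a hypothetical helper, `interactions_by_split`, and relies on SciPy's fancy column indexing and sparse `.multiply`. It builds the same column layout (a_i * b_j in column i + j * n_a) with a few sparse operations. The intermediates hold about nnz(A)·m + nnz(B)·n entries. The pipeline at the end fits LogisticRegression on randomly generated toy data, purely to show the wiring.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def interactions_by_split(X, split_index):
    """Hypothetical helper: all products a_i * b_j across the split.

    Output column i + j * n_a holds a_i * b_j, where the first
    split_index columns of X are the As and the rest are the Bs.
    """
    X = sparse.csr_matrix(X)
    A = X[:, :split_index]
    B = X[:, split_index:]
    n_a, n_b = A.shape[1], B.shape[1]
    # Output column k pairs A-column k % n_a with B-column k // n_a.
    a_cols = np.tile(np.arange(n_a), n_b)
    b_cols = np.repeat(np.arange(n_b), n_a)
    # Element-wise product of two sparse matrices stays sparse.
    return sparse.csr_matrix(A[:, a_cols].multiply(B[:, b_cols]))


# Random toy data (hypothetical): 3 A-columns, 3 B-columns, about 30% non-zero.
rng = np.random.default_rng(0)
dense = rng.random((200, 6)) * (rng.random((200, 6)) < 0.3)
X = sparse.csr_matrix(dense)
y = rng.integers(0, 2, size=200)

model = make_pipeline(
    FunctionTransformer(interactions_by_split, kw_args={"split_index": 3},
                        accept_sparse=True),
    LogisticRegression(),
)
model.fit(X, y)
print(model[-1].coef_.shape)  # one coefficient per (a_i, b_j) pair
```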
