Reputation: 13690
I have two families of features, the As and the Bs; both are sparse (note that sparsity is a must).
X = (a_1,a_2,...,a_n,b_1,b_2,...,b_m)
I want to fit a logistic regression on the interactions between A and B:
y ~ c_11*a_1*b_1 + c_12*a_1*b_2 + ... + c_nm*a_n*b_m
Using PolynomialFeatures would also add within-family terms such as a_1*a_2 and b_1*b_2, which are irrelevant.
Is there any other sparse transformer I can use, or do I have to implement it myself ?
Upvotes: 2
Views: 124
Reputation: 13690
I wrote a custom transformer that solved my problem; the code is below.
from collections import defaultdict
from itertools import product

from scipy import sparse
from sklearn.base import TransformerMixin


class InteractionBySplit(TransformerMixin):
    """
    Takes a sparse matrix as input and an index to split by, and returns all
    pairwise interactions between the columns before and after that index.
    """

    def __init__(self, split_index, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.split_index = split_index

    def fit(self, X, y=None):
        # Stateless transformer; fit is a no-op so it composes with pipelines.
        return self

    def transform(self, X):
        X = X.tocoo()
        # One output column per (pre, post) column pair.
        M = sparse.dok_matrix((X.shape[0], self.split_index * (X.shape[1] - self.split_index)))
        # Bucket each row's nonzero entries into the two halves.
        pre, post = defaultdict(list), defaultdict(list)
        rows = set()
        for row, col, v in zip(X.row, X.col, X.data):
            rows.add(row)
            if col < self.split_index:
                pre[row].append((col, v))
            else:
                post[row].append((col - self.split_index, v))
        # Cross-multiply the two halves row by row.
        for row in rows:
            for a, b in product(pre[row], post[row]):
                M[row, a[0] + b[0] * self.split_index] = a[1] * b[1]
        return M.tocsr()


if __name__ == "__main__":
    X = sparse.coo_matrix([[1, 0, 0, 1, 0, 0],
                           [1, 0, 0, 0, 1, 0],
                           [0, 1, 0, 1, 0, 0]])
    Y = InteractionBySplit(3).transform(X).todense()
    print(Y)
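For what it's worth, the same cross-family product can also be built without a Python loop over entries, as a row-wise (Khatri-Rao-style) product of the two blocks. This is a sketch with a hypothetical helper name `interactions_by_split`; it relies on scipy's broadcasting `multiply` and produces the same column ordering (`a_col + b_col * split_index`) as the transformer above:

```python
import numpy as np
from scipy import sparse


def interactions_by_split(X, split_index):
    """Row-wise outer product between columns [:split_index] and [split_index:]."""
    X = sparse.csc_matrix(X)
    A, B = X[:, :split_index], X[:, split_index:]
    # For each B column b_j, scale every A column row-wise by b_j;
    # stacking the blocks gives output column index a_col + b_col * split_index.
    blocks = [A.multiply(B[:, [j]].toarray()) for j in range(B.shape[1])]
    return sparse.hstack(blocks).tocsr()


X = sparse.coo_matrix([[1, 0, 0, 1, 0, 0],
                       [1, 0, 0, 0, 1, 0],
                       [0, 1, 0, 1, 0, 0]])
print(interactions_by_split(X, 3).toarray())
```

The result stays sparse throughout: each block is an elementwise product of sparse matrices, and `hstack` concatenates them without densifying.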
Upvotes: 1