Stratify batch in Tensorflow 2

Question

I have minibatches that I get from an SQLite database: features x of integer and float type, and a binary label y that is either 0 or 1. I am looking for something like X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(x, y, test_size=0.1, random_state=1, stratify=y) from scikit-learn, where a keyword takes care of stratifying the data. In my case I want each batch to contain the same number of class-0 and class-1 instances.
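
For reference, here is a minimal sketch of the scikit-learn call on made-up data (the arrays and their sizes are assumptions for illustration only). Note that stratify expects the label array itself, and that it preserves the class proportions in both splits rather than equalizing the class counts:

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
x = rng.normal(size=(100, 3))       # made-up features: 100 samples, 3 columns
y = np.array([0] * 80 + [1] * 20)   # made-up labels: 20% positives

# stratify=y keeps the 80/20 class ratio in both the train and the test split
X_train, X_test, y_train, y_test = train_test_split(
    x, y, test_size=0.1, random_state=1, stratify=y
)

# fraction of positives in each split
print(y_train.mean(), y_test.mean())
```

So this call alone would not give equal class counts either; it only keeps the ratio of the input.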

In TensorFlow 2, there does not seem to be a straightforward way to do this. My rather complicated solution works for me, but it is slow because of all the reshaping and transposing:

import numpy as np
import tensorflow as tf

def stratify(x, y):
    # number of positive instances (the smaller class)
    pos = np.sum(y).item() # how many positive bonds there are
    x = np.transpose(x)

    # number of features 
    f = np.shape(x)[1] 

    # filter only class 1
    y = tf.transpose(y)
    x_pos = tf.boolean_mask(x, y)
    y_pos = tf.boolean_mask(y, y)

    # filter only class 0 (for uint8 labels, invert(y) - 254 maps 0 -> 1 and 1 -> 0)
    x_neg = tf.boolean_mask(x, tf.bitwise.invert(y)-254)
    x_neg = tf.reshape(x_neg, [f,-1])
    y_neg = tf.boolean_mask(y, tf.bitwise.invert(y)-254)

    # randomly take as many class-0 instances as there are class-1 instances
    x_neg = tf.transpose(tf.random.shuffle(tf.transpose(x_neg)))
    x_neg = x_neg[:,0:pos]
    y_neg = y_neg[0:pos]

    # concat class-1 and class-0 together, stack x and y so they are shuffled jointly, then split them apart again
    x = tf.concat([x_pos,tf.transpose(x_neg)],0)
    y = tf.concat([y_pos, tf.transpose(y_neg)],0)
    xy = tf.concat([tf.transpose(x), tf.cast(np.reshape(y,[1, -1]), tf.float64)],0)
    xy = tf.transpose((tf.random.shuffle(tf.transpose(xy)))) # because there is no axis arg in shuffle
    x = xy[0:f,:]
    x = tf.transpose(x)
    y = xy[f,:]

    return x, y

I would appreciate feedback or improvements on my own function, or new, simpler ideas.
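
One possible simpler direction, as a sketch only: instead of masking, reshaping and transposing the data itself, pick row indices per class, shuffle the indices, and gather x and y once with the same index vector. The function name balance_batch and the assumed layout (x of shape [n, f], y of shape [n] with labels 0 and 1) are hypothetical and not taken from the code above:

```python
import tensorflow as tf

def balance_batch(x, y, seed=None):
    """Undersample a batch so both classes appear equally often.

    Assumed layout: x has shape [n, f], y has shape [n] with labels 0 and 1.
    """
    y = tf.reshape(tf.convert_to_tensor(y), [-1])
    # row indices of each class
    pos_idx = tf.reshape(tf.where(tf.equal(y, 1)), [-1])
    neg_idx = tf.reshape(tf.where(tf.equal(y, 0)), [-1])
    # keep as many rows per class as the rarer class provides
    k = tf.minimum(tf.size(pos_idx), tf.size(neg_idx))
    pos_idx = tf.random.shuffle(pos_idx, seed=seed)[:k]
    neg_idx = tf.random.shuffle(neg_idx, seed=seed)[:k]
    # shuffle the combined indices once, then gather x and y in the same order
    idx = tf.random.shuffle(tf.concat([pos_idx, neg_idx], axis=0), seed=seed)
    return tf.gather(x, idx), tf.gather(y, idx)

# usage on a made-up batch with 3 positives and 7 negatives
x = tf.reshape(tf.range(20, dtype=tf.float32), [10, 2])
y = tf.constant([1, 0, 0, 0, 1, 0, 0, 1, 0, 0])
xb, yb = balance_batch(x, y)
print(xb.shape, yb.numpy())
```

Because x and y are gathered with the same index vector, rows and labels stay aligned without concatenating them into one tensor, and y keeps its original dtype instead of being cast to float64.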
