Shogun / quadratic MMD error caused by varying train_test_ratio

Question

I'm using Shogun to run MMD (quadratic) and compare two nonparametric distributions based on their samples (code below is for 1D, but I've also looked at 2D samples). In the toy problem shown below, I try to change the ratio between training and testing samples in the process of selecting an optimized kernel (KSM_MAXIMIZE_MMD is the selection strategy; I've also used KSM_MEDIAN_HEURISTIC). It appears that any ratio other than 1 yields an error.

Am I allowed to change this ratio in this setting? (I see that it is used at: http://www.shogun-toolbox.org/examples/latest/examples/statistical_testing/quadratic_time_mmd.html, but it is set to 1 there)

Concise version of my code (inspired by the notebook available at: http://www.shogun-toolbox.org/notebook/latest/mmd_two_sample_testing.html):

import shogun as sg
import numpy as np
from scipy.stats import laplace, norm

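n = 220          # number of samples drawn from each distribution
mu = 0.0
sigma2 = 1
b = np.sqrt(0.5)  # Laplace scale; variance 2 * b**2 matches sigma2 = 1
# Shogun expects one column per sample, hence the (1, n) shape
X = sg.RealFeatures((norm.rvs(size=n) * np.sqrt(sigma2) + mu).reshape(1, -1))
Y = sg.RealFeatures(laplace.rvs(size=n, loc=mu, scale=b).reshape(1, -1))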

mmd = sg.QuadraticTimeMMD(X, Y)
mmd.add_kernel(sg.GaussianKernel(10, 1.0))
mmd.set_kernel_selection_strategy(sg.KSM_MAXIMIZE_MMD)
mmd.set_train_test_mode(True)
mmd.set_train_test_ratio(1)
mmd.select_kernel()

mmd_kernel = sg.GaussianKernel.obtain_from_generic(mmd.get_kernel())
kernel_width = mmd_kernel.get_width()
statistic = mmd.compute_statistic()
p_value = mmd.compute_p_value(statistic)

print(p_value)

This exact version runs and prints p-values just fine. If I change the argument passed to mmd.set_train_test_ratio() from 1 to 2, I get:

SystemError                               Traceback (most recent call last)
 in ()
     25 kernel_width = mmd_kernel.get_width()
     26 
---> 27 statistic = mmd.compute_statistic()
     28 p_value = mmd.compute_p_value(statistic)
     29 

SystemError: [ERROR] In file /feedstock_root/build_artefacts/shogun-cpp_1512688880429/work/shogun-shogun_6.1.3/src/shogun/statistical_testing/internals/mmd/ComputeMMD.h line 90: assertion kernel_matrix.num_rows==size && kernel_matrix.num_cols==size failed in float32_t shogun::internal::mmd::ComputeMMD::operator()(const shogun::SGMatrix&) const [with T = float; float32_t = float] file /feedstock_root/build_artefacts/shogun-cpp_1512688880429/work/shogun-shogun_6.1.3/src/shogun/statistical_testing/internals/mmd/ComputeMMD.h line 90

It gets worse if I use a value below 1. In addition to the following error, the Jupyter notebook kernel crashes every time, after which I need to rerun the entire notebook. The message says: "The kernel appears to have died. It will restart automatically."

SystemError                               Traceback (most recent call last)
 in ()
     20 mmd.set_train_test_ratio(0.5)
     21 
---> 22 mmd.select_kernel()
     23 
     24 mmd_kernel = sg.GaussianKernel.obtain_from_generic(mmd.get_kernel())

SystemError: [ERROR] In file /feedstock_root/build_artefacts/shogun-cpp_1512688880429/work/shogun-shogun_6.1.3/src/shogun/kernel/Kernel.h line 210: GaussianKernel::kernel(): index out of Range: idx_a=146/146 idx_b=0/146

Complete code (in a Jupyter notebook) can be found at: http://nbviewer.jupyter.org/url/dmitry.duplyakin.org/p/jn/kernel-minimal.ipynb

Please let me know if I am missing a step or need to try a different approach.

Side questions:

Both http://www.shogun-toolbox.org/examples/latest/examples/statistical_testing/quadratic_time_mmd.html and http://www.shogun-toolbox.org/notebook/latest/mmd_two_sample_testing.html show examples of using sg.GaussianKernel(10, ). I couldn't find more information about the 1st parameter other than its name, cache size. How and when am I supposed to change it?
As mentioned in the referenced notebook, mmd.get_kernel_selection_strategy().get_name() returns only the generic name, specifically KernelSelectionStrategy. How can I obtain a more specific name for the selected strategy (e.g., KSM_MEDIAN_HEURISTIC) from an instance of the sg.QuadraticTimeMMD class?

Any relevant information or references will be greatly appreciated.

Shogun version: v6.1.3_2017-12-7_19:14

Dmitry Duplyakin · Accepted Answer

Summary (from comments):

The bug does not show up in the latest code
Solution is in: https://github.com/shogun-toolbox/shogun/pull/4134
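
For anyone who wants to sanity-check the sample counts that a given ratio implies, here is a minimal NumPy-only sketch. It assumes the ratio means n_train / n_test, which may not match what Shogun does internally, and split_by_ratio is a hypothetical helper written for this illustration, not part of the Shogun API.

```python
import numpy as np


def split_by_ratio(samples, train_test_ratio, rng):
    """Shuffle samples and split them so that n_train / n_test is roughly
    train_test_ratio (an assumed reading of the ratio, not Shogun's code)."""
    n = samples.shape[0]
    # Fraction of samples going to training is ratio / (ratio + 1)
    n_train = int(n * train_test_ratio / (train_test_ratio + 1.0))
    perm = rng.permutation(n)
    return samples[perm[:n_train]], samples[perm[n_train:]]


rng = np.random.RandomState(0)
x = rng.normal(size=220)  # same sample count as in the question
x_train, x_test = split_by_ratio(x, 2, rng)
print("%d train, %d test" % (len(x_train), len(x_test)))  # 146 train, 74 test
```

Under this reading, a ratio of 2 with n = 220 gives 146 training samples, the same number that appears in the index-out-of-range message above. That is only an arithmetic observation and does not confirm how the library splits the data.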
