clueless

Reputation: 11

Plot multiple scatterplots from a hierarchical model with 2 predictors

I am brand new to Python and PyMC3, so forgive me if this question is rather silly. I have a dataset called toy_data.csv:

batch_no batch_id points pred1 pred2
12-150 1 1 70.26 10.766
12-150 1 2 65.72 10.512
12-150 1 6 55.44 7.78
12-150 1 4 63.554 12.552
12-150 1 5 61.66 10.95
12-150 1 3 65.44 10.808
12-190 2 3 55.68 6.36
12-190 2 4 67.36 7.38
12-190 2 1 78 9.32
12-190 2 2 70.12 8.554
12-190 3 5 60.68 9.568
12-190 3 3 68.694 9.634
12-190 3 2 56.118 11.474
12-190 3 4 58.54 9.31
12-190 3 6 65.08 10.604
13-999 4 6 56.73 9.754
13-999 4 3 55.492 9.4
13-999 4 2 51.556 9.802
13-999 4 1 53.748 8.85
13-999 4 5 59.054 9.204
13-999 4 4 49.77 9.468
13-999 4 7 58.266 9.954
13-999 4 8 57.78 9.38
14-140 5 2 69.68 12.086
14-140 5 1 68.5 10.438
14-140 5 6 71.7 11.624
14-140 5 3 63.68 11.058
14-140 5 4 62.02 10.498
14-140 5 5 61.94 10.95
14-140 5 9 57.22 10.164
14-140 5 7 54.44 9.798
14-140 5 8 60.82 10.784
14-290 6 1 56.2 9.566
14-290 6 1 50.06 9.23
14-290 6 2 50.76 10.646
14-290 6 2 46.98 8.38
14-700 7 2 92.8 11.532
14-700 7 1 81 9.522
14-700 7 3 75.62 10.004
14-700 8 2 71.44 10.076
14-889 8 1 55.2 8.1
14-889 8 3 71.18 9.412
14-889 8 4 53.24 7.138
14-900 9 3 50.08 7.744
14-900 9 1 47.138 8.294
14-900 9 4 68.56 11.362
14-900 9 1 69.34 11.584
14-900 9 2 63.14 10.77

I would like to grab the relevant batch and predictor from the hierarchical model's trace, plot them, and overlay the best-fit line (abline) for each. The final output should replicate the graph below, but instead of the gray band I'd like to show the abline for each batch.

Ideal output (sourced from GitHub)

For example:

Abline: y_best_fit_i = (mu_a + a_i) + (mu_b + b_i) * pred_i, with all parameters back-scaled using the mean and standard deviation of the respective x variable (pred1 or pred2). Considering only pred1, the back-scaling formula would be mu_a = (mu_a * pred1.std()) + pred1.mean(); the same is applied to mu_b and to all a_i and b_i values. However, back-scaling with pred1's mean and standard deviation should only apply to the part of the trace belonging to pred1, not pred2 (i.e. it applies to mu_a, mu_b, a[0,0,0] and b[0,0,0], but not to a[0,1,0] or b[0,1,0]). I would like to apply it only to mu_a, mu_b, a[i,0,0] and b[i,0,0], where i is the batch index (see table below), and to automate it if possible. The table below is the summary of the trace.

Summary Table

It would be good if I could find a way to get the numbers from the trace itself to plot the final graph. Any help is greatly appreciated.
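Something like the untested sketch below is what I have in mind for the back-scaling, following the formula above exactly (backscale and the *_scaled names are just made up), but I don't know how to automate it across batches and predictors:

# Untested sketch: back-scale per the formula above.
# hier_trace is the trace from the model below;
# hier_trace['a'] has shape (draws, n_batch, n_pred, 1).
def backscale(values, predictor):
    return values * predictor.std() + predictor.mean()

mu_a_scaled = backscale(hier_trace['mu_a'], pred1)
mu_b_scaled = backscale(hier_trace['mu_b'], pred1)
a_pred1 = backscale(hier_trace['a'][:, :, 0, 0], pred1)  # one column per batch
b_pred1 = backscale(hier_trace['b'][:, :, 0, 0], pred1)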

The code I used to build the model is below:

import pandas as pd
import numpy as np
import pymc3 as pm
from sklearn import preprocessing

seed = 68492
np.random.seed(seed)

data = pd.read_csv("toy_data.csv")    

# assigning variables
batch_no = data.iloc[:, 0]
batch_id = (data.iloc[:, 1])-1
point = data.iloc[:, 2]
pred = data.iloc[:, 3:5]
pred1 = data.iloc[:, 3]
pred2 = data.iloc[:, 4]


# creating indicators
n_pred = pred.shape[1]
n_batch = len(np.unique(batch_id))

# standardising the predictors
std_pred = preprocessing.scale(pred)


# Hierarchical model 
with pm.Model() as hierarchical_model1:

    # Hyperpriors
    mu_a = pm.Normal('mu_a', mu=0, sigma=100)
    sigma_a = pm.HalfNormal('sigma_a', sigma=100)
    mu_b = pm.Normal('mu_b', mu=0, sigma=100)
    sigma_b = pm.HalfNormal('sigma_b', sigma=100)   
    

    # Intercept for each batch, distributed around group mean mu_a
    a = pm.Normal('a', mu=mu_a, sd=sigma_a, shape=(n_batch, n_pred, 1))
    # Slope for each batch, distributed around group mean mu_b
    b = pm.Normal('b', mu=mu_b, sd=sigma_b, shape=(n_batch, n_pred, 1))
    
    # Model error
    eps = pm.HalfCauchy('eps', beta=5)
    
        
    # Expected value
    omega = (pm.math.dot(std_pred, b))[batch_id,1]
    tpoint_est = a[batch_id,1] + omega

         
    # Data likelihood
    y_like = pm.Normal('y_like', mu=tpoint_est, sigma=eps, observed=point)
    

with hierarchical_model1:
    hier_trace = pm.sample(draws=4000, tune=1000, chains=2)
    hier_out = pm.summary(hier_trace)

Upvotes: 1

Views: 220

Answers (1)

OriolAbril

Reputation: 8823

Here is what I have; it should solve the batch/pred looping issues and serve as a skeleton for what you actually want to do. I was not able to follow the scaling and rescaling. I also have not been able to understand the 3rd dimension of length one, but I have kept it there.

Note that I am using ArviZ (a dependency of PyMC3), which is built on top of xarray, to get label-based indexing and automatic broadcasting. I am actually using the latest features added to ArviZ (the development version) together with PyMC3 3.11.1. You can install the ArviZ development version with pip install git+git://github.com/arviz-devs/arviz.git.

I started with a couple changes in the model in order to define the shapes of the variables with named dimensions:

coords = {
    "batch": np.arange(n_batch),
    "pred_num": ["pred1", "pred2"],
    "one": [1]
}
with pm.Model(coords=coords) as hierarchical_model1:
    ...

    # Intercept for each batch, distributed around group mean mu_a
    a = pm.Normal('a', mu=mu_a, sd=sigma_a, dims=("batch", "pred_num", "one"))
    # Slope for each batch, distributed around group mean mu_b
    b = pm.Normal('b', mu=mu_b, sd=sigma_b, dims=("batch", "pred_num", "one"))
    
    ...

The rest of the code is exactly the same, except for the sampling call, where I have added the kwarg return_inferencedata=True.
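For reference, the sampling block then becomes:

with hierarchical_model1:
    # return_inferencedata=True makes pm.sample return an ArviZ InferenceData
    # object (xarray-based) instead of a MultiTrace
    hier_trace = pm.sample(draws=4000, tune=1000, chains=2, return_inferencedata=True)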

Once sampled, I created an xarray dataset from the pred array in order to take advantage of the automatic broadcasting provided by xarray that I mentioned above. Since I set return_inferencedata=True, I already have all the model variables as xarray objects. To create an xarray variable we have to provide the values, the dimension names and the coordinate variables:

import xarray as xr
pred_ds = xr.DataArray(
    pred, dims=("x", "pred_num"), coords={"pred_num": coords["pred_num"]}
).to_dataset(name="pred")
post = hier_trace.posterior
y_ds = (post["mu_a"]+post["a"])+(post["mu_b"]+post["b"])*pred_ds
y_ds # in jupyter you'll see a nice html interactive summary
# here is the plain text output
# <xarray.Dataset>
# Dimensions:   (batch: 9, chain: 4, draw: 1000, one: 1, pred_num: 2, x: 48)
# Coordinates:
#   * pred_num  (pred_num) <U5 'pred1' 'pred2'
#   * chain     (chain) int64 0 1 2 3
#   * draw      (draw) int64 0 1 2 3 4 5 6 7 8 ... 992 993 994 995 996 997 998 999
#   * batch     (batch) int64 0 1 2 3 4 5 6 7 8
#   * one       (one) int64 1
# Dimensions without coordinates: x
# Data variables:
#     pred      (chain, draw, batch, pred_num, one, x) float64 1.938e+03 ... 67.98

We can then use ArviZ to loop over the desired variables and generate one subplot for each combination of batch and pred. It looks quite strange, though, because there is no rescaling at all.

import matplotlib.pyplot as plt
fig, axes = plt.subplots(n_pred, n_batch, figsize=(11,5), sharex=True, sharey="row")
x = np.arange(len(batch_id))

The first step is to create the subplots we will use, and a dummy variable to represent the x axis.

import arviz as az
from arviz.labels import BaseLabeller
labeller = BaseLabeller()
iterator = az.sel_utils.xarray_var_iter(
    y_ds.squeeze().transpose("chain", "draw", "x", "pred_num", "batch"), 
    combined=True, 
    skip_dims={"x",}
)

We then create this iterator object; the labeller is purely aesthetic (and has detailed docs here). The first step is transposing y_ds. This is also aesthetic: if the batch dimension came first, then when plotting we would not see all of pred1 on the same row. combined=True and skip_dims indicate that we don't want to iterate over chains nor over values in the x dimension (by default ArviZ never iterates over draws). Now we can loop using this iterator to get arrays with chain, draw and x dimensions, plot the HDI and the mean, and get a plot that roughly looks like the example in the question:

for ax, (_, sel, isel, values) in zip(axes.ravel(), iterator):
    az.plot_hdi(x, values, ax=ax)
    ax.plot(x, values.mean(axis=(0,1)))
    ax.set_title(labeller.make_label_vert("y", sel, isel))
fig.tight_layout()

[plot: grid of HDI/mean subplots, one per batch and predictor]

Note that the y axis is only shared within each row; the scales of pred1 and pred2 differ by an order of magnitude even though they look similar with this plot layout.

Side note: I have encountered a lot of divergences, and most rhat values are above the recommended 1.01. I am quite sure that NUTS has not converged; you probably should not trust these results.
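If you want to check this yourself, something along these lines should work (a quick sketch, assuming hier_trace is the InferenceData object from above):

# Count divergent transitions and look at the worst r_hat value
print("divergences:", int(hier_trace.sample_stats["diverging"].sum()))
print("worst r_hat:", az.summary(hier_trace)["r_hat"].max())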

Upvotes: 1
