Reputation: 163
For a project, I need to create synthetic categorical data containing specific dependencies between the attributes. This can be done by sampling from a pre-defined Bayesian Network. After some exploration on the internet, I found that Pomegranate is a good package for Bayesian Networks; however, as far as I can tell, it is impossible to sample from such a pre-defined Bayesian Network. For example, model.sample() raises a NotImplementedError (even though this solution suggests it should work).
Does anyone know of a library that provides a good interface for constructing a Bayesian network and sampling from it?
Upvotes: 6
Views: 4505
Reputation: 411
I was also searching for a Python library for Bayesian network learning, sampling, and inference, and I found bnlearn. I tried a couple of examples and they worked. It can import several existing example networks or any network stored in .bif format. According to the library's documentation:
Sampling of data is based on forward sampling from joint distribution of the Bayesian network. In order to do that, it requires as input a DAG connected with CPDs. It is also possible to create a DAG manually (see create DAG section) or load an existing one
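To make "forward sampling from a DAG connected with CPDs" concrete, here is a minimal, library-free sketch. It does not use bnlearn's API; the network (Rain -> WetGrass <- Sprinkler), its probabilities, and the helper names are hypothetical. The idea is to visit the nodes in topological order and draw each one from the CPD row selected by its already-sampled parents.

```python
import random

# Hypothetical example network (not a bnlearn object): Rain -> WetGrass <- Sprinkler.
PARENTS = {
    "Rain": (),
    "Sprinkler": (),
    "WetGrass": ("Rain", "Sprinkler"),
}
# Each CPD maps a tuple of parent values to a distribution over the node's states.
CPDS = {
    "Rain": {(): {"yes": 0.2, "no": 0.8}},
    "Sprinkler": {(): {"on": 0.4, "off": 0.6}},
    "WetGrass": {
        ("yes", "on"): {"wet": 0.99, "dry": 0.01},
        ("yes", "off"): {"wet": 0.8, "dry": 0.2},
        ("no", "on"): {"wet": 0.9, "dry": 0.1},
        ("no", "off"): {"wet": 0.0, "dry": 1.0},
    },
}


def topological_order(parents):
    """Repeatedly place any node whose parents are all placed; fail on a cycle."""
    order, placed = [], set()
    while len(order) < len(parents):
        progressed = False
        for node, pars in parents.items():
            if node not in placed and all(p in placed for p in pars):
                order.append(node)
                placed.add(node)
                progressed = True
        if not progressed:
            raise ValueError("graph contains a cycle")
    return order


def forward_sample(parents, cpds, n, rng=random):
    """Draw n joint samples, sampling each node given its already-sampled parents."""
    order = topological_order(parents)
    samples = []
    for _ in range(n):
        row = {}
        for node in order:
            dist = cpds[node][tuple(row[p] for p in parents[node])]
            states, probs = zip(*dist.items())
            row[node] = rng.choices(states, weights=probs)[0]
        samples.append(row)
    return samples


if __name__ == "__main__":
    for row in forward_sample(PARENTS, CPDS, 5, random.Random(0)):
        print(row)
```

Because every node is drawn only after its parents, each row is an exact draw from the joint distribution, which is why no rejection or weighting step is needed when there is no evidence.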
Upvotes: 1
Reputation: 41
Another option is pgmpy which is a Python library for learning (structure and parameter) and inference (statistical and causal) in Bayesian Networks.
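You can generate forward and rejection samples as a pandas DataFrame or a NumPy recarray.
The following code generates 20 forward samples from the Bayesian network "diff -> grade <- intel" as a recarray. It uses an older pgmpy API; recent releases have renamed BayesianModel and may have changed the arguments of forward_sample, so check the version you have installed.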
from pgmpy.models import BayesianModel  # older pgmpy API
from pgmpy.factors.discrete import TabularCPD
from pgmpy.sampling import BayesianModelSampling
# network structure: diff -> grade <- intel
student = BayesianModel([('diff', 'grade'), ('intel', 'grade')])
# priors of the two root nodes (2 states each)
cpd_d = TabularCPD('diff', 2, [[0.6], [0.4]])
cpd_i = TabularCPD('intel', 2, [[0.7], [0.3]])
# P(grade | intel, diff): 3 states, one column per (intel, diff) combination
cpd_g = TabularCPD('grade', 3, [[0.3, 0.05, 0.9, 0.5], [0.4, 0.25, 0.08, 0.3], [0.3, 0.7, 0.02, 0.2]], ['intel', 'diff'], [2, 2])
student.add_cpds(cpd_d, cpd_i, cpd_g)
student.check_model()  # raises if a CPD does not match the graph
# draw 20 samples by forward (ancestral) sampling
sampler = BayesianModelSampling(student)
samples = sampler.forward_sample(size=20, return_type='recarray')
print(samples)
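Rejection sampling, the other option mentioned above, is forward sampling followed by discarding every sample that disagrees with the evidence. The sketch below hand-codes the same student network as plain Python so it runs without pgmpy; it is an illustration of the technique, not pgmpy's implementation, and the function names are hypothetical. The CPT columns follow the same (intel, diff) order as the TabularCPD above.

```python
import random

# The student network, rewritten as plain data (not pgmpy objects).
P_DIFF = [0.6, 0.4]   # P(diff)
P_INTEL = [0.7, 0.3]  # P(intel)
# P(grade | intel, diff); keys are (intel, diff), values are P(grade = 0, 1, 2).
P_GRADE = {
    (0, 0): [0.3, 0.4, 0.3],
    (0, 1): [0.05, 0.25, 0.7],
    (1, 0): [0.9, 0.08, 0.02],
    (1, 1): [0.5, 0.3, 0.2],
}


def forward_sample_one(rng):
    """Draw one joint sample, parents first."""
    diff = rng.choices([0, 1], weights=P_DIFF)[0]
    intel = rng.choices([0, 1], weights=P_INTEL)[0]
    grade = rng.choices([0, 1, 2], weights=P_GRADE[(intel, diff)])[0]
    return {"diff": diff, "intel": intel, "grade": grade}


def rejection_sample(evidence, n, rng, max_tries=1_000_000):
    """Return n samples from P(diff, intel, grade | evidence) by rejection."""
    kept, tries = [], 0
    while len(kept) < n:
        if tries >= max_tries:
            raise RuntimeError("evidence too unlikely for rejection sampling")
        tries += 1
        sample = forward_sample_one(rng)
        # keep the sample only if it matches every observed variable
        if all(sample[var] == val for var, val in evidence.items()):
            kept.append(sample)
    return kept


if __name__ == "__main__":
    kept = rejection_sample({"grade": 0}, 5000, random.Random(0))
    print(sum(s["intel"] for s in kept) / len(kept))
```

For the evidence grade = 0, the exact posterior is P(intel = 1 | grade = 0) = (0.3 · 0.74) / 0.362 ≈ 0.613, so the printed fraction should land near that value. The cost of the method is that rarer evidence means more rejected samples.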
Upvotes: 4
Reputation: 541
Using pyAgrum, you just have to:
import pyAgrum as gum
# create a BN: B has 3 states, C has the labels "yes" and "No", the other variables are binary
bn=gum.fastBN("A->B[3]<-C{yes|No}->D")
# fastBN fills every CPT randomly; overwrite the one for A
bn.cpt("A").fillWith([0.3,0.7])
# then generate a database of 1000 samples in sample.csv;
# generateCSV returns the LL(database)
gum.generateCSV(bn,"sample.csv",1000,with_labels=True,random_order=False)
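The LL(database) is the log-likelihood of the generated rows under the network, which factorizes as the sum over rows and nodes of log P(x_node | parents). The sketch below computes that quantity for a hypothetical two-node network A -> B stored as plain dicts; it is not pyAgrum code, and it uses the natural logarithm, whereas pyAgrum's own documentation should be checked for the base it reports.

```python
import math

# Hypothetical network A -> B as plain dicts (not pyAgrum objects).
PARENTS = {"A": (), "B": ("A",)}
CPTS = {
    "A": {(): {0: 0.3, 1: 0.7}},
    "B": {(0,): {0: 0.9, 1: 0.1}, (1,): {0: 0.2, 1: 0.8}},
}


def log_likelihood(rows, parents, cpts):
    """Sum of log P(row) over all rows, using the BN chain-rule factorization."""
    total = 0.0
    for row in rows:
        for node, pars in parents.items():
            p = cpts[node][tuple(row[q] for q in pars)][row[node]]
            total += math.log(p)  # natural log; a zero-probability row raises ValueError
    return total


if __name__ == "__main__":
    data = [{"A": 0, "B": 0}, {"A": 1, "B": 1}]
    print(log_likelihood(data, PARENTS, CPTS))
```

Each row contributes log(0.3 · 0.9) and log(0.7 · 0.8) respectively here, so the total is log(0.27) + log(0.56).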
See http://webia.lip6.fr/~phw/aGrUM/docs/last/notebooks/ for more notebooks using pyAgrum
Disclaimer: I am one of the authors of pyAgrum :-)
Upvotes: 5
Reputation: 1411
Another option is Bayespy (https://www.bayespy.org/index.html).
You build the network from nodes, and on every node you can call random(), which draws a sample from that node's distribution: https://www.bayespy.org/dev_api/generated/generated/bayespy.inference.vmp.nodes.stochastic.Stochastic.random.html#bayespy.inference.vmp.nodes.stochastic.Stochastic.random
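The node-oriented design (each node object knows how to draw its own value) can be sketched in plain Python as below. The CategoricalNode class is hypothetical and does not reproduce Bayespy's API or semantics; see the linked documentation for what Bayespy's random() actually returns. In this sketch a node draws from the CPT row chosen by its parents' most recent draws, so parents must be sampled before their children.

```python
import random


class CategoricalNode:
    """Hypothetical node type (not Bayespy's): a categorical variable whose
    distribution is selected by the current values of its parent nodes."""

    def __init__(self, states, table, parents=(), rng=None):
        self.states = list(states)
        self.table = table  # maps tuple of parent values -> list of probabilities
        self.parents = tuple(parents)
        self.rng = rng if rng is not None else random.Random()
        self.value = None

    def random(self):
        """Draw a value given the parents' most recent draws and store it."""
        key = tuple(p.value for p in self.parents)
        self.value = self.rng.choices(self.states, weights=self.table[key])[0]
        return self.value


if __name__ == "__main__":
    rng = random.Random(0)
    a = CategoricalNode(["t", "f"], {(): [0.5, 0.5]}, rng=rng)
    # b copies a deterministically, which makes the dependency easy to see
    b = CategoricalNode(["t", "f"], {("t",): [1.0, 0.0], ("f",): [0.0, 1.0]}, parents=[a], rng=rng)
    for _ in range(3):
        print(a.random(), b.random())
```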
Upvotes: 1
Reputation: 163
I found out that PyAgrum (https://agrum.gitlab.io/pages/pyagrum.html) does the job. It can be used both to create a Bayesian Network via the BayesNet() class and to sample from such a network with the .drawSamples() method of the BNDatabaseGenerator() class.
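Since the goal is synthetic data with specific dependencies, a quick sanity check after sampling is to re-estimate each CPT from the generated rows and compare it with the one you specified. The sketch below assumes the samples are available as a list of dicts (for example read back from a CSV file with csv.DictReader, in which case the values are strings); the helper name is hypothetical and this is plain Python, not part of pyAgrum.

```python
from collections import Counter, defaultdict


def estimate_cpt(rows, child, parents):
    """Estimate P(child | parents) from sample rows by relative frequencies."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[tuple(row[p] for p in parents)][row[child]] += 1
    return {
        key: {state: c / sum(ctr.values()) for state, c in ctr.items()}
        for key, ctr in counts.items()
    }


if __name__ == "__main__":
    rows = [{"A": 0, "B": 0}] * 3 + [{"A": 0, "B": 1}] + [{"A": 1, "B": 1}] * 2
    print(estimate_cpt(rows, "B", ["A"]))
```

The estimated frequencies should approach the specified CPT entries as the number of samples grows; parent configurations that never occur in the data are simply absent from the result.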
Upvotes: 1