I am using pyAgrum for Bayesian network analysis. When I split my data with a test size of 0.2, I get the error below. It is caused by a combination of my variables (specifically COL=0, RA=4, INC=0 for the target node PARTNER) that the error message reports as never appearing in the database.
DatabaseError: [pyAgrum] Database error: The conditioning set <COL=0, RA=4, INC=0> for target node PARTNER never appears in the database. Please consider using priors such as smoothing.
The issue does not occur with a test size of 0.4. Additionally, this combination does not exist in the original data either.
Is there any method that I can use?
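To make the error concrete, here is a minimal pure-Python sketch of what "the conditioning set never appears" means and how additive (Laplace) smoothing avoids it. This is not pyAgrum's implementation: the function `smoothed_cpt`, the toy rows, and the weight `alpha` are hypothetical and only illustrate the idea behind the "priors such as smoothing" hint in the message. In recent pyAgrum versions the corresponding learner option is, as far as I can tell, `useSmoothingPrior` (older releases called it `useAprioriSmoothing`), so check the version you have installed.

```python
from collections import Counter


def smoothed_cpt(rows, target, parents, target_values, parent_config, alpha=1.0):
    """Estimate P(target | parents = parent_config) with additive smoothing.

    alpha is the pseudo-count added to every target value (hypothetical
    parameter name; alpha=0 means plain maximum likelihood).
    """
    # Count target values among the rows matching the parent configuration.
    counts = Counter(
        row[target]
        for row in rows
        if all(row[p] == v for p, v in zip(parents, parent_config))
    )
    total = sum(counts.values())
    if total == 0 and alpha == 0:
        # Without a prior there is nothing to normalise: this is the
        # situation the database error describes.
        raise ValueError(f"conditioning set {parent_config} never appears in the data")
    denom = total + alpha * len(target_values)
    return {v: (counts[v] + alpha) / denom for v in target_values}


# Toy training rows (made up); (COL=0, RA=4, INC=0) is deliberately absent.
rows = [
    {"COL": 0, "RA": 3, "INC": 0, "PARTNER": 1},
    {"COL": 0, "RA": 3, "INC": 0, "PARTNER": 0},
    {"COL": 1, "RA": 4, "INC": 0, "PARTNER": 1},
]
parents = ["COL", "RA", "INC"]

# Unseen configuration: smoothing falls back to a uniform distribution.
unseen = smoothed_cpt(rows, "PARTNER", parents, [0, 1], (0, 4, 0), alpha=1.0)
# Seen configuration: observed counts are pulled slightly toward uniform.
seen = smoothed_cpt(rows, "PARTNER", parents, [0, 1], (1, 4, 0), alpha=1.0)
```

With `alpha=0` the unseen configuration raises an error, which mirrors the behaviour I am seeing. With any positive `alpha` every configuration gets a valid distribution, at the cost of a small bias toward uniform.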
import pyAgrum as gum
from sklearn.model_selection import train_test_split
# Ensure that your DataFrame is shuffled
df_net = df_net.sample(frac=1, random_state=42).reset_index(drop=True)
# Use stratified splitting on the target column
df_train, df_test = train_test_split(df_net, test_size=0.2, random_state=42, stratify=df_net['PARTNER'])
# Learn the network from the training split only
learner = gum.BNLearner(df_train)
learner.useLocalSearchWithTabuList()
learner.useScoreBIC()
bn = learner.learnBN()
Once this code runs, I want to analyze the goodness of fit with some evaluation metrics.
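As an illustration of the kind of metrics I have in mind, here is a small sketch that computes accuracy and average log-loss for the target PARTNER on the test split. It assumes the per-row posterior probabilities for PARTNER have already been obtained from the learned network by some inference step, which is not shown. The function `accuracy_and_log_loss` and the example numbers are hypothetical, not part of pyAgrum.

```python
import math


def accuracy_and_log_loss(true_labels, predicted_probs, eps=1e-12):
    """Return (accuracy, mean negative log-likelihood) for a discrete target.

    predicted_probs is a list of dicts mapping each label to its predicted
    probability, one dict per test row.
    """
    correct = 0
    total_loss = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # The predicted class is the label with the highest probability.
        if max(probs, key=probs.get) == label:
            correct += 1
        # Clip to avoid log(0) when the true label was given zero probability.
        total_loss -= math.log(max(probs.get(label, 0.0), eps))
    n = len(true_labels)
    return correct / n, total_loss / n


# Made-up posteriors for three test rows.
true_labels = [1, 0, 1]
predicted_probs = [
    {0: 0.2, 1: 0.8},
    {0: 0.6, 1: 0.4},
    {0: 0.7, 1: 0.3},
]
acc, loss = accuracy_and_log_loss(true_labels, predicted_probs)
```

Accuracy only looks at the most probable class, while log-loss also penalises confident wrong predictions, so reporting both gives a fuller picture of the fit.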