Why is the CausalNex output in Python wrong?

Question

I am using CausalNex to create a DAG from a dataset in Python.

I got the graph, and the nodes are correct, but the edges are totally off. I tried this on a DataFrame with four randomly generated, mutually independent variables (Requestor, Risk, Size, Developer) and a single dependent one (Duration), and the graph produced is this:

DAG using CausalNex

Am I using the library incorrectly? Why is the figure so distant from the true data-generating process? Could a Bayesian Network model outperform CausalNex?

I tried this code:

# Generate initial data

import numpy as np
import pandas as pd

np.random.seed(42)
fib_list = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

df = pd.DataFrame({
    "Requestor": np.random.randint(1, 4, 100),
    "Size": np.random.randint(1, 4, 100),
    "Risk": np.random.randint(1, 4, 100)
})

df['Developer'] = np.random.choice(fib_list, df.shape[0])
df["Duration"] = (
    0.1 * df["Requestor"] +
    0.2 * df["Size"] +
    0.2 * df["Risk"] +
    0.5 * df["Developer"]
)

# Generate graph

from causalnex.structure.notears import from_pandas
import matplotlib.pyplot as plt
import networkx as nx

sm = from_pandas(df)
sm.remove_edges_below_threshold(0.8)
nx.draw_shell(sm, with_labels=True, font_weight="bold")
plt.show()
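
To rule out a problem with the data itself, here is a minimal sanity check that does not involve CausalNex at all. It rebuilds the same DataFrame and fits ordinary least squares with `np.linalg.lstsq`; the names `predictors`, `weights`, and `recovered` are only illustrative. Because Duration is an exact linear combination of the four inputs, the fitted weights should come out close to 0.1, 0.2, 0.2, and 0.5. This is plain regression, not structure learning, so it says nothing about edge direction.

```python
import numpy as np
import pandas as pd

# Rebuild the same synthetic data as in the question.
np.random.seed(42)
fib_list = [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]

df = pd.DataFrame({
    "Requestor": np.random.randint(1, 4, 100),
    "Size": np.random.randint(1, 4, 100),
    "Risk": np.random.randint(1, 4, 100),
})
df["Developer"] = np.random.choice(fib_list, df.shape[0])
df["Duration"] = (
    0.1 * df["Requestor"]
    + 0.2 * df["Size"]
    + 0.2 * df["Risk"]
    + 0.5 * df["Developer"]
)

predictors = ["Requestor", "Size", "Risk", "Developer"]
X = df[predictors].to_numpy(dtype=float)
y = df["Duration"].to_numpy(dtype=float)

# Ordinary least squares without an intercept: minimize ||X w - y||.
weights, *_ = np.linalg.lstsq(X, y, rcond=None)
recovered = dict(zip(predictors, weights))
print(recovered)
```

So the linear relationship is clearly present in the data; what the graph gets wrong is which edges exist and which way they point.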

I was expecting something like this:

Expected Output
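
In case the image does not render, the structure I expected can also be written down explicitly. This is a small sketch using only `networkx`, with one edge from each input into Duration and nothing else; the variable name `expected` is just for illustration.

```python
import networkx as nx

predictors = ["Requestor", "Size", "Risk", "Developer"]

# Expected DAG: every predictor points at Duration, and
# there are no edges between the predictors themselves.
expected = nx.DiGraph()
expected.add_edges_from((p, "Duration") for p in predictors)

print(sorted(expected.edges()))
print(nx.is_directed_acyclic_graph(expected))
```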

