MineSweeper

Reputation: 584

Using t-test to get pValue for observed state in two vectors

I am planning to use this on network data.

My network has two kinds of edges. I have written a function that returns the indegree for each of the two edge types separately. Its output looks like this:

    Node    G_obs   R_obs
1   N1      3       2
2   N2      1       0
3   N3      9       0
4   N4      1       4
5   N5      1       0
...

I also wrote a second function that samples the network edges. After sampling, the output looks like this:

    Node    G_obs   R_obs
1   N1      4       1
2   N2      1       0
3   N3      3       6
4   N4      3       2
5   N5      1       0
...

Note that G_obs + R_obs, i.e. the indegree of each node, stays the same.

I'd like to know the p-value, for each node, of the originally observed indegree split between G_obs and R_obs.

EDIT: Sorry, this was a little too unclear. I don't want the row-wise probability of the observed distribution. I want the probability of the observed G_obs/R_obs split for each node, where sample(G_obs) + sample(R_obs) still gives the same sum for each node as before. I should consult a native English speaker for better wording next time. I hope the problem is described more clearly now :(

EDIT 2

observation:

    Node    G_obs   R_obs
1   N1      3       2
2   N2      1       0
3   N3      9       0
4   N4      1       4
5   N5      1       0

As you can see, N1 has 5 in-edges: 3 of them are green (G_obs) and 2 are red (R_obs).

For the 5 nodes shown, we have 15 green edges and 6 red edges in total. Now we 'sample' all green and all red edges, i.e. redistribute them within their own column, while each node keeps its indegree (N1 still has 5 edges). See the example sampling above, where:

    Node    G_obs   R_obs
1   N1      4       1
...

I already have a function that does the 'sampling' correctly (placeholder: mySample(graph)). I need a function that takes mySample, runs it e.g. 1000 times, and calculates how likely the original observation was for each node.

Any help is appreciated. Thank you!

Upvotes: 0

Views: 118

Answers (1)

davechilders

Reputation: 9123

It sounds like you are after a binomial probability (the probability that randomly dividing the edges between the two types will yield the same distribution as originally observed).

You can compute these probabilities using the dbinom() function. Note that prob = .5 assumes each edge is equally likely to be green or red:

transform(
  df,
  prob_same = dbinom(G_obs, G_obs + R_obs, prob = .5)
)
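If the two edge types are not equally common, prob = .5 may be too crude. Below is a rough sketch of two alternatives, not part of the original answer. The first plugs the pooled share of green edges into dbinom(). The second uses dhyper(), which should match a resampling scheme that shuffles edge colours across all edges while keeping every node's indegree and both colour totals fixed. That is only my assumption about what mySample() does. The column names prob_binom, prob_hyper and p_upper are made up for illustration.

```r
# Data from the question (only the 5 nodes shown there)
df <- data.frame(
  Node  = c("N1", "N2", "N3", "N4", "N5"),
  G_obs = c(3, 1, 9, 1, 1),
  R_obs = c(2, 0, 0, 4, 0)
)

total_G <- sum(df$G_obs)       # all green edges
total_R <- sum(df$R_obs)       # all red edges
k       <- df$G_obs + df$R_obs # indegree per node

# Binomial with the pooled green share instead of a fixed 0.5
df$prob_binom <- dbinom(df$G_obs, size = k, prob = total_G / (total_G + total_R))

# Hypergeometric: a node's k edges are drawn without replacement
# from total_G green and total_R red edges (ASSUMED sampling scheme)
df$prob_hyper <- dhyper(df$G_obs, m = total_G, n = total_R, k = k)

# One-sided p-value: probability of at least as many green edges as observed
df$p_upper <- phyper(df$G_obs - 1, m = total_G, n = total_R, k = k,
                     lower.tail = FALSE)

df
```

prob_hyper is the probability of exactly the observed split, while p_upper is closer to a p-value in the usual sense, since it also counts more extreme splits.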

data (run this first to create df)

df <- read.table(
  text = "
   Node    G_obs   R_obs
N1      3       2
N2      1       0
N3      9       0
N4      1       4
N5      1       0
  ",
  header = TRUE
)
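If you would rather estimate the probabilities by simulation with your own sampler, something like the sketch below might work. I don't have your mySample(), so my_sample_stub() is a hypothetical stand-in that shuffles edge colours while keeping each node's indegree fixed. It also takes the data frame rather than a graph, which is an assumption on my part. empirical_p() is likewise a made-up name.

```r
df <- data.frame(
  Node  = c("N1", "N2", "N3", "N4", "N5"),
  G_obs = c(3, 1, 9, 1, 1),
  R_obs = c(2, 0, 0, 4, 0)
)

# HYPOTHETICAL stand-in for mySample(): permute the colours of all edges,
# keeping each node's indegree and both colour totals unchanged.
my_sample_stub <- function(df) {
  k       <- df$G_obs + df$R_obs
  colours <- sample(rep(c("G", "R"), c(sum(df$G_obs), sum(df$R_obs))))
  node_id <- rep(seq_len(nrow(df)), k)       # which node each edge points to
  g       <- tabulate(node_id[colours == "G"], nbins = nrow(df))
  data.frame(Node = df$Node, G_obs = g, R_obs = k - g)
}

# Run the sampler n_sim times and compare each draw with the observation.
empirical_p <- function(df, sampler, n_sim = 1000) {
  # rows = nodes, columns = simulations
  sims <- matrix(replicate(n_sim, sampler(df)$G_obs), nrow = nrow(df))
  data.frame(
    Node    = df$Node,
    p_exact = rowMeans(sims == df$G_obs),  # same split as observed
    p_upper = rowMeans(sims >= df$G_obs)   # at least as many green edges
  )
}

set.seed(42)
res <- empirical_p(df, my_sample_stub, n_sim = 1000)
res
```

To use your real sampler, pass it as the sampler argument, adapting the call if it expects a graph object instead of a data frame. With 1000 runs the estimates are fairly coarse, so increase n_sim if you need to resolve small probabilities.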

Upvotes: 2
