How to compute joint distribution from marginal distributions given independence?

Question

This is a follow-up to a question I posted previously, because apparently I asked the wrong question there.

I have two dataframes, each holding the relative frequencies of certain combinations of features. The relative frequencies in each one add up to 1. I'd like to join the two dataframes, which share one feature, to obtain a new dataframe whose relative frequencies also add up to 1.

Here is an MWE. I have two tibbles like so:

library(dplyr)
my_tib1 <- tibble(feature1 = c("A", "A", "B", "B", "C", "C"), feature2 = c("AA", "BB", "AA", "BB", "AA", "BB"), number = c(0.1, 0.1, 0.3, 0.4, 0.05, 0.05))
my_tib2 <- tibble(feature3 = c("TT", "TT", "FF", "FF"), feature2 = c("AA", "BB", "AA", "BB"), number = c(0.1, 0.4, 0.3, 0.2))

which look like this:

# A tibble: 6 × 3
  feature1 feature2 number
  <chr>    <chr>     <dbl>
1 A        AA          0.1
2 A        BB          0.1
3 B        AA          0.3
4 B        BB          0.4
5 C        AA          0.05
6 C        BB          0.05

# A tibble: 4 × 3
  feature3 feature2 number
  <chr>    <chr>     <dbl>
1 TT       AA          0.1
2 TT       BB          0.4
3 FF       AA          0.3
4 FF       BB          0.2

Note that feature2 has the same categories in both tibbles. There is exactly one number for each combination of feature1 and feature2 in my_tib1, and for each combination of feature2 and feature3 in my_tib2.

For context: The number column represents marginal probabilities and I'd like to multiply the marginal distributions to get joint distributions (I'm aware of the assumptions).

What I think this requires is to get all possible combinations of feature1, feature2, and feature3 and multiply their numbers in a new tibble column. The resulting tibble should have 12 rows: 3 levels of feature1 × 2 levels of feature2 × 2 levels of feature3.

The final tibble should look something like this:

# A tibble: 12 × 6
  feature1 feature2 feature3  number.x  number.y  number.mult
  <chr>    <chr>    <chr>     <dbl>     <dbl>     <dbl>
1 A        AA       TT        0.1       0.1       0.01
2 A        AA       FF        0.1       0.3       0.03
...

with the numbers in number.mult adding up to 1.

I have tried the following and I think I'm close but it doesn't quite work:

my_tib1 %>% full_join(my_tib2, by = "feature2") %>% mutate(number.mult = number.x*number.y)

This gives me the 12 × 6 tibble I'm looking for, but the numbers in number.mult do not add up to 1.
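
A minimal sketch of one way to get products that sum to 1. It assumes that feature1 and feature3 are conditionally independent given feature2, which is an assumption and is not stated above. With the example data the plain product sums to 0.45 × 0.4 + 0.55 × 0.6 = 0.51, because the feature2 totals of both tables get multiplied together instead of being counted once. The sketch therefore rescales my_tib2 to proportions within each feature2 level before multiplying. The names my_tib2_cond and joint are illustrative only. Note that the two tables disagree on the feature2 totals (0.45/0.55 in my_tib1, 0.4/0.6 in my_tib2), and this version keeps the ones from my_tib1.

```r
library(dplyr)

# Same example data as above.
my_tib1 <- tibble(
  feature1 = c("A", "A", "B", "B", "C", "C"),
  feature2 = c("AA", "BB", "AA", "BB", "AA", "BB"),
  number   = c(0.1, 0.1, 0.3, 0.4, 0.05, 0.05)
)
my_tib2 <- tibble(
  feature3 = c("TT", "TT", "FF", "FF"),
  feature2 = c("AA", "BB", "AA", "BB"),
  number   = c(0.1, 0.4, 0.3, 0.2)
)

# Assumption: treat my_tib2 as P(feature3 | feature2) by dividing each value
# by the total of its feature2 level, so each level sums to 1.
my_tib2_cond <- my_tib2 %>%
  group_by(feature2) %>%
  mutate(number = number / sum(number)) %>%
  ungroup()

# Multiply P(feature1, feature2) by P(feature3 | feature2).
# Newer dplyr versions may warn about a many-to-many join; that is expected
# here, since every feature2 level appears several times in both tibbles.
joint <- my_tib1 %>%
  inner_join(my_tib2_cond, by = "feature2") %>%
  mutate(number.mult = number.x * number.y)

# Should be 1 up to floating-point error.
sum(joint$number.mult)
```

If the only goal is a column that sums to 1, dividing the original number.mult by sum(number.mult) would also work, but the result then no longer reproduces the feature2 totals of either input table.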

