Reputation: 67
I have a dataframe. Here's a small part of it:
SubjectId Gender Groups ExtraCalories GW
1: 1 F G3 -1310.00000 0.000000
2: 2 M G6 -920.79656 4.331278
3: 3 M G2 -25.39517 4.727376
4: 4 M G5 169.25645 3.543941
5: 5 M G5 -340.67235 4.591774
---
996: 996 F G1 464.82543 5.933792
997: 997 M G8 -323.65136 5.024453
998: 998 F G3 77.92138 5.383686
999: 999 M G9 -237.83700 5.423941
1000: 1000 F G9 -400.44831 6.837965
How do I find the probability of a female choosing G5?
Upvotes: 0
Views: 1131
Reputation: 76450
Here is a base R solution with prop.table on a table of counts. It returns the joint proportion of each Gender/Groups combination, so all cells together sum to 1.
prop.table(table(df1[c('Gender', 'Groups')]))
# Groups
#Gender G1 G2 G3 G5 G6 G8 G9
# F 0.1 0.0 0.2 0.0 0.0 0.0 0.1
# M 0.0 0.1 0.0 0.2 0.1 0.1 0.1
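If what is wanted is the conditional probability P(G5 | F) rather than the joint proportion, one option is to normalise within each gender via the margin argument of prop.table. The sketch below assumes the ten sample rows from the question (only the Gender and Groups columns are rebuilt here), so the numbers are purely illustrative.

```r
# Ten sample rows from the question (Gender and Groups columns only)
df1 <- data.frame(
  Gender = c("F", "M", "M", "M", "M", "F", "M", "F", "M", "F"),
  Groups = c("G3", "G6", "G2", "G5", "G5", "G1", "G8", "G3", "G9", "G9")
)

# margin = 1 normalises by row, so each gender's proportions sum to 1
cond <- prop.table(table(df1[c("Gender", "Groups")]), margin = 1)

# Estimated P(Groups == "G5" | Gender == "F")
p_f_g5 <- cond["F", "G5"]
p_f_g5
```

With only four females in the sample and none of them in G5, this estimate is 0; on the full 1000-row data the same call should give a more meaningful value.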
Data in dput format
df1 <-
structure(list(SubjectId = c(1L, 2L, 3L, 4L, 5L, 996L, 997L,
998L, 999L, 1000L), Gender = c("F", "M", "M", "M", "M", "F",
"M", "F", "M", "F"), Groups = c("G3", "G6", "G2", "G5", "G5",
"G1", "G8", "G3", "G9", "G9"), ExtraCalories = c(-1310, -920.79656,
-25.39517, 169.25645, -340.67235, 464.82543, -323.65136, 77.92138,
-237.837, -400.44831), GW = c(0, 4.331278, 4.727376, 3.543941,
4.591774, 5.933792, 5.024453, 5.383686, 5.423941, 6.837965)),
class = "data.frame", row.names = c("1:", "2:", "3:", "4:", "5:",
"996:", "997:", "998:", "999:", "1000:"))
Upvotes: 0
Reputation: 6226
I guess you want to approximate the probability by the observed frequency. The last two options are more general than the base R one, since they give a value for every group.
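# Base R: share of females who are in G5
nrow(df[df$Gender == "F" & df$Groups == "G5", ]) / nrow(df[df$Gender == "F", ])

# dplyr: share of females in each group
library(dplyr)
df %>% filter(Gender == "F") %>%
  group_by(Groups) %>%
  summarise(n = n()) %>%
  ungroup() %>%
  mutate(p = n / sum(n))  # divide by the total count, not by the number of groups

# data.table: same result
library(data.table)
setDT(df)
df[Gender == "F"][, .(n = .N), by = Groups][, .(Groups, p = n / sum(n))]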
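The frequency estimate can also be written as the mean of a logical vector, since the mean of TRUE/FALSE values is the proportion of TRUEs. A minimal base R sketch, assuming the ten sample rows from the question stand in for the full data (the names `p_mean` and `p_ratio` are only illustrative):

```r
# Ten sample rows from the question (Gender and Groups columns only)
df <- data.frame(
  Gender = c("F", "M", "M", "M", "M", "F", "M", "F", "M", "F"),
  Groups = c("G3", "G6", "G2", "G5", "G5", "G1", "G8", "G3", "G9", "G9")
)

# Proportion of females whose group is G5
p_mean <- mean(df$Groups[df$Gender == "F"] == "G5")

# Same quantity as a ratio of counts
p_ratio <- sum(df$Gender == "F" & df$Groups == "G5") / sum(df$Gender == "F")

p_mean
```

Both expressions compute the same relative frequency; the mean() form is just shorter and avoids subsetting the data frame twice.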
Upvotes: 1