Reputation: 87
I have a data frame (df) and I ran the Shapiro-Wilk test for every unique ID on the values in the column Values, then put the results in a data table (dt_table). I also wrote a few lines of code that replace the NAs with the mean for every unique ID. What I really want, though, is to check dt_table: if the column "accept_H1" is TRUE, replace the NAs with the median, and if it is FALSE, replace the NAs with the mean.
dput(df)
structure(list(ID = c("F1", "F1", "F1", "F1", "F1", "F1", "F1",
"F2", "F2", "F2", "F2", "F2", "F2", "F2", "F2", "F3", "F3", "F3",
"F3", "F3", "F3", "F3", "F3", "F3", "F4", "F4", "F4", "F4", "F4",
"F4", "F4", "F4"), Values = c(9.6, NA, 10.2, 9.8, 9.9, 9.9, 9.9,
1.2, 1.2, 1.8, 1.5, 1.5, 1.6, 1.4, NA, 3266, 3256, 7044, 6868,
NA, 3405, 3410, NA, 5567, 59.4, 56, 52.8, 52.4, 55.5, NA, NA,
53.6)), class = "data.frame", row.names = c(NA, -32L))
This is the Shapiro test and the data table where I put the results.
library(data.table)

# Run shapiro.test() once per ID and collect one result row per group
dt_list <- by(df, df$ID, function(sub) {
  results <- shapiro.test(sub$Values)
  data.table(
    ID = sub$ID[1],
    Wilk = results$statistic,
    p_value = results$p.value,
    accept_H1 = results$p.value <= 0.05  # TRUE when normality is rejected
  )
})
dt_table <- data.table::rbindlist(dt_list)
These are the few lines of code that replace all the NAs with the mean of every ID. I want to check dt_table instead: for example, if "accept_H1" is TRUE for F1, I want to replace the NAs for F1 with the median, but if it is FALSE, I want to replace them with the mean. Can anybody help with that?
library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(Values = ifelse(is.na(Values), mean(Values, na.rm = TRUE), Values))
Upvotes: 1
Views: 72
Reputation: 887048
We may need to do a join or merge with 'dt_table' and then use an if/else condition. It may be easier to use na.aggregate from zoo for the replacement itself.
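library(data.table)
library(zoo)
# Update join: copy accept_H1 from dt_table into df, matching on ID
setDT(df)[dt_table[, .(ID, accept_H1)], accept_H1 := accept_H1, on = .(ID)]
# Per ID: fill NAs with the median if accept_H1 is TRUE, otherwise with the mean
df[, Values := if (first(accept_H1)) na.aggregate(Values, FUN = median) else
  na.aggregate(Values), by = ID][, accept_H1 := NULL][]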
-output
> df
ID Values
1: F1 9.600000
2: F1 9.883333
3: F1 10.200000
4: F1 9.800000
5: F1 9.900000
6: F1 9.900000
7: F1 9.900000
8: F2 1.200000
9: F2 1.200000
10: F2 1.800000
11: F2 1.500000
12: F2 1.500000
13: F2 1.600000
14: F2 1.400000
15: F2 1.457143
16: F3 3266.000000
17: F3 3256.000000
18: F3 7044.000000
19: F3 6868.000000
20: F3 3410.000000
21: F3 3405.000000
22: F3 3410.000000
23: F3 3410.000000
24: F3 5567.000000
25: F4 59.400000
26: F4 56.000000
27: F4 52.800000
28: F4 52.400000
29: F4 55.500000
30: F4 54.950000
31: F4 54.950000
32: F4 53.600000
ID Values
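For anyone who wants to see the same idea without data.table or zoo, here is a minimal base R sketch. It is not the code above, just an illustration under assumptions: `toy`, `flags` and the helper `impute_group` are hypothetical names, and the values are made up so that the mean and median differ within each group.

```r
# Toy data (hypothetical values, not the question's df): one NA per group
toy <- data.frame(
  ID     = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Values = c(1, 2, 9, NA, 10, 20, 60, NA)
)

# Hypothetical flags standing in for dt_table[, .(ID, accept_H1)]
flags <- data.frame(ID = c("A", "B"), accept_H1 = c(TRUE, FALSE))

# Hypothetical helper: fill NAs with the median when use_median is TRUE,
# otherwise with the mean (both computed on the non-missing values)
impute_group <- function(x, use_median) {
  fill <- if (use_median) median(x, na.rm = TRUE) else mean(x, na.rm = TRUE)
  x[is.na(x)] <- fill
  x
}

# Look up each ID's flag and impute that ID's rows in place
for (id in unique(toy$ID)) {
  rows <- toy$ID == id
  use_median <- flags$accept_H1[match(id, flags$ID)]
  toy$Values[rows] <- impute_group(toy$Values[rows], use_median)
}

toy
```

With these toy values, the NA in group A is filled with the median (2) and the NA in group B with the mean (30), which mirrors what the if/else on accept_H1 does per ID.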
This can also be done without merging: match the grouping value against the corresponding 'ID' in 'dt_table' and use that index to extract the 'accept_H1' value for the if/else condition.
library(dplyr)
library(zoo)  # for na.aggregate
df %>%
  group_by(ID) %>%
  mutate(Values = if (dt_table$accept_H1[match(cur_group()$ID,
      dt_table$ID)]) na.aggregate(Values, FUN = median) else
      na.aggregate(Values)) %>%
  ungroup()
# A tibble: 32 x 2
ID Values
<chr> <dbl>
1 F1 9.6
2 F1 9.88
3 F1 10.2
4 F1 9.8
5 F1 9.9
6 F1 9.9
7 F1 9.9
8 F2 1.2
9 F2 1.2
10 F2 1.8
# … with 22 more rows
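If an explicit join reads more clearly in a dplyr pipeline, a variant along these lines may also work. This is only a sketch under assumptions: `toy` and `flags` are hypothetical stand-ins for df and dt_table, `fill` is a temporary helper column, and plain `mean`/`median` with `coalesce` replace na.aggregate.

```r
library(dplyr)

# Toy data (hypothetical values, not the question's df)
toy <- tibble(
  ID     = c("A", "A", "A", "A", "B", "B", "B", "B"),
  Values = c(1, 2, 9, NA, 10, 20, 60, NA)
)

# Hypothetical flags standing in for dt_table
flags <- tibble(ID = c("A", "B"), accept_H1 = c(TRUE, FALSE))

toy_filled <- toy %>%
  left_join(flags, by = "ID") %>%        # attach each ID's accept_H1 flag
  group_by(ID) %>%
  mutate(
    # One fill value per group: median if flagged, otherwise mean
    fill = if (first(accept_H1)) median(Values, na.rm = TRUE)
           else mean(Values, na.rm = TRUE),
    Values = coalesce(Values, fill)      # only NAs are replaced
  ) %>%
  ungroup() %>%
  select(-accept_H1, -fill)              # drop the helper columns

toy_filled
```

The join keeps the lookup visible in the pipeline, at the cost of two temporary columns that are dropped at the end.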
Upvotes: 1