Reputation: 157
Here is an example for the dataset (d):
rs3 rs4 rs5 rs6
1 0 0 0
1 0 1 0
0 0 0 0
2 0 1 0
0 0 0 0
0 2 0 1
0 2 NA 1
0 2 2 1
NA 1 2 1
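For anyone who wants to reproduce the example, the table above can be read into a data frame. This is only a sketch: the name `d` comes from the question, and it assumes the real data is stored as a plain data frame with one integer column per SNP.

```r
# Rebuild the example data frame `d` from the table in the question.
# Assumption: one integer column per SNP, with missing genotypes coded as NA.
d <- read.table(text = "
rs3 rs4 rs5 rs6
1 0 0 0
1 0 1 0
0 0 0 0
2 0 1 0
0 0 0 0
0 2 0 1
0 2 NA 1
0 2 2 1
NA 1 2 1
", header = TRUE)

# Inspect the column types and the first values
str(d)
```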
To check the frequency of each genotype (0, 1, 2) for a SNP, we can use the table() function:
table(d$rs3)
The output would be:
0 1 2
5 2 1
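The same check can be run on every column at once. The sketch below is one possible way, not part of the original question; the names `counts` and `rare2` are made up for illustration, and the data frame is rebuilt inline so the snippet runs on its own.

```r
# Example data from the question (rebuilt here so the snippet is self-contained)
d <- data.frame(
  rs3 = c(1, 1, 0, 2, 0, 0, 0, 0, NA),
  rs4 = c(0, 0, 0, 0, 0, 2, 2, 2, 1),
  rs5 = c(0, 1, 0, 1, 0, 0, NA, 2, 2),
  rs6 = c(0, 0, 0, 0, 0, 1, 1, 1, 1)
)

# Fixing the factor levels to 0:2 keeps a row for genotypes that never occur
# in a column, so every column yields exactly three counts.
counts <- sapply(d, function(x) table(factor(x, levels = 0:2)))
counts

# Names of the SNPs where genotype 2 occurs fewer than 3 times
rare2 <- names(d)[counts["2", ] < 3]
rare2
```

Note that a SNP with no genotype 2 at all (rs6 here) also satisfies the "fewer than 3" condition; recoding it simply changes nothing.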
We want to recode genotype 2 as 1 in every SNP column where genotype 2 occurs fewer than 3 times. The recoded output should be:
rs3 rs4 rs5 rs6
1 0 0 0
1 0 1 0
0 0 0 0
1 0 1 0
0 0 0 0
0 2 0 1
0 2 NA 1
0 2 1 1
NA 1 1 1
I have 70,000 SNPs that need to be checked and recoded. How can I do that in R with a for loop or some other method?
Upvotes: 2
Views: 367
Reputation: 92282
Here's another possible (vectorized) solution:
indx <- colSums(d == 2, na.rm = TRUE) < 3 # Select columns where genotype 2 occurs fewer than 3 times
d[indx][d[indx] == 2] <- 1 # Insert 1 wherever the selected columns equal 2
d
# rs3 rs4 rs5 rs6
# 1 1 0 0 0
# 2 1 0 1 0
# 3 0 0 0 0
# 4 1 0 1 0
# 5 0 0 0 0
# 6 0 2 0 1
# 7 0 2 NA 1
# 8 0 2 1 1
# 9 NA 1 1 1
Upvotes: 3
Reputation: 887078
We can try:
d[] <- lapply(d, function(x)
  if (sum(x == 2, na.rm = TRUE) < 3) replace(x, x == 2, 1) else x)
d
# rs3 rs4 rs5 rs6
#1 1 0 0 0
#2 1 0 1 0
#3 0 0 0 0
#4 1 0 1 0
#5 0 0 0 0
#6 0 2 0 1
#7 0 2 NA 1
#8 0 2 1 1
#9 NA 1 1 1
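Or the same approach can be used with dplyr. Note that mutate_each() and funs() are deprecated in current dplyr, so this version uses across(), which needs dplyr 1.0.0 or later:
library(dplyr)
d %>%
  mutate(across(everything(), ~ if (sum(.x == 2, na.rm = TRUE) < 3)
    replace(.x, .x == 2, 1) else .x))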
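Since the question asked about a for loop, the same column-wise logic can also be written out explicitly. This is only a sketch under the question's assumptions; `recoded`, `snp` and `is_two` are illustrative names, and the data frame is rebuilt inline so the snippet runs on its own.

```r
# Example data from the question (rebuilt here so the snippet is self-contained)
d <- data.frame(
  rs3 = c(1, 1, 0, 2, 0, 0, 0, 0, NA),
  rs4 = c(0, 0, 0, 0, 0, 2, 2, 2, 1),
  rs5 = c(0, 1, 0, 1, 0, 0, NA, 2, 2),
  rs6 = c(0, 0, 0, 0, 0, 1, 1, 1, 1)
)

recoded <- d  # work on a copy so the original stays available for comparison
for (snp in names(recoded)) {
  x <- recoded[[snp]]
  # TRUE only where the genotype is observed and equal to 2 (NA becomes FALSE)
  is_two <- !is.na(x) & x == 2
  # Recode 2 -> 1 only when genotype 2 occurs fewer than 3 times in this column
  if (sum(is_two) < 3) {
    x[is_two] <- 1
    recoded[[snp]] <- x
  }
}
recoded
```

The loop gives the same result as the lapply() version. It may be easier to read or to extend with extra per-SNP checks, while lapply() or colSums() is the more compact form.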
Upvotes: 2