Reputation: 1499
I have one data frame that looks like (gwas.data):
SNP CHR BP A1 A2 zscore P CEUmaf MAF
1 rs1000000 12 125456933 A G 1.441 0.1496 0.3729 0.2401
563090 rs10000010 4 21227772 T C 0.068 0.9455 0.575 0.4934
563091 rs10000023 4 95952929 T G 1.217 0.2236 0.5917 0.3852
563092 rs1000003 3 99825597 A G -0.306 0.7597 0.875 0.1794
563093 rs10000033 4 139819348 T C 1.050 0.2935 0.4917 0.4789
2 rs10000037 4 38600725 A G 0.072 0.9428 0.2833 0.2296
I have another that looks like (correct orientation):
CHR SNP A1 A2 MAF NCHROBS
6952148 12 rs1000000 A G 0.2401 758
2272221 4 rs10000010 C T 0.4934 758
2524810 4 rs10000023 G T 0.3852 758
1838654 3 rs1000003 G A 0.1794 758
2675630 4 rs10000033 C T 0.4789 758
2338861 4 rs10000037 A G 0.2296 758
I'm trying to write a program that replaces gwas.data$MAF with (1 - MAF) if A1 and A2 are switched between the two data frames. I'm trying to use this code, which I am borrowing from someone else:
flip <- gwas.data$A1 == correct.orientation$A2 & gwas.data$A2 == correct.orientation$A1
dont.flip <- gwas.data$A1 == correct.orientation$A1 & gwas.data$A2 == correct.orientation$A2
for (i in 1:nrow(gwas.data)) {
  if (flip[i]) {
    gwas.data$A1[i] <- correct.orientation$A1[i]
    gwas.data$A2[i] <- correct.orientation$A2[i]
    gwas.data$zscore[i] <- -gwas.data$zscore[i]
    gwas.data$MAF[i] <- 1 - gwas.data$MAF[i]
  } else if (dont.flip[i]) {
    # do nothing
  } else {
    stop("Strand Issue")
  }
}
I'm running into the error at the first line flip <- gwas.data$A1 == correct.orientation$A2 & gwas.data$A2 == correct.orientation$A1
The error is
Error in Ops.factor(gwas.data$A1, correct.orientation$A2) : level sets of factors are different
How can I fix this?
Upvotes: 0
Views: 201
Reputation: 107757
Consider forgoing the for loop and using the base R merge() function on both data frames. A little data management is needed first: 1) temporarily convert the factors to characters (or pass stringsAsFactors=FALSE to read.csv() or read.table()), and 2) add suffixes to the repeated column names. Once the new MAF is calculated with ifelse(), split the merged data frame and reset the column names and data types to the original structure:
# CONVERT FACTORS TO CHARACTER
gwas.data[, c("A1","A2")] <- sapply(gwas.data[,c("A1","A2")],as.character)
# SUFFIXING COL NAMES TO IDENTIFY IN MERGED DF
names(gwas.data) <- paste0(names(gwas.data), "_A")
# CONVERT FACTORS TO CHARACTER
correct.orientation[, c("A1","A2")] <- sapply(correct.orientation[,c("A1","A2")],as.character)
# SUFFIXING COL NAMES TO IDENTIFY IN MERGED DF
names(correct.orientation) <- paste0(names(correct.orientation ), "_B")
# MERGE DATA FRAMES (ASSUMING SNP IS UNIQUE IDENTIFIER; all.x KEEPS ONLY ROWS FROM gwas.data)
comparedf <- merge(gwas.data, correct.orientation, by.x="SNP_A", by.y="SNP_B", all.x=TRUE)
# FLAG SNPS WHOSE ALLELES ARE SWAPPED (UNMATCHED ROWS GIVE NA, TREATED AS NO FLIP)
flip <- (comparedf$A1_A == comparedf$A2_B) & (comparedf$A2_A == comparedf$A1_B)
flip <- !is.na(flip) & flip
# CALCULATE NEW MAF AND ZSCORE
comparedf$MAF_A <- ifelse(flip, 1 - comparedf$MAF_A, comparedf$MAF_A)
comparedf$zscore_A <- ifelse(flip, -1 * comparedf$zscore_A, comparedf$zscore_A)
# SPLIT MERGE BACK TO ORIGINAL STRUCTURE
newgwas.data <- comparedf[,names(gwas.data)]
# REMOVE SUFFIX
names(newgwas.data) <- sub("_A$", "", names(newgwas.data))
# RESET FACTORS
newgwas.data$A1 <- as.factor(newgwas.data$A1)
newgwas.data$A2 <- as.factor(newgwas.data$A2)
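As a self-contained illustration, here is a minimal sketch on toy data (three SNPs taken from the question, built as factors on purpose). It first reproduces the "level sets of factors are different" error, then avoids it by comparing character vectors. It uses match() on SNP rather than merge(), which is a swapped-in alternative to the approach above, not the same method. The names idx, g1, g2, c1, c2, keep, fixed and unresolved are made up for this example, and it assumes SNP IDs are unique in correct.orientation.

```r
# Toy versions of the two data frames, deliberately built with factor columns.
gwas.data <- data.frame(
  SNP    = c("rs1000000", "rs10000010", "rs10000037"),
  A1     = c("A", "T", "A"),
  A2     = c("G", "C", "G"),
  zscore = c(1.441, 0.068, 0.072),
  MAF    = c(0.2401, 0.4934, 0.2296),
  stringsAsFactors = TRUE
)
correct.orientation <- data.frame(
  SNP = c("rs1000000", "rs10000010", "rs10000037"),
  A1  = c("A", "C", "A"),
  A2  = c("G", "T", "G"),
  stringsAsFactors = TRUE
)

# Comparing two factors with different level sets reproduces the error;
# tryCatch() captures the message instead of stopping the script.
err <- tryCatch(gwas.data$A1 == correct.orientation$A2,
                error = function(e) conditionMessage(e))

# Align rows by SNP instead of by position (assumption: SNP IDs are unique).
idx <- match(as.character(gwas.data$SNP), as.character(correct.orientation$SNP))

# Compare as character vectors so factor levels no longer matter.
g1 <- as.character(gwas.data$A1)
g2 <- as.character(gwas.data$A2)
c1 <- as.character(correct.orientation$A1)[idx]
c2 <- as.character(correct.orientation$A2)[idx]

flip <- !is.na(idx) & g1 == c2 & g2 == c1  # alleles swapped
keep <- !is.na(idx) & g1 == c1 & g2 == c2  # alleles already agree

# Apply the flip to a copy: adopt the reference alleles, negate zscore,
# and replace MAF with 1 - MAF for the flipped rows only.
fixed <- gwas.data
fixed$A1 <- factor(ifelse(flip, c1, g1))
fixed$A2 <- factor(ifelse(flip, c2, g2))
fixed$zscore[flip] <- -fixed$zscore[flip]
fixed$MAF[flip] <- 1 - fixed$MAF[flip]

# SNPs that are neither flipped nor matching (possible strand issues or missing IDs).
unresolved <- as.character(fixed$SNP[!(flip | keep)])
```

Collecting the rows that match neither orientation in unresolved, rather than calling stop() on the first one, lets you inspect all problem SNPs at once; whether that is preferable to failing fast depends on your pipeline.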
Upvotes: 1