Reputation: 1499
My data looks like:
SNP A1 A2 EFF FRQ
rs12565286 C G -0.00225985777786465 .04354
rs11804171 A T -0.00530020318295282 .04485
rs3094315 A G -0.0042551489236695 .8364
rs12562034 A G -0.00911972489527125 .09763
rs12124819 A G 0.0250148724382224 .
rs2980319 A T 0.0178927256033542 .1306
rs4040617 A G -0.0173263263037023 rabbit
I would like to delete the rows whose FRQ value is "." or "rabbit" and keep the rows that contain numbers. Is there a way to do this? I'm running a calculation on a large data file and I'm getting this error: 1 - gwas.data$FRQ[i] : non-numeric argument to binary operator
Am I right in assuming the error occurs because some of the data isn't numeric? I haven't checked the column for non-numeric values yet because the file is 3 million lines long.
Upvotes: 1
Views: 1219
Reputation: 726
Example data.frame:
df <- data.frame(a=1:10, b=1:10, FRQ=c(rnorm(8), '.', 'rabbit'), stringsAsFactors=FALSE)
To check the class of all your columns try: lapply(df, class)
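If the column turns out to be character, it can help to see which values are causing the problem before deleting anything. The following is a minimal base R sketch, assuming a small made-up vector frq_raw that stands in for gwas.data$FRQ. It relies on as.numeric() returning NA for strings it cannot parse.

```r
# Hypothetical stand-in for gwas.data$FRQ (values are illustrative only)
frq_raw <- c(".04354", ".8364", ".", ".1306", "rabbit", ".")

# as.numeric() yields NA (with a warning) for unparseable strings
frq_num <- suppressWarnings(as.numeric(frq_raw))

# Flag entries that were present in the raw data but failed to parse
bad <- is.na(frq_num) & !is.na(frq_raw)

# Tabulate the offending values and how often each occurs
print(table(frq_raw[bad]))
```

On a 3 million row column this is a single vectorised pass, so it should stay cheap compared with looping over rows.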
If the FRQ column is character, you can drop the rows with non-numeric entries and then convert the column to numeric, like this:
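library(stringr)
# Drop rows whose FRQ contains any letter (e.g. "rabbit")
df <- df[!str_detect(df$FRQ, '([A-Za-z])'), ]
# Drop rows whose FRQ ends in a period (e.g. a bare ".")
df <- df[!str_detect(df$FRQ, '\\.$'), ]
# Convert the remaining strings to numeric
df$FRQ <- as.numeric(df$FRQ)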
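One caveat with filtering on letters is that it would also drop values written in scientific notation, such as "1e-04", because the "e" matches the pattern. As an alternative, here is a hedged base R sketch that lets as.numeric() decide what counts as a number and keeps only the rows that parse. The gwas.data frame below is a small made-up sample modelled on the question, not the real file.

```r
# Illustrative sample modelled on the question's data (not the real file)
gwas.data <- data.frame(
  SNP = c("rs12565286", "rs12124819", "rs2980319", "rs4040617"),
  FRQ = c(".04354", ".", ".1306", "rabbit"),
  stringsAsFactors = FALSE
)

# Parse once; unparseable entries such as "." and "rabbit" become NA
frq <- suppressWarnings(as.numeric(gwas.data$FRQ))

# Keep only the rows whose FRQ parsed, and store the numeric values
keep <- !is.na(frq)
gwas.clean <- gwas.data[keep, ]
gwas.clean$FRQ <- frq[keep]

str(gwas.clean)
```

Note that this also drops rows where FRQ is genuinely missing (NA or the string "NA"), which may or may not be what you want.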
Upvotes: 2