Reputation:
I have a data frame, named gen, made up of A's, C's, G's, T's, and 0's. I would like to replace each A with a 1, each C with a 2, each G with a 3, and each T with a 4. When I try using the code gen1[gen1 == "A"] = 1, I get the warning:
Warning messages:
1: In `[<-.factor`(`*tmp*`, thisvar, value = "1") :
invalid factor level, NAs generated
The resulting data frame has all of the A's replaced, but there are NA's instead of 1's.
Does anyone know how to do this correctly?
Thanks
Upvotes: 1
Views: 2175
Reputation: 1892
You can avoid this by setting the argument stringsAsFactors = FALSE when creating the data frame. In R versions before 4.0.0 it defaults to TRUE, so character columns become factors.
Example Code:
d <- data.frame(a=c('A','C','G','T','0'),b=c('C','A','G','A','0'), stringsAsFactors = FALSE)
> d
a b
1 A C
2 C A
3 G G
4 T A
5 0 0
> d[d=='A']<- '1'
> d
a b
1 1 C
2 C 1
3 G G
4 T 1
5 0 0
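The same idea extends to the full mapping in the question. A minimal sketch, assuming the data frame was created with character columns as above; the named vector codes is a hypothetical helper, and the final lapply step turns the "0".."4" strings into integers:

```r
d <- data.frame(a = c('A','C','G','T','0'), b = c('C','A','G','A','0'),
                stringsAsFactors = FALSE)

# Hypothetical lookup: names are the letters to replace, values their codes
codes <- c(A = "1", C = "2", G = "3", T = "4")
for (base in names(codes)) {
  d[d == base] <- codes[[base]]   # replace every cell equal to this letter
}

# All cells are now "0".."4" as character; convert each column to integer
d[] <- lapply(d, as.integer)
d
#   a b
# 1 1 2
# 2 2 1
# 3 3 3
# 4 4 1
# 5 0 0
```

The 0's are left alone because "0" is not one of the names in codes.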
Upvotes: 0
Reputation: 121568
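You can coerce your factor columns to integers using as.integer. Note that this returns each factor's internal level codes, which follow that column's own (by default alphabetically sorted) levels, so the same letter can get different numbers in different columns, and a "0" level shifts the letters' codes up by one.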
Using sapply:
sapply(gen1,as.integer)
and colwise from plyr:
library(plyr)
colwise(as.integer)(gen1)
For example, I first generate a data.frame of A, B, C and D:
set.seed(1)
# stringsAsFactors = TRUE keeps factor columns in R >= 4.0.0 as well
gen1 <- as.data.frame(matrix(sample(LETTERS[1:4], 4 * 5, replace = TRUE), ncol = 4), stringsAsFactors = TRUE)
## V1 V2 V3 V4
## 1 B D A B
## 2 B D A C
## 3 C C C D
## 4 D C B B
## 5 A A D D
library(plyr)
colwise(as.integer)(gen1)
## V1 V2 V3 V4
## 1 2 3 1 1
## 2 2 3 1 2
## 3 3 2 3 3
## 4 4 2 2 1
## 5 1 1 4 3
sapply(gen1, as.integer)
## V1 V2 V3 V4
## [1,] 2 3 1 1
## [2,] 2 3 1 2
## [3,] 3 2 3 3
## [4,] 4 2 2 1
## [5,] 1 1 4 3
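In the output above, D is coded 4 in V1 but 3 in V2, because V2 has no B level. If you need the fixed mapping A=1, C=2, G=3, T=4 in every column, one option (a sketch, using made-up example data) is to look each value up with match(), using nomatch = 0L so the 0's stay 0:

```r
# Hypothetical example data: V2 contains only C, G and 0
gen1 <- data.frame(V1 = factor(c("A", "C", "G", "T", "0")),
                   V2 = factor(c("C", "G", "C", "G", "0")))

# as.integer() returns level codes; V2's levels are "0", "C", "G"
as.integer(gen1$V2)
# [1] 2 3 2 3 1

# Fixed mapping: position in bases; anything not found (the 0's) becomes 0
bases <- c("A", "C", "G", "T")
gen_codes <- as.data.frame(lapply(gen1, function(x) match(as.character(x), bases, nomatch = 0L)))
gen_codes
#   V1 V2
# 1  1  2
# 2  2  3
# 3  3  2
# 4  4  3
# 5  0  0
```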
The warning message is explicit: invalid factor level, NAs generated.
You get it because you are assigning a value that does not belong to the factor's set of levels, so it is replaced by NA. To reproduce it:
h <- data.frame(xx = factor(c("A","B")) )
h[h == "A"] <- "C" ## "C" is not one of the levels of xx
Warning message:
In `[<-.factor`(`*tmp*`, thisvar, value = "C") :
invalid factor level, NA generated
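Two ways to avoid the warning in this small example, sketched below: add the new value to the factor's levels before assigning it, or convert the column to character first (h2 is just a fresh copy for the second option):

```r
h <- data.frame(xx = factor(c("A", "B")))

# Option 1: add "C" to the factor's levels before assigning it
levels(h$xx) <- c(levels(h$xx), "C")
h[h == "A"] <- "C"
h$xx
# [1] C B
# Levels: A B C

# Option 2: work with a character column instead of a factor
h2 <- data.frame(xx = factor(c("A", "B")))
h2$xx <- as.character(h2$xx)
h2[h2 == "A"] <- "C"
h2$xx
# [1] "C" "B"
```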
Upvotes: 1