Reputation: 351
I am having trouble running commands in the new version of R (4.0.1; Platform: x86_64-w64-mingw32/x64 (64-bit)) and RStudio (Version 1.3.959) that worked well in the older version of R.
Say I have a table named Check with more than 10,000 rows and more than 100 variables (categorical and numeric).
If I call droplevels on it, I get the message below:
Check <- droplevels(Check)
Error in .shallow(x, cols = cols, retain.key = TRUE) :
can't set ALTREP truelength
However, the following works:
Check <- rapply(Check, f = droplevels, classes = "factor", how = "replace")
When I try to replace the NAs in a categorical variable by adding a new level and assigning it to the NA entries, I get the message below:
levels(Check$A) <- c(levels(Check$A), 'unknown.')
# Check$A <- factor(Check$A, levels=c(levels(Check$A), 'unknown.'))
Check$A[is.na(Check$A)] <- 'unknown.'
Error in setalloccol(newx) : can't set ALTREP truelength
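For comparison, here is the same add-a-level-then-assign pattern on a plain factor, where it behaves as expected. This is only a minimal sketch on made-up data (the vector `f` is hypothetical, not part of the real table); it suggests the pattern itself is not what triggers the error.

```r
# Toy factor with missing values (hypothetical data, not the real table)
f <- factor(c("a", NA, "b", NA, "a"))

# Add the new level first; assigning a value that is not yet a level
# would produce NA with a warning instead
levels(f) <- c(levels(f), "unknown.")

# Replace the missing entries with the new level
f[is.na(f)] <- "unknown."

print(levels(f))
print(table(f))
```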
When I try to open the table, I get the below message:
View(Check)
Error in view: can't set ALTREP truelength
I do not understand what has gone wrong here. Any ideas?
I tried experimenting with the following:
library(tidyverse)
Check <- data.frame(col1 = c(NA, letters[1:10]), col2 = c(NA, NA, 1:8, NA),
col3 = c(NA, letters[1:5], NA, NA, NA, NA, NA))
Test <- Check
Test <- droplevels(Test)
str(Test)
Test2 <- Test[6:11,]
Test2 <- Test2 %>% mutate_if(sapply(Test2, is.character), as.factor)
Test2 <- droplevels(Test2)
The above works fine, and dput(Test2) yields:
structure(list(col1 = structure(c(NA, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L), .Label = c("a", "b", "c", "d", "e", "f", "g",
"h", "i", "j"), class = "factor"), col2 = c(NA, NA, 1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, NA), col3 = structure(c(6L, 1L, 2L, 3L, 4L,
5L, 6L, 6L, 6L, 6L, 6L), .Label = c("a", "b", "c", "d", "e",
"unknown."), class = "factor")), row.names = c(NA, -11L), class = "data.frame")
However, for my data, the end of the dput output looks like this, even though I am not using data.table:
row.names = c(NA,
-5L), .internal.selfref = <pointer: 0x0000000004f81ef0>, class = c("data.table",
"data.frame"))
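A quick way to confirm what kind of object is involved is to inspect its class and, if needed, take a plain data.frame copy. The sketch below assumes the data.table package is installed and uses a made-up object `dt` in place of the real data.

```r
library(data.table)

# Hypothetical toy object standing in for the real data
dt <- as.data.table(data.frame(x = c("a", NA), y = 1:2))

print(class(dt))                   # class vector includes "data.table"
print(inherits(dt, "data.table"))  # TRUE for a data.table

# as.data.frame() returns a copy whose class is just "data.frame"
df <- as.data.frame(dt)
print(class(df))
```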
I am trying to build an example that imitates my data and will post it as soon as I succeed.
Upvotes: 1
Views: 2889
Reputation: 351
The below example runs fine without any problem:
library(tidyverse)
library(data.table)
Check <- data.frame(col1 = c(NA, letters[1:10]), col2 = c(NA, NA, 1:8, NA),
col3 = c(NA, letters[1:5], NA, NA, NA, NA, NA))
Test1 <- Check
dput(Test1)
Test2 <- as.data.table(Check) # Convert to data.table
dput(Test2)
Test1 <- droplevels(Test1)
str(Test1)
Test1 <- Test1 %>% dplyr::mutate_if(sapply(Test1, is.character), as.factor)
Test1 <- Test1 %>% dplyr::mutate_if(sapply(Test1, is.integer), as.numeric)
str(Test1)
Test1 <- droplevels(Test1)
Test2 <- droplevels(Test2)
str(Test2)
Test2 <- Test2 %>% dplyr::mutate_if(sapply(Test2, is.character), as.factor)
Test2 <- Test2 %>% dplyr::mutate_if(sapply(Test2, is.integer), as.numeric)
str(Test2)
Test2 <- droplevels(Test2)
Now consider this example:
library(tidyverse)
library(data.table)
Check1 <- data.frame(col1 = c(NA, letters[1:200]), col2 = c(NA, NA, 1:198, NA),
col3 = c(NA, letters[1:195], NA, NA, NA, NA, NA),
col4 = c(NA, NA, letters[1:199]), col5 = c(NA, letters[7:206]),
col6 = c(NA, NA, letters[1:198], NA),
col7 = c(NA, letters[1:197], NA, NA, NA),
col8 = c(NA, letters[4:203]),
col9 = c(NA, letters[6:205]),
col10 = c(letters[1:200], NA),
col11 = c(NA, NA, letters[1:197], NA, NA),
col12 = c(NA, letters[2:201]),
col13 = c(NA, NA, NA, NA, NA, letters[1:196]) )
Check2 <- data.frame(replicate(100, sample(0:1000, 201, replace = TRUE)))
Check <- cbind(Check1, Check2)
Test1 <- Check
dput(Test1)
# dput gives row.names = c(NA, -201L), class = "data.frame")
Test2 <- as.data.table(Check)
dput(Test2)
# dput gives row.names = c(NA, -201L), .internal.selfref = <pointer: 0x00000000052e1ef0>, class = c("data.table", "data.frame"))
# The below block runs without any problem since Test1 is of class data.frame
Test1 <- droplevels(Test1)
str(Test1)
Test1 <- Test1 %>% dplyr::mutate_if(sapply(Test1, is.character), as.factor)
Test1 <- droplevels(Test1)
Test1 <- Test1 %>% dplyr::mutate_if(sapply(Test1, is.integer), as.numeric)
str(Test1)
Test1 <- droplevels(Test1)
# The block below fails since Test2 is of class c("data.table", "data.frame")
Test2 <- droplevels(Test2)
str(Test2)
Test2 <- Test2 %>% dplyr::mutate_if(sapply(Test2, is.character), as.factor)
Test2 <- droplevels(Test2)
Test2 <- Test2 %>% dplyr::mutate_if(sapply(Test2, is.integer), as.numeric)
str(Test2)
Test2 <- droplevels(Test2)
I get the following error message when running the first 4 lines:
Error in .shallow(x, cols = cols, retain.key = TRUE) : can't set ALTREP truelength
If I try to open the Test2 data frame, I get the following message:
View(Test2)
Error in View : can't set ALTREP truelength
However, if I delete Test2 using
rm(Test2)
and run the following, I do not get any error:
Test2 <- as.data.frame(Check) # Test2 was removed above, so it is rebuilt from Check here (assumed intent)
str(Test2)
Test2 <- Test2 %>% dplyr::mutate_if(sapply(Test2, is.character), as.factor)
Test2 <- droplevels(Test2)
Test2 <- Test2 %>% dplyr::mutate_if(sapply(Test2, is.integer), as.numeric)
str(Test2)
Test2 <- droplevels(Test2)
Does a size restriction play a role in data.table? It seems to work for small data but fails for the example above.
Does this mean I must make sure every time that the data is stored as a data frame with class = "data.frame"?
When do such data frames get converted to data.table automatically? In my real data set the data is stored as a data.table even though the data.table library is not loaded.
Any explanation, please? I have R 4.0.2, wiped everything clean, and reinstalled all packages and dependencies fresh; the RStudio version is 1.3.959.
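One part of the last question can be checked directly. The class is stored with the object itself, so a data.table written to disk comes back as a data.table, and data.table::fread returns a data.table by default. Whether either of these is how the real data set got its class is an assumption; the sketch below uses made-up data and a temporary file, and assumes the data.table package is installed.

```r
library(data.table)

# Hypothetical toy data
dt <- data.table(id = 1:3, grp = c("a", "b", NA))

# The class attribute survives a save/load round trip
tmp <- tempfile(fileext = ".rds")
saveRDS(dt, tmp)
restored <- readRDS(tmp)
print(class(restored))

# fread returns a data.table unless data.table = FALSE is passed
from_fread <- fread(text = "id,grp\n1,a\n2,b")
plain <- fread(text = "id,grp\n1,a\n2,b", data.table = FALSE)
print(class(from_fread))
print(class(plain))

unlink(tmp)
```

If a plain data frame is wanted throughout, checking inherits(x, "data.table") right after loading and converting with as.data.frame(x) keeps the later steps on the data.frame code path.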
Upvotes: 4