Ray

Reputation: 351

Problems running code written for an older R version in a new R version

I am having trouble running commands in a new version of R (4.0.1; Platform: x86_64-w64-mingw32/x64 (64-bit)) and RStudio (Version 1.3.959) that worked well in the older version of R.

Say I have a table named Check with more than 10,000 rows and more than 100 variables (categorical and numeric).

If I call droplevels on it, I get the message below:

Check <- droplevels(Check)
Error in .shallow(x, cols = cols, retain.key = TRUE) : 
can't set ALTREP truelength

However, the following works:

Check <- rapply(Check, f = droplevels, classes = "factor", how = "replace")
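
As a small illustration of what that workaround does, the sketch below applies the same rapply() call to a toy data frame. The object `demo` and its columns are made up for illustration; they are not the real data.

```r
# Toy data frame (hypothetical): one factor column and one numeric column
demo <- data.frame(f = factor(c("a", "b", "c")), n = c(1, 2, 3))

# Subsetting keeps the now-unused level "c" on the factor
demo <- demo[demo$f != "c", ]
levels_before <- levels(demo$f)

# Same call as above: droplevels() is applied only to elements of class "factor"
demo <- rapply(demo, f = droplevels, classes = "factor", how = "replace")
levels_after <- levels(demo$f)
```

Comparing `levels_before` with `levels_after` shows whether the unused level was removed; the numeric column is left untouched because its class is not listed in `classes`.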

When I try to replace the NAs in a categorical variable by adding a new level and assigning it to the NA entries, I get the message below:

levels(Check$A) <- c(levels(Check$A), 'unknown.')
# Check$A <- factor(Check$A, levels=c(levels(Check$A), 'unknown.'))
Check$A[is.na(Check$A)] <- 'unknown.'
Error in setalloccol(newx) : can't set ALTREP truelength
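
For reference, this is how the same two steps behave on a small plain data.frame. It is a minimal sketch on a made-up object (`demo_na`, not the real data) that only shows the intended result; it does not reproduce the error.

```r
# Toy data frame (hypothetical) with a factor column that contains NAs
demo_na <- data.frame(A = factor(c("x", NA, "y", NA)))

# Step 1: append the new level to the existing ones
levels(demo_na$A) <- c(levels(demo_na$A), "unknown.")

# Step 2: assign the new level to every NA entry
demo_na$A[is.na(demo_na$A)] <- "unknown."

remaining_na <- sum(is.na(demo_na$A))
```

The order matters: assigning a value that is not yet a level of the factor would produce NA with a warning, which is why the level is added first.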

When I try to open the table, I get the message below:

View(Check)
Error in view: can't set ALTREP truelength

I do not understand what has gone wrong here. Any ideas, please?

I tried experimenting with the following:

library(tidyverse)
Check <- data.frame(col1 = c(NA, letters[1:10]), col2 = c(NA, NA, 1:8, NA), 
                 col3 = c(NA, letters[1:5], NA, NA, NA, NA, NA))
Test <- Check
Test <- droplevels(Test)
str(Test)
Test2 <- Test[6:11,]
Test2 <- Test2 %>% mutate_if(sapply(Test2, is.character), as.factor)
Test2 <- droplevels(Test2)

The above works fine, and dput(Test2) yields

structure(list(col1 = structure(c(NA, 1L, 2L, 3L, 4L, 5L, 6L, 
7L, 8L, 9L, 10L), .Label = c("a", "b", "c", "d", "e", "f", "g", 
"h", "i", "j"), class = "factor"), col2 = c(NA, NA, 1L, 2L, 3L, 
4L, 5L, 6L, 7L, 8L, NA), col3 = structure(c(6L, 1L, 2L, 3L, 4L, 
5L, 6L, 6L, 6L, 6L, 6L), .Label = c("a", "b", "c", "d", "e", 
"unknown."), class = "factor")), row.names = c(NA, -11L), class = "data.frame")

However, for my real data, the dput output ends with something like this, even though I am not using data.table:

row.names = c(NA, 
-5L), .internal.selfref = <pointer: 0x0000000004f81ef0>, class = c("data.table", 
"data.frame"))
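
The class vector in that dput output is what marks the object as a data.table. A minimal base R sketch of checking for that class and converting back to a plain data.frame is shown below. The object `demo_dt` is a stand-in whose class is set by hand to mimic the output above; it is an assumption for illustration, not a real data.table and not the real data.

```r
# Stand-in object (assumption): a data.frame whose class vector is set by hand
# to mimic the dput() output above; it is not a real data.table.
demo_dt <- data.frame(col1 = c("a", "b"), col2 = c(1, 2))
class(demo_dt) <- c("data.table", "data.frame")

# inherits() reports whether "data.table" appears in the class vector
is_dt_before <- inherits(demo_dt, "data.table")

# as.data.frame() returns an object whose class is just "data.frame"
demo_df <- as.data.frame(demo_dt)
class_after <- class(demo_df)
is_dt_after <- inherits(demo_df, "data.table")
```

Running `class(Check)` on the real object gives the same information as the tail of its dput output.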

I am trying to build an example that imitates my data and will post it as soon as I succeed.

Upvotes: 1

Views: 2889

Answers (1)

Ray

Reputation: 351

The example below runs without any problem:

library(tidyverse)
library(data.table)

Check <- data.frame(col1 = c(NA, letters[1:10]), col2 = c(NA, NA, 1:8, NA), 
                    col3 = c(NA, letters[1:5], NA, NA, NA, NA, NA))
Test1 <- Check
dput(Test1)

Test2 <- as.data.table(Check) # Convert to data.table
dput(Test2)


Test1 <- droplevels(Test1)
str(Test1)
Test1 <- Test1 %>% dplyr::mutate_if(sapply(Test1, is.character), as.factor)
Test1 <- Test1 %>% dplyr::mutate_if(sapply(Test1, is.integer), as.numeric)
str(Test1)
Test1 <- droplevels(Test1)

Test2 <- droplevels(Test2)
str(Test2)
Test2 <- Test2 %>% dplyr::mutate_if(sapply(Test2, is.character), as.factor)
Test2 <- Test2 %>% dplyr::mutate_if(sapply(Test2, is.integer), as.numeric)
str(Test2)
Test2 <- droplevels(Test2)

Now consider this example:

library(tidyverse)
library(data.table)

Check1 <- data.frame(col1 = c(NA, letters[1:200]), col2 = c(NA, NA, 1:198, NA), 
                 col3 = c(NA, letters[1:195], NA, NA, NA, NA, NA),
                 col4 = c(NA, NA, letters[1:199]), col5 = c(NA, letters[7:206]),
                 col6 = c(NA, NA, letters[1:198], NA),
                 col7 = c(NA, letters[1:197], NA, NA, NA),
                 col8 = c(NA, letters[4:203]),
                 col9 = c(NA, letters[6:205]),
                 col10 = c(letters[1:200], NA),
                 col11 = c(NA, NA, letters[1:197], NA, NA),
                 col12 = c(NA, letters[2:201]),
                 col13 = c(NA, NA, NA, NA, NA, letters[1:196]) )

Check2 <- data.frame(replicate(100,sample(0:1000,201,rep=TRUE)))

Check <- cbind(Check1, Check2)


Test1 <- Check
dput(Test1) 
# dput output ends with: row.names = c(NA, -201L), class = "data.frame")

Test2 <- as.data.table(Check)
dput(Test2)
# dput output ends with: row.names = c(NA, -201L), .internal.selfref = <pointer: 0x00000000052e1ef0>,
# class = c("data.table", "data.frame"))

# The block below runs without any problem, since Test1 has class "data.frame"
Test1 <- droplevels(Test1)
str(Test1)
Test1 <- Test1 %>% dplyr::mutate_if(sapply(Test1, is.character), as.factor)
Test1 <- droplevels(Test1)
Test1 <- Test1 %>% dplyr::mutate_if(sapply(Test1, is.integer), as.numeric)
str(Test1)
Test1 <- droplevels(Test1)

# The block below fails, since Test2 has class c("data.table", "data.frame")
Test2 <- droplevels(Test2)
str(Test2)
Test2 <- Test2 %>% dplyr::mutate_if(sapply(Test2, is.character), as.factor)
Test2 <- droplevels(Test2)
Test2 <- Test2 %>% dplyr::mutate_if(sapply(Test2, is.integer), as.numeric)
str(Test2)
Test2 <- droplevels(Test2)

Running the first four lines of the Test2 block gives the following error message:

Error in .shallow(x, cols = cols, retain.key = TRUE) : can't set ALTREP truelength

If I try to open the Test2 table, I get the following message:

View(Test2)
Error in View : can't set ALTREP truelength

However, if I delete Test2 using

rm(Test2)

and run the following, I do not get any error:

# Test2 no longer exists after rm(), so rebuild it from Check as a plain data.frame
Test2 <- as.data.frame(Check)
str(Test2)
Test2 <- Test2 %>% dplyr::mutate_if(sapply(Test2, is.character), as.factor)
Test2 <- droplevels(Test2)
Test2 <- Test2 %>% dplyr::mutate_if(sapply(Test2, is.integer), as.numeric)
str(Test2)
Test2 <- droplevels(Test2)

  1. Does a size restriction play a role in data.table? It seems to work for small data but fails for the example above.

  2. Does this mean I must make sure each time that the data is stored as a data frame with class = "data.frame"?

  3. When do such data frames get converted automatically to data.table? In my real data set, the data.table library is not loaded, yet the data is stored as a data.table.

Any explanation, please? I have R 4.0.2, wiped everything clean, and reinstalled all packages and dependencies fresh; the RStudio version is 1.3.959.

Upvotes: 4
