David

Reputation: 427

Replace rows by index

In the following example:

library(data.table)
df1 <- data.table("1A"=c(0,0,0,0),"1B"=c(4:3),"2A"=c(0,0,0,0), "2B"=c(4:3))
df2 <- data.table("1A"=c(0,0),"1B"=c(1:2),"2A"=c(0,0), "2B"=c(1:2))

df1
#    1A 1B 2A 2B
# 1:  0  4  0  4
# 2:  0  3  0  3
# 3:  0  4  0  4
# 4:  0  3  0  3

df2
#    1A 1B 2A 2B
# 1:  0  1  0  1
# 2:  0  2  0  2

indx = c(1,3)
indx
# [1] 1 3

df1[indx,] <- df2
df1
#    1A 1B 2A 2B
# 1:  0  1  0  1
# 2:  0  3  0  3
# 3:  0  2  0  2
# 4:  0  3  0  3

This successfully replaces rows 1 and 3 of df1 with the rows of df2. When I repeat the same operation on my real data, I get this error:

Can't assign to the same column twice in the same query (duplicates detected).

in this expression:

Z4[positionpdis,] <- ZpdisRow2

The objects have the following attributes:

is.data.table(ZpdisRow2)
# [1] TRUE
is.data.table(Z4)
# [1] TRUE
dim(Z4)
# [1] 7968 7968
dim(Z4[positionpdis,])
# [1]   48 7968
dim(ZpdisRow2)
# [1]   48 7968
str(positionpdis)
# int [1:48] 91 257 423 589 755 921 1087 1253 1419 1585 ...
length(unique(positionpdis))
# [1] 48

What can be the source of the error?

Upvotes: 4

Views: 1137

Answers (1)

akrun

Reputation: 887301

My guess is that some column names are duplicated in the original dataset. For example, if we rename the 3rd column of df1 so that it has the same name as the 1st, the assignment fails with the same error.

colnames(df1)[3] <- '1A'
df1[indx,] <- df2

Error in `[<-.data.table`(`*tmp*`, indx, , value = list(`1A` = c(0, 0), : Can't assign to the same column twice in the same query (duplicates detected).
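Before renaming anything, it may help to confirm that duplicated names are really the cause. The following is a minimal sketch on a toy table (the names dt, first_dup and dup_names are made up for illustration); on the real data the same two checks would be run on names(Z4).

```r
library(data.table)

# Toy table; the third column is renamed so that "1A" appears twice,
# mirroring the colnames(df1)[3] <- '1A' step above.
dt <- data.table("1A" = c(0, 0), "1B" = 1:2, "2A" = c(0, 0))
colnames(dt)[3] <- "1A"

# anyDuplicated() returns 0 if all names are unique, otherwise the
# position of the first repeated name.
first_dup <- anyDuplicated(names(dt))

# The repeated names themselves.
dup_names <- unique(names(dt)[duplicated(names(dt))])

print(first_dup)
print(dup_names)
```

If first_dup is 0 on the real table, duplicated names are not the problem and the cause lies elsewhere.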

We can make the column names unique with make.unique, which is convenient in cases like this because we do not have to inspect every column name for duplicates.

colnames(df1) <- make.unique(colnames(df1))
df1[indx,] <- df2
df1
#    1A 1B 1A.1 2B
# 1:  0  1    0  1
# 2:  0  3    0  3
# 3:  0  2    0  2
# 4:  0  3    0  3

Another option, which should work even with duplicate column names, is set. It is very efficient because it avoids the overhead of [.data.table. Here we loop over the column positions (seq_along(df1)) and, for each column j, overwrite the rows given by i in 'df1' with the corresponding column of 'df2'. Because j is a position rather than a name, repeated names do not matter.

for (j in seq_along(df1)) {
  set(df1, i = as.integer(indx), j = j, value = df2[[j]])
}
df1
#   1A 1B 1A 2B
#1:  0  1  0  1
#2:  0  3  0  3
#3:  0  2  0  2
#4:  0  3  0  3
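The loop can also be wrapped in a small helper that checks the dimensions first. This is only a sketch: replace_rows is a hypothetical name, not a data.table function, and it assumes the replacement has one row per index and the same number of columns, in the same order, as the target.

```r
library(data.table)

# Hypothetical helper: overwrite rows `rows` of `target` with `replacement`,
# matching columns by position so duplicated names are irrelevant.
replace_rows <- function(target, rows, replacement) {
  rows <- as.integer(rows)
  stopifnot(
    ncol(target) == ncol(replacement),
    length(rows) == nrow(replacement)
  )
  for (j in seq_along(target)) {
    # set() modifies `target` by reference; no copy of the table is made.
    set(target, i = rows, j = j, value = replacement[[j]])
  }
  invisible(target)
}

# Toy data with a deliberately duplicated column name.
dt1 <- data.table("1A" = c(0, 0, 0, 0), "1B" = c(4L, 3L, 4L, 3L), "2A" = c(0, 0, 0, 0))
colnames(dt1)[3] <- "1A"
dt2 <- data.table("1A" = c(0, 0), "1B" = 1:2, "2A" = c(0, 0))

replace_rows(dt1, c(1, 3), dt2)
print(dt1)
```

With the objects from the question, the call would be replace_rows(Z4, positionpdis, ZpdisRow2).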

Upvotes: 6
