Xizam

Reputation: 718

Why won't my column name change work in R?

This is part of a script I'm writing to merge the columns more fully after using merge(). If both datasets have a column with the same name, merge() gives you the columns column.x and column.y.
I have written a script to combine this data and to drop the unneeded columns (column.y and column.x_error, a column I've added to flag cases where dat$column.x != dat$column.y).
I also want to rename column.x to column, to reduce manual work on my dataset. I have not managed to rename column.x to column; see the code for more info.

dat is obtained from dat = merge(data1, data2, by = "ID", all.x = TRUE)

#obtain a list of double columns
dubbelkol = cbind()
sorted = sort(names(dat))
for(i in as.numeric(1:length(names(dat)))) {
  if(grepl(".x",sorted[i])){
    if (grepl(".y", sorted[i+1]) && (sub(".x","",sorted[i])==sub(".y","",sorted[i+1]))){
      dubbelkol = cbind(dubbelkol,sorted[i],sorted[i+1])
    } 
  }  
}

#Check data, fill in NA in column.x from column.y if poss
temp = cbind()
for (p in as.numeric(1:(length(dubbelkol)-1))){
  if(grepl(".x",dubbelkol[p])){
    dat[dubbelkol[p]][is.na(dat[dubbelkol[p]])] = dat[dubbelkol[p+1]][is.na(dat[dubbelkol[p]])]
    temp = (dat[dubbelkol[p]] != dat[dubbelkol[p+1]])
    colnames(temp) = (paste(dubbelkol[p],"_error", sep=""))
    dat[colnames(temp)] = temp
  }
}
#If every value in "column.x_error" is FALSE or NA, delete "column.y" and "column.x_error"
#Rename "column.x" to "column"
#from here until next comment everything works
droplist= c()
for (k in as.numeric(1:length(names(dat)))) {
  if (grepl(".x_error",colnames(dat[k]))) {
    if (all(dat[k]==FALSE, na.rm = TRUE)) {
      droplist = c(droplist,colnames(dat[k]), sub(".x_error",".y",colnames(dat[k])))
#the next line doesn't work; it's supposed to rename the .x column to the plain name before the .y and .x_error columns are dropped.
      colnames(dat[sub(".x_error",".x",colnames(dat[k]))])= paste(sub(".x_error","",colnames(dat[k])))
    }
  }
}
dat = dat[,!names(dat) %in% droplist]

paste(sub(".x_error","",colnames(dat[k]))) will give me "BNR" just fine, but the colnames(...) = ... won't change the column name in dat.

Any idea what's going wrong?

data1
+----+-------+
| ID | BNR   | 
+----+-------+
|  1 | 123   | 
|  2 | 234   |
|  3 | NA    | 
|  4 | 456   | 
|  5 | 677   |
|  6 | NA    | 
+----+-------+

data2
+----+-------+
| ID | BNR   | 
+----+-------+
|  1 | 123   | 
|  2 | 234   |
|  3 | 345   | 
|  4 | 456   | 
|  5 | 677   |
|  6 | NA    | 
+----+-------+
dat
+----+-------+-------+-----------+
| ID | BNR.x | BNR.y |BNR.x_error|
+----+-------+-------+-----------+
|  1 | 123   | NA    |FALSE      |
|  2 | 234   | 234   |FALSE      |
|  3 | NA    | 345   |FALSE      |
|  4 | 456   | 456   |FALSE      |
|  5 | 677   | 677   |FALSE      |
|  6 | NA    | NA    |NA         |
+----+-------+-------+-----------+

desired output
+----+-------+
| ID | BNR   | 
+----+-------+
|  1 | 123   |
|  2 | 234   | 
|  3 | 345   | 
|  4 | 456   | 
|  5 | 677   | 
|  6 | NA    | 
+----+-------+
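
For anyone who wants to reproduce this, the two input tables can be rebuilt as data frames roughly like this. It is only a minimal sketch: the numeric column types are an assumption, since only the printed tables are given.

```r
# Rebuild the two example tables; numeric BNR values are assumed
data1 <- data.frame(ID = 1:6, BNR = c(123, 234, NA, 456, 677, NA))
data2 <- data.frame(ID = 1:6, BNR = c(123, 234, 345, 456, 677, NA))

# Same call as in the question
dat <- merge(data1, data2, by = "ID", all.x = TRUE)

# Both inputs have a BNR column, so merge() appends its default suffixes .x and .y
names(dat)
```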

Upvotes: 2

Views: 4990

Answers (1)

Arun

Reputation: 118789

I suggest replacing:

sub(".x_error",".x",colnames(dat[k]))

with:

sub("\\.x_error", ".x", colnames(dat[k]))

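if you want to match a literal "." character. In a regex, an unescaped . matches any single character, so you have to escape it as \\. (alternatively, pass fixed = TRUE to sub()). Only the pattern needs escaping; the replacement string is taken literally.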
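
To see the difference, here is a small sketch. The string "tax_rate" is a made-up name for illustration and does not come from the question.

```r
# Unescaped: "." matches any character, so ".x" also matches "ax" in "tax_rate"
unescaped <- grepl(".x", c("BNR.x", "tax_rate"))

# Escaped: "\\.x" only matches a literal dot followed by x
escaped <- grepl("\\.x", c("BNR.x", "tax_rate"))

# fixed = TRUE treats the whole pattern as a plain string, so no escaping is needed
literal <- grepl(".x", c("BNR.x", "tax_rate"), fixed = TRUE)

print(unescaped)  # TRUE TRUE
print(escaped)    # TRUE FALSE
print(literal)    # TRUE FALSE
```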

Even better, since the dot is replaced by a dot anyway, why not just leave it out of the pattern:

sub("x_error", "x", colnames(dat[k]))

Or, if no column name other than the .x_error ones contains _error, simply:

sub("_error", "", colnames(dat[k]))

Edit: The problem seems to be that your data format loads additional columns on the left and the right (the leading and trailing | on each row are read as separators). You can select the columns you want first and then merge.

d1 <- read.table(textConnection("| ID | BNR   | 
|  1 | 123   | 
|  2 | 234   |
|  3 | NA    | 
|  4 | 456   | 
|  5 | 677   |
|  6 | NA    |"), sep = "|", header = TRUE, stringsAsFactors = FALSE)[,2:3]

d1$BNR <- as.numeric(d1$BNR)

d2 <- read.table(textConnection("|  1 | 123   | 
|  2 | 234   |
|  3 | 345   | 
|  4 | 456   | 
|  5 | 677   |
|  6 | NA    |"), header = FALSE, sep = "|", stringsAsFactors = FALSE)[,2:3]

names(d2) <- c("ID", "BNR")
d2$BNR <- as.numeric(d2$BNR)

# > d1
#   ID BNR
# 1  1 123
# 2  2 234
# 3  3  NA
# 4  4 456
# 5  5 677
# 6  6  NA

# > d2
#   ID BNR
# 1  1 123
# 2  2 234
# 3  3 345
# 4  4 456
# 5  5 677
# 6  6  NA

dat <- merge(d1, d2, by = "ID", all = TRUE)
# > dat

#   ID BNR.x BNR.y
# 1  1   123   123
# 2  2   234   234
# 3  3    NA   345
# 4  4   456   456
# 5  5   677   677
# 6  6    NA    NA

# replace all NA values in x from y
dat$BNR.x <- ifelse(is.na(dat$BNR.x), dat$BNR.y, dat$BNR.x)

# now remove y
dat$BNR.y <- NULL
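
To get from here to the desired output, the remaining step is the rename. As far as I can tell, colnames(dat[...]) = ... in the question renames a temporary subset and then writes it back into the existing column of dat, and assigning into an existing data frame column keeps that column's current name, so dat never sees the new one. Assigning into names(dat) directly avoids that. A minimal sketch, using a small stand-in data frame rather than the real data:

```r
# Stand-in for the merged data after BNR.x has been filled from BNR.y
dat <- data.frame(ID = 1:3, BNR.x = c(123, 234, 345), BNR.y = c(123, 234, 345))

# No visible effect: the renamed copy is written back into the existing
# column, and the existing column name is kept
colnames(dat["BNR.x"]) <- "BNR"
names_after_subset_rename <- names(dat)

# Drop the duplicate column, then rename by assigning into names(dat) itself
dat$BNR.y <- NULL
names(dat)[names(dat) == "BNR.x"] <- "BNR"
names(dat)
```

The same names(dat)[...] assignment should also work inside the loop from the question, with the old and new names computed by sub().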

Upvotes: 2
