Reputation: 33

Replace value per row with value in first column

My question is very simple. I have a data frame with more than 100 columns of numbers. The first column is always a nonzero number. I want to replace each nonzero number in a row (excluding the first column) with the value in the first column of that row.

I was thinking along the lines of ifelse and a for loop that iterates over the rows, but there must be a simpler vectorised way to do it.

Upvotes: 0

Answers (3)

aichao

Reputation: 7455

Another approach is to use sapply, which avoids an explicit loop over the rows. Assuming your data is in a data frame df:

df[,-1] <- sapply(df[,-1], function(x) {ind <- which(x!=0); x[ind] = df[ind,1]; return(x)})

Here, we are applying the function over each and all columns of df except for the first column. In the function, x is each of these columns in turn:

First, find the row indices of the nonzero entries in the column using which.
Set these rows in x to the corresponding values from the first column of df.
Return the column.

Note that the operations in the function are all "vectorized" over the column. That is, no looping over the rows of the column. The result from sapply is a matrix of the processed columns, which replaces all columns of df that are not the first column.
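As a minimal sketch of the steps above, here is the same sapply call run on a small toy data frame. The values and the column names val, a and b are made up for illustration and are not from the question.

```r
# Toy data frame (hypothetical values): the first column is nonzero,
# the remaining columns hold a mix of zeros and nonzeros.
df <- data.frame(val = c(10, 20, 30),
                 a   = c(1, 0, 5),
                 b   = c(0, 2, 0))

# Process every column except the first; x is one column at a time.
df[, -1] <- sapply(df[, -1], function(x) {
  ind <- which(x != 0)   # row indices of the nonzero entries
  x[ind] <- df[ind, 1]   # overwrite them with the first-column values
  x                      # return the processed column
})

print(df)
```

After this runs, column a should hold 10, 0, 30 and column b should hold 0, 20, 0, while val is unchanged.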

See this for an excellent review of the *apply family of functions.

Hope this helps.

Upvotes: 1

Zheyuan Li

Reputation: 73405

Suppose your data frame is dat. Here is a fully vectorized solution:

mat <- as.matrix(dat[, -1])
pos <- which(mat != 0)
mat[pos] <- rep(dat[[1]], times = ncol(mat))[pos]
new_dat <- "colnames<-"(cbind.data.frame(dat[1], mat), colnames(dat))

Example

set.seed(0)
dat <- "colnames<-"(cbind.data.frame(1:5, matrix(sample(0:1, 25, TRUE), 5)),
                    c("val", letters[1:5]))
#  val a b c d e
#1   1 1 0 0 1 1
#2   2 0 1 0 0 1
#3   3 0 1 0 1 0
#4   4 1 1 1 1 1
#5   5 1 1 0 0 0

My code above gives:

#  val a b c d e
#1   1 1 0 0 1 1
#2   2 0 2 0 0 2
#3   3 0 3 0 3 0
#4   4 4 4 4 4 4
#5   5 5 5 0 0 0
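The rep() step works because R stores matrices column by column, so repeating the first column once per matrix column yields a vector whose k-th element lies in the same row as the k-th matrix element. A small sketch of that alignment follows; the names first and filler and all values are hypothetical.

```r
# A 3 x 2 toy matrix (hypothetical values) and a first column to copy from.
first <- c(10, 20, 30)
mat <- matrix(c(1, 0, 5,
                0, 2, 0), nrow = 3)        # filled column by column

pos <- which(mat != 0)                     # linear (column-major) indices of nonzeros
filler <- rep(first, times = ncol(mat))    # one copy of `first` per matrix column

# filler[k] belongs to the same row as mat[k], so indexing both by pos lines up.
mat[pos] <- filler[pos]
print(mat)
```

Here pos is 1, 3, 5, and the matrix ends up with 10, 0, 30 in its first column and 0, 20, 0 in its second.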

You want a benchmark?

set.seed(0)
n <- 2000  ## use a 2000 * 2000 matrix
dat <- "colnames<-"(cbind.data.frame(1:n, matrix(sample(0:1, n * n, TRUE), n)),
                    c("val", paste0("x",1:n)))

## have to test my solution first, as aichao's solution overwrites `dat`

## my solution
system.time({mat <- as.matrix(dat[, -1])
            pos <- which(mat != 0)
            mat[pos] <- rep(dat[[1]], times = ncol(mat))[pos]
            "colnames<-"(cbind.data.frame(dat[1], mat), colnames(dat))})
#   user  system elapsed 
#  0.352   0.056   0.410 

## solution by aichao
system.time(dat[,-1] <- sapply(dat[,-1], function(x) {ind <- which(x!=0); x[ind] = dat[ind,1]; x}))
#   user  system elapsed 
#  7.804   0.108   7.919

In this run, my solution is about 19 times faster (0.410 s versus 7.919 s elapsed)!

Upvotes: 1

MFR

Reputation: 2077

Since your data is not that big, I suggest you use a simple loop:

for (i in 1:nrow(mydata))
{
 for (j in 2:ncol(mydata))
  {
    # keep zeros; replace any nonzero entry with the row's first-column value
    mydata[i,j] <- ifelse(mydata[i,j]==0, 0, mydata[i,1])
  }
 }
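For a quick check of the double-loop idea, here is a self-contained sketch that wraps it in a function and runs it on toy data. The function name replace_nonzero_loop and the sample values are assumptions for illustration, not part of the answer. It uses seq_len so the loops do nothing on an empty or single-column data frame.

```r
# Hypothetical helper wrapping the double loop; the name is not from the answer.
replace_nonzero_loop <- function(mydata) {
  for (i in seq_len(nrow(mydata))) {
    # columns 2..ncol; this is empty when there is only one column
    for (j in seq_len(ncol(mydata))[-1]) {
      # zeros stay as they are; nonzeros take the row's first-column value
      if (mydata[i, j] != 0) mydata[i, j] <- mydata[i, 1]
    }
  }
  mydata
}

# Toy data (hypothetical values).
mydata <- data.frame(val = c(10, 20, 30), a = c(1, 0, 5), b = c(0, 2, 0))
result <- replace_nonzero_loop(mydata)
print(result)
```

Because the function returns a modified copy, the original mydata is left untouched, which makes it easy to compare the result against the other answers.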

Upvotes: 1
