Reputation: 12683

Divide all elements in row with the max value in row - Faster approach

I need to scale a dataframe.
The process I need to follow is the following:

Divide all elements in a row by the max number in that row, unless that row contains the number 1.

I use this approach:

post_df <- df # original dataframe
for(i in 1:nrow(df)){
    if (! 1 %in% df[i,]) {
        post_df[i,] <- df[i,]/max(df[i,])
    }
}

I was wondering if there is a faster approach that would cut down some seconds, because I run this on a big dataframe (86000 rows x 500 cols).

E.g.

5 rows, 5 cols

Row 1: Divide all elements by 0.7
Row 2: Divide all elements by 0.4
Row 3: Ignore
Row 4: Ignore
Row 5: Ignore

Upvotes: 5

Answers (3)

kangaroo_cliff

Reputation: 6222

Example data: Only the first two rows have 1's in them.

df <- iris[1:5, 1:4]
df[2,3] <- 1
df[1,1] <- 1
df

# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1          1.0         3.5          1.4         0.2
# 2          4.9         3.0          1.0         0.2
# 3          4.7         3.2          1.3         0.2
# 4          4.6         3.1          1.5         0.2
# 5          5.0         3.6          1.4         0.2

Compute

res <- sapply(1:nrow(df), function(x) {
  if (any(df[x, ] == 1)) {
    df[x, ]
  } else {
    df[x, ] / max(df[x, ])
  }
})

t(res)


# Sepal.Length Sepal.Width Petal.Length Petal.Width
#  1            3.5         1.4          0.2
#  4.9          3           1            0.2
#  1            0.6808511   0.2765957    0.04255319
#  1            0.673913    0.326087     0.04347826
#  1            0.72        0.28         0.04

Except for the rows containing a 1, every row was divided by its own max.
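Note that sapply over one-row data frames returns a matrix of list elements rather than a numeric data frame. If a regular data frame is wanted, one possible variant (a minimal sketch, assuming all columns are numeric; scale_row and res_df are names made up for this illustration) is to collect the rows with lapply and bind them back together:

```r
# Example data from this answer: rows 1 and 2 contain a 1
df <- iris[1:5, 1:4]
df[2, 3] <- 1
df[1, 1] <- 1

# Hypothetical helper: return row x unchanged if it contains a 1,
# otherwise divide it by its own maximum
scale_row <- function(x) {
  r <- df[x, ]
  if (any(r == 1)) r else r / max(r)
}

# rbind the one-row data frames back into a single data frame
res_df <- do.call(rbind, lapply(seq_len(nrow(df)), scale_row))
res_df
```

This keeps the column names and types, but it is still a row-by-row approach, so it is unlikely to be much faster than the original loop on a large data frame.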

Upvotes: 1

akrun

Reputation: 887951

Based on the description, we only need to scale the rows that don't contain a 1. Create a logical index ('i1') with rowSums, subset the dataset with 'i1', get the max of each row with pmax, divide the subset by those row maxima, and assign the result back to the subset.

i1 <- rowSums(df == 1) == 0
df[i1, ] <- df[i1, ] / do.call(pmax, df[i1, ])

data

set.seed(24)
df <- as.data.frame(matrix(sample(1:8, 10*5, replace = TRUE), ncol=5))
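As a sanity check, the vectorised version can be compared against a row-by-row loop in the spirit of the one in the question. This is only a sketch on the sample data above; loop_df and vec_df are names introduced here for illustration, and the loop uses any(... == 1) instead of %in%.

```r
set.seed(24)
df <- as.data.frame(matrix(sample(1:8, 10 * 5, replace = TRUE), ncol = 5))

# Row-by-row reference result
loop_df <- df
for (i in seq_len(nrow(df))) {
  if (!any(df[i, ] == 1)) {
    loop_df[i, ] <- df[i, ] / max(df[i, ])
  }
}

# Vectorised result: TRUE for rows without a 1, then divide by the row maxima
vec_df <- df
i1 <- rowSums(df == 1) == 0
vec_df[i1, ] <- df[i1, ] / do.call(pmax, df[i1, ])

# Both approaches should agree
all.equal(loop_df, vec_df)
```

do.call(pmax, ...) passes each column as a separate argument to pmax, so the maximum is taken across columns for every row at once, which avoids the per-row indexing that makes the loop slow.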

Upvotes: 3

Maurits Evers

Reputation: 50738

How about the following

set.seed(2017)
# Sample data
mat <- matrix(sample(5*10), ncol = 5)
mat
#      [,1] [,2] [,3] [,4] [,5]
# [1,]   47   49   42   46   11
# [2,]   27    1   41   38   37
# [3,]   23   39   40   28   13
# [4,]   14   16   21    4   43
# [5,]   36   18    6   33    9
# [6,]   35   50   48   10   29
# [7,]    2   45   15   22    7
# [8,]   19   24    8   34    5
# [9,]   20   31   44    3   25
#[10,]   12   26   32   30   17


# Scale by row length if row does not contain 1
mat.scaled <- t(apply(mat, 1, function(x) if (1 %in% x) x else x / length(x)))
mat.scaled
#     [,1] [,2] [,3] [,4] [,5]
# [1,]  9.4  9.8  8.4  9.2  2.2
# [2,] 27.0  1.0 41.0 38.0 37.0
# [3,]  4.6  7.8  8.0  5.6  2.6
# [4,]  2.8  3.2  4.2  0.8  8.6
# [5,]  7.2  3.6  1.2  6.6  1.8
# [6,]  7.0 10.0  9.6  2.0  5.8
# [7,]  0.4  9.0  3.0  4.4  1.4
# [8,]  3.8  4.8  1.6  6.8  1.0
# [9,]  4.0  6.2  8.8  0.6  5.0
#[10,]  2.4  5.2  6.4  6.0  3.4
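Note that the snippet above divides by length(x), i.e. the number of columns, whereas the question asks for division by the row maximum. A minimal sketch of the max-based variant, using only the first three rows and columns of the sample matrix (scale_row and mat.max are names introduced here for illustration):

```r
# Hypothetical helper: divide a row by its maximum unless it contains a 1
scale_row <- function(x) if (1 %in% x) x else x / max(x)

# First three rows and columns of the sample matrix (filled column by column)
mat <- matrix(c(47, 27, 23,
                49,  1, 39,
                42, 41, 40), ncol = 3)

# apply() returns the scaled rows as columns, so transpose back
mat.max <- t(apply(mat, 1, scale_row))
mat.max
```

Row 2 contains a 1 and is left unchanged, while rows 1 and 3 are divided by 49 and 40 respectively, so their largest entry becomes 1.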

Upvotes: 1
