Reputation: 12683
I need to scale a data frame.
The process I need to follow is this:
Divide all elements in a row by the maximum value in that row, unless that row contains the number 1.
I use this approach:
post_df <- df  # copy of the original data frame
for (i in 1:nrow(df)) {
  if (!1 %in% df[i, ]) {
    post_df[i, ] <- df[i, ] / max(df[i, ])
  }
}
I was wondering if there is a faster approach that would save a few seconds, because I run this on a big data frame (86,000 rows × 500 columns).
E.g.
Row 1: Divide all elements by 0.7
Row 2: Divide all elements by 0.4
Row 3: Ignore
Row 4: Ignore
Row 5: Ignore
Upvotes: 5
Views: 1050
Reputation: 6222
Example data: Only the first two rows have 1's in them.
df <- iris[1:5, 1:4]
df[2,3] <- 1
df[1,1] <- 1
df
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1 1.0 3.5 1.4 0.2
# 2 4.9 3.0 1.0 0.2
# 3 4.7 3.2 1.3 0.2
# 4 4.6 3.1 1.5 0.2
# 5 5.0 3.6 1.4 0.2
Compute
res <- sapply(1:nrow(df), function(x) {
  if (any(df[x, ] == 1)) {
    df[x, ]
  } else {
    df[x, ] / max(df[x, ])
  }
})
t(res)
# Sepal.Length Sepal.Width Petal.Length Petal.Width
# 1 3.5 1.4 0.2
# 4.9 3 1 0.2
# 1 0.6808511 0.2765957 0.04255319
# 1 0.673913 0.326087 0.04347826
# 1 0.72 0.28 0.04
Except for the rows containing a 1, every row was divided by its own maximum.
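Note that sapply over data-frame rows returns a list-based matrix here. If a regular data frame is preferred, one possible variation is to collect the rows with lapply and bind them with rbind. This is only a sketch reusing the example data above; `rows` and `res_df` are names introduced for illustration.

```r
# Same example data as above
df <- iris[1:5, 1:4]
df[2, 3] <- 1
df[1, 1] <- 1

# Process one row at a time: keep rows that contain a 1,
# divide all other rows by their own maximum
rows <- lapply(seq_len(nrow(df)), function(i) {
  r <- df[i, ]
  if (any(r == 1)) r else r / max(r)
})

# Bind the one-row data frames back into a single data frame
res_df <- do.call(rbind, rows)
res_df
```

This is still a row-by-row approach, so it is unlikely to be much faster than the original loop on a large data frame.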
Upvotes: 1
Reputation: 887951
Based on the description, we only need to scale those rows that don't contain a 1. Create a logical index ('i1') based on rowSums, subset the dataset using 'i1', get the max of each row with pmax, divide the subset by it, and assign the result back to the subset.
# Example data
set.seed(24)
df <- as.data.frame(matrix(sample(1:8, 10*5, replace = TRUE), ncol=5))
# TRUE for rows that contain no 1
i1 <- !rowSums(df==1)>0
# Divide those rows by their row-wise maximum
df[i1,] <- df[i1,]/do.call(pmax, df[i1,])
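For a large, all-numeric data frame, the same idea can also be tried on a matrix, which may avoid some data-frame overhead (this is an assumption that would need to be timed on the real data). The sketch below uses a small hypothetical data frame; `m`, `keep`, `row_max` and `scaled` are illustrative names only.

```r
# Hypothetical example data: rows 1 and 3 contain no 1, row 2 does
df <- data.frame(a = c(0.2, 1, 4),
                 b = c(0.7, 3, 2),
                 c = c(0.5, 5, 8))

m <- as.matrix(df)              # numeric matrix with the same values
keep <- rowSums(m == 1) == 0    # TRUE for rows that contain no 1

if (any(keep)) {
  # Maximum of each row that has to be scaled
  row_max <- apply(m[keep, , drop = FALSE], 1, max)
  # A matrix divided by a vector recycles down the columns,
  # so row i of the subset is divided by row_max[i]
  m[keep, ] <- m[keep, , drop = FALSE] / row_max
}

scaled <- as.data.frame(m)
scaled
```

The `drop = FALSE` keeps the subset a matrix even when only one row qualifies.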
Upvotes: 3
Reputation: 50738
How about the following?
set.seed(2017)
# Sample data
mat <- matrix(sample(5*10), ncol = 5)
mat;
# [,1] [,2] [,3] [,4] [,5]
# [1,] 47 49 42 46 11
# [2,] 27 1 41 38 37
# [3,] 23 39 40 28 13
# [4,] 14 16 21 4 43
# [5,] 36 18 6 33 9
# [6,] 35 50 48 10 29
# [7,] 2 45 15 22 7
# [8,] 19 24 8 34 5
# [9,] 20 31 44 3 25
#[10,] 12 26 32 30 17
# Scale by row length if row does not contain 1
mat.scaled <- t(apply(mat, 1, function(x) if (1 %in% x) x else x / length(x)))
mat.scaled;
# [,1] [,2] [,3] [,4] [,5]
# [1,] 9.4 9.8 8.4 9.2 2.2
# [2,] 27.0 1.0 41.0 38.0 37.0
# [3,] 4.6 7.8 8.0 5.6 2.6
# [4,] 2.8 3.2 4.2 0.8 8.6
# [5,] 7.2 3.6 1.2 6.6 1.8
# [6,] 7.0 10.0 9.6 2.0 5.8
# [7,] 0.4 9.0 3.0 4.4 1.4
# [8,] 3.8 4.8 1.6 6.8 1.0
# [9,] 4.0 6.2 8.8 0.6 5.0
#[10,] 2.4 5.2 6.4 6.0 3.4
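The snippet above divides by length(x), i.e. the number of columns. If the goal is to divide by the row maximum, as the question describes, the same apply pattern can be adapted. A minimal sketch, using the first three rows of the matrix printed above and a helper `scale_row` that is a name introduced here for illustration:

```r
# First three rows of the sample matrix shown above
mat <- rbind(c(47, 49, 42, 46, 11),
             c(27,  1, 41, 38, 37),
             c(23, 39, 40, 28, 13))

# Leave a row untouched if it contains a 1, otherwise divide by its maximum
scale_row <- function(x) if (1 %in% x) x else x / max(x)

# apply() returns the results column-wise, hence the transpose
mat.scaled <- t(apply(mat, 1, scale_row))
mat.scaled
```

With this variant, every scaled row has a maximum of exactly 1, while rows that already contain a 1 are returned unchanged.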
Upvotes: 1