organize numbers in columns based on whether they are the smallest or largest in their row

Question

I apologize for the confusing question title, but I am having difficulty explaining concisely what I am trying to do. I am creating a matrix (30 rows, 4 columns) that is populated with random numbers. The sum of each row should be less than or equal to 100.

The 4 columns should be such that column #1 contains the "very small" values of each row, column #2 contains the "small" values of each row, column #3 contains the "medium" values of each row, and finally column #4 contains the "large" values of each row.

With some help from fellow Stack Overflow contributors, I have been able to mostly create this matrix, except that the values within each row are not ordered from smallest to largest across the columns.

y <- t(replicate(30, diff(c(0, sort(runif(3)), 100))))
colnames(y) <- c("very small", "small", "medium", "large")

The above code has given me the following:

            very small       small     medium    large
2019/05/ 1  0.205097109 0.070672238 0.01700667 99.70722
2019/05/ 2  0.243479179 0.040387075 0.30706882 99.40906
2019/05/ 3  0.281493845 0.307205145 0.27184281 99.13946
2019/05/ 4  0.094201030 0.500598101 0.18033218 99.22487
2019/05/ 5  0.571545601 0.137103309 0.05957493 99.23178
2019/05/ 6  0.352396976 0.025242982 0.09915150 99.52321
2019/05/ 7  0.326936497 0.311130309 0.17842621 99.18351
2019/05/ 8  0.031320460 0.246074992 0.51765481 99.20495
2019/05/ 9  0.384158575 0.005363531 0.36312765 99.24735
2019/05/ 10 0.117185787 0.157038320 0.38193820 99.34384
2019/05/ 11 0.302821585 0.013465186 0.57304693 99.11067
2019/05/ 12 0.270103380 0.027066988 0.62618468 99.07664
2019/05/ 13 0.768581860 0.059368411 0.11710833 99.05494
2019/05/ 14 0.365422636 0.429894385 0.05852176 99.14616
2019/05/ 15 0.293412367 0.212210099 0.19328758 99.30109
2019/05/ 16 0.175190870 0.404489142 0.21989102 99.20043
2019/05/ 17 0.199347789 0.308199585 0.39642498 99.09603
2019/05/ 18 0.302785775 0.027558287 0.41947684 99.25018
2019/05/ 19 0.133129124 0.186412389 0.25709532 99.42336
2019/05/ 20 0.063202850 0.010502272 0.44173144 99.48456
2019/05/ 21 0.181040903 0.539530740 0.11489523 99.16453
2019/05/ 22 0.067097938 0.375550539 0.18543176 99.37192
2019/05/ 23 0.068234308 0.431963232 0.05830656 99.44150
2019/05/ 24 0.652605844 0.210821809 0.05857287 99.07800
2019/05/ 25 0.121836021 0.278319767 0.27807419 99.32177
2019/05/ 26 0.198721305 0.471083412 0.04579872 99.28440
2019/05/ 27 0.009804903 0.625606508 0.33387186 99.03072
2019/05/ 28 0.259757704 0.066581255 0.46966620 99.20399
2019/05/ 29 0.307779151 0.289004824 0.28417942 99.11904
2019/05/ 30 0.265451489 0.363131717 0.13731079 99.23411

But as you can see, some of the values in the "small" column are actually smaller than the values in the "very small" column. Similarly, some of the values in the "medium" column are smaller than the values in the "small" column. I want the "very small" column to contain the smallest value in each row, the "small" column the second-smallest, the "medium" column the third-smallest, and the "large" column the largest.

Apologies for the repetition and verbosity. I am not sure if I explained what I wanted to do well. Please request clarification if necessary. Any help would be appreciated. Thanks.

Ronak Shah · Accepted Answer

Here is another way of achieving what you want, shown for 10 rows and 4 columns. Each row is normalized so that it sums to 1 and is then sorted in increasing order.

y <- t(replicate(10, {x <- runif(4); sort(x/sum(x))}))
colnames(y) <- c("very small", "small", "medium", "large")


y
#       very small      small    medium     large
# [1,] 0.097554134 0.22426282 0.3143340 0.3638490
# [2,] 0.120976369 0.20866048 0.2819127 0.3884505
# [3,] 0.043034420 0.18923072 0.3792956 0.3884393
# [4,] 0.201780765 0.22601204 0.2784359 0.2937713
# [5,] 0.219092518 0.24297233 0.2633450 0.2745902
# [6,] 0.065203979 0.23889589 0.2780956 0.4178045
# [7,] 0.098080232 0.19301764 0.2444150 0.4644872
# [8,] 0.005077277 0.01149552 0.1044939 0.8789333
# [9,] 0.185194316 0.21291986 0.2607769 0.3411089
#[10,] 0.059954939 0.08914909 0.3992849 0.4516111
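
Because each row above is divided by its own sum, the rows add up to 1 rather than 100. A minimal sketch of how one might check that and rescale to the 0-100 range from the question is below; the seed and the name `y100` are only illustrative assumptions, not part of the original answer.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

# Same construction as above: normalize each row to sum to 1, then sort it
y <- t(replicate(10, {x <- runif(4); sort(x / sum(x))}))

# Multiplying by a positive constant keeps the within-row order
# and makes every row sum to 100 (up to floating-point error)
y100 <- y * 100
colnames(y100) <- c("very small", "small", "medium", "large")

rowSums(y100)
```

Note that this gives rows summing to exactly 100, which satisfies "less than or equal to 100" only at the boundary; scaling by a smaller constant would work the same way.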

If you are OK with negative numbers, you can achieve the same with rnorm as well:

t(replicate(10, {x <- rnorm(4); sort(x/sum(x))}))

If you want to keep the numbers in the same range as shown in your example, you can do:

y <- t(replicate(10,{x <- runif(3); y <- c(x, 100 - sum(x)); sort(y/sum(y) * 100)}))
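
If you would rather keep the matrix exactly as generated in the question and only reorder it, one possible sketch is to sort each row with `apply()`. The seed and the name `y_sorted` are illustrative assumptions; the generating line is copied from the question.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# Matrix from the question: 30 rows, 4 columns, each row sums to 100
y <- t(replicate(30, diff(c(0, sort(runif(3)), 100))))

# Sort each row in increasing order. apply() over rows returns the
# sorted rows as columns, so transpose to get back to 30 x 4.
y_sorted <- t(apply(y, 1, sort))
colnames(y_sorted) <- c("very small", "small", "medium", "large")

head(y_sorted)
```

Sorting only rearranges values within a row, so the row sums are unchanged.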
