R: I have a matrix populated with random numbers, but the last column contains all of the largest values

Question

I have a matrix with 30 rows and 4 columns, populated with random numbers between 0 and 100 in each row.

The last (fourth) column contains all of the largest numbers in the 0 to 100 range (e.g. 99, 98). I would like the numbers to be truly spread out in the matrix, that is, some of the largest numbers should be in the first, second, or third column, not just in the fourth.

After getting some help from fellow Stack Exchange contributors, this is the code I have been working with:

y <- t(replicate(30,{x <- runif(3); y <- c(x, 100 - sum(x)); sort(y/sum(y) * 100)}))

This gives me the following matrix:

          [,1]      [,2]      [,3]     [,4]
 [1,] 0.106034080 0.8035997 0.9161168 98.17425
 [2,] 0.013372771 0.4416418 0.5053132 99.03967
 [3,] 0.477705091 0.7248394 0.7254909 98.07196
 [4,] 0.080764396 0.3859276 0.3968179 99.13649
 [5,] 0.699455094 0.7579722 0.7875055 97.75507
 [6,] 0.372264415 0.3772728 0.6589921 98.59147
 [7,] 0.114169686 0.2029357 0.4566677 99.22623
 [8,] 0.478831699 0.7960539 0.9740272 97.75109
 [9,] 0.032629269 0.2068012 0.6521174 99.10845
[10,] 0.349337576 0.3886265 0.9551120 98.30692
[11,] 0.437791360 0.4712769 0.5487138 98.54222
[12,] 0.137317329 0.6110274 0.9769346 98.27472
[13,] 0.314229467 0.4457654 0.8379763 98.40203
[14,] 0.038342032 0.3210214 0.4811107 99.15953
[15,] 0.063934371 0.5945270 0.7108552 98.63068
[16,] 0.176338974 0.3944489 0.6910675 98.73814
[17,] 0.174552999 0.6164661 0.8115936 98.39739
[18,] 0.358101562 0.6727262 0.8580072 98.11117
[19,] 0.097051604 0.1425019 0.8777943 98.88265
[20,] 0.287638218 0.5643593 0.7917288 98.35627
[21,] 0.060074348 0.1993140 0.8502848 98.89033
[22,] 0.136792340 0.1833376 0.2811030 99.39877
[23,] 0.274043764 0.5373264 0.7441432 98.44449
[24,] 0.040409521 0.5162444 0.5339423 98.90940
[25,] 0.481989448 0.5592237 0.9305969 98.02819
[26,] 0.008124932 0.6241172 0.8474349 98.52032
[27,] 0.134700798 0.3691278 0.4193065 99.07686
[28,] 0.228018480 0.5125451 0.9561445 98.30329
[29,] 0.155826657 0.6548732 0.7902131 98.39909
[30,] 0.080001684 0.4109359 0.7645531 98.74451

Is there a way to make the numbers in this matrix truly dispersed, so that the largest values are not all confined to the fourth column?
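For context, the pattern follows directly from the generator: runif(3) returns values in [0, 1), so the fourth entry, 100 - sum(x), is always above 97, and sort() then places it last. Dividing by sum(y) and multiplying by 100 changes nothing, because the four entries already sum to 100. The minimal sketch below reproduces this; the seed is arbitrary and the inner variable is renamed to v only to avoid reusing the name y.

```r
# Rebuild the question's generator with a fixed seed (seed value is arbitrary)
set.seed(1)
y <- t(replicate(30, {
  x <- runif(3)             # three draws, each in [0, 1)
  v <- c(x, 100 - sum(x))   # so the fourth entry is in (97, 100]
  sort(v / sum(v) * 100)    # sum(v) is already 100; sort() puts the max last
}))

# Every row's largest entry sits in column 4 ...
all(max.col(y) == 4)
# ... column 4 never drops to 97, while columns 1-3 stay below 1
range(y[, 4])
max(y[, 1:3])
```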

Shree · Accepted Answer

From your earlier questions on this site, I believe you want the row sums to be 100. If that is the case, you can shuffle the values within each row:

library(magrittr)  # provides the %>% pipe
apply(y, 1, sample) %>% t()

             [,1]       [,2]       [,3]        [,4]
 [1,] 98.45703803  0.5044549  0.7077342  0.33077286
 [2,]  0.43464717  0.8476126  0.7323841 97.98535613
 [3,]  0.49888670 98.6968386  0.7889767  0.01529801
 [4,]  0.18572028 98.8084679  0.7753605  0.23045127
 [5,]  0.26143714  0.6571831  0.8050750 98.27630478
 [6,]  0.99640796  0.7799081 97.6837717  0.53991230
 [7,] 98.78978531  0.4819841  0.1272817  0.60094890
 [8,]  0.78214576  0.9553001  0.2729379 97.98961630
 [9,] 98.13567866  0.9543617  0.5649977  0.34496192
[10,]  0.32951068 98.8431607  0.1326318  0.69469690
[11,]  0.13029270 99.0047771  0.3216674  0.54326273
[12,]  0.15043569  0.4000828 98.6757551  0.77372646
[13,]  0.45297697 98.5430059  0.7859616  0.21805559
[14,] 97.47082516  0.9589021  0.7300726  0.84020013
[15,] 97.50361108  0.5948120  0.9876713  0.91390557
[16,]  0.86724965 98.3732842  0.5026257  0.25684039
[17,]  0.75680131  0.8280581  0.4436990 97.97144160
[18,]  0.15198919  0.1043612 99.5793600  0.16428958
[19,] 98.65227018  0.4529603  0.4508067  0.44396285
[20,]  0.20336426  0.8484132 98.7985358  0.14968676
[21,]  0.25826836 99.0934157  0.6310231  0.01729282
[22,] 98.27614706  0.7532277  0.3868047  0.58382045
[23,]  0.86299051  0.9929164 97.5336993  0.61039375
[24,]  0.07155582  0.9499954  0.6848183 98.29363043
[25,]  0.36991300  0.7233306  0.3723177 98.53443872
[26,]  0.03545737  0.7313207  0.8334232 98.39979873
[27,]  0.38340609  0.4898682 98.3565145  0.77021122
[28,]  0.72959183  0.5986000  0.1162227 98.55558540
[29,] 97.61277655  0.8022139  0.7579463  0.82706325
[30,]  0.80788628  0.1048696 98.1646357  0.92260843
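If you would rather not load magrittr just for the pipe, the same row-wise shuffle can be written in base R. The minimal sketch below, with an arbitrary seed, also checks that shuffling only moves values between columns: each row keeps the same values and the same sum, while the position of the row maximum now varies.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
y <- t(replicate(30, {
  x <- runif(3)
  sort(c(x, 100 - sum(x)))
}))

# apply() over rows returns one *column* per input row, hence t()
shuffled <- t(apply(y, 1, sample))

# Shuffling within a row leaves each row's values (and its sum) unchanged
all(apply(shuffled, 1, sort) == apply(y, 1, sort))
all.equal(rowSums(shuffled), rowSums(y))

# How many rows have their maximum in each column
tabulate(max.col(shuffled), nbins = 4)
```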

Or, if acceptable, you can simply modify the code you are currently using to generate the matrix, replacing sort() with sample():

y <- t(replicate(30,{x <- runif(3); y <- c(x, 100 - sum(x)); sample(y/sum(y) * 100)}))

y
              [,1]        [,2]        [,3]        [,4]
 [1,] 3.388508e-01  0.11505273 99.07052567  0.47557081
 [2,] 9.782913e-01 97.67516922  0.93676869  0.40977080
 [3,] 7.118235e-01 98.57114227  0.61125057  0.10578368
 [4,] 4.114222e-01  0.71719168  0.57760052 98.29378560
 [5,] 9.933095e+01  0.02851812  0.48623365  0.15429983
 [6,] 1.178631e-01  0.52041776 98.87709291  0.48462625
 [7,] 2.934292e-01  0.65442844  0.54952687 98.50261552
 [8,] 9.894548e+01  0.37970274  0.51812253  0.15669579
 [9,] 9.866654e+01  0.57343925  0.58184710  0.17817812
[10,] 4.032940e-01 98.51693576  0.72129771  0.35847251
[11,] 9.781653e+01  0.61351868  0.74988068  0.82007274
[12,] 9.162155e-01  0.59539127  0.30124899 98.18714421
[13,] 6.278136e-01  0.02925863 98.46212355  0.88080426
[14,] 7.046555e-01  0.52923678  0.65325927 98.11284847
[15,] 3.208775e-01 98.31748802  0.61381891  0.74781558
[16,] 9.828647e+01  0.69667227  0.71976278  0.29709852
[17,] 3.696794e-04  0.69169085  0.13164316 99.17629630
[18,] 1.911561e-01  0.34213257 98.63355941  0.83315194
[19,] 1.784691e-01  0.11677341  0.35504916 99.34970828
[20,] 9.953998e-01  0.08634864  0.62682837 98.29142318
[21,] 8.658657e-01  0.20322069 98.67518541  0.25572820
[22,] 6.421388e-01 97.80948669  0.90228079  0.64609376
[23,] 9.843660e+01  0.84248163  0.05995064  0.66096543
[24,] 8.971966e-01  0.26555262  0.18558822 98.65166255
[25,] 2.468929e-01  0.09061412 99.09220658  0.57028645
[26,] 5.551374e-01  0.56177760 98.15917879  0.72390625
[27,] 9.812421e+01  0.62237186  0.52028315  0.73313957
[28,] 2.610207e-01 98.73290082  0.66234590  0.34373259
[29,] 5.671531e-01  0.34175286 99.05314043  0.03795362
[30,] 4.771366e-02  0.69462738 98.65743305  0.60022591
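Both versions only move values between columns: each row still contains one value above 97 and three below 1, because runif(3) draws from [0, 1). If you also want the values themselves to vary (an assumption about your goal, not part of the answer above), one option is to normalize four independent exponential draws. This gives rows distributed uniformly over all non-negative 4-vectors summing to 100. A minimal sketch; gen_row is a hypothetical helper name and the seed is arbitrary.

```r
# Hypothetical helper: one row of k non-negative values summing to 100.
# Normalized i.i.d. exponential draws are uniform over the simplex,
# so no column is systematically larger than the others.
gen_row <- function(k = 4) {
  e <- rexp(k)
  e / sum(e) * 100
}

set.seed(123)  # arbitrary seed, for reproducibility only
y <- t(replicate(30, gen_row()))

round(head(y), 2)
rowSums(y)[1:5]

# How many rows have their maximum in each column
tabulate(max.col(y), nbins = 4)
```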
