Reputation: 73
It is probably the simplest question, but I am really stuck. I have data like this:
> SALARY
salary X0 X1 total BR GDis BDis WOE IV
1 225 27 4 31 12.903226 0.05803456 0.16515277 -1.045832158 1.120277e-01
2 226 66 17 83 20.481928 0.14186226 0.70189926 -1.598933265 8.954618e-01
3 227 779 102 881 11.577753 1.67440461 4.21139554 -0.922336431 2.339959e+00
4 228 2953 256 3209 7.977563 6.34726163 10.56977704 -0.509975226 2.153378e+00
5 229 7349 544 7893 6.892183 15.79614822 22.46077622 -0.352004382 2.345978e+00
6 230 6007 451 6458 6.983586 12.91161551 18.62097440 -0.366161268 2.090546e+00
7 231 5477 363 5840 6.215753 11.77241854 14.98761354 -0.241464713 7.763561e-01
8 232 1372 70 1442 4.854369 2.94901556 2.89017341 0.020154903 1.185958e-03
9 233 496 26 522 4.980843 1.06611641 1.07349298 -0.006895275 5.086346e-05
10 234 196 7 203 3.448276 0.42128794 0.28901734 0.376829847 4.984351e-02
11 235 200 8 208 3.846154 0.42988565 0.33030553 0.263501162 2.623948e-02
12 236 68 7 75 9.333333 0.14616112 0.28901734 -0.681777107 9.739610e-02
13 237 65 1 66 1.515152 0.13971284 0.04128819 1.219012607 1.199809e-01
14 NA 21469 566 22035 2.568641 46.14607514 23.36911643 0.680396572 1.549736e+01
Index Bin
1 2.845766 13
2 4.947752 14
3 2.515160 12
4 1.665250 10
5 1.421915 8
6 1.442188 9
7 1.273113 7
8 1.020359 5
9 1.006919 6
10 1.457656 3
11 1.301479 4
12 1.977389 11
13 3.383845 1
14 1.974661 2
I need to change the row order so that the column "Bin" is in the "right" order: 1, 2, 3, ..., 14. Also, there is one more thing: I have another data set in which Bin values are repeated:
> OUTSTAND_AMOUNT_MRTG1
range X0 X1 total BR GDis BDis WOE
1 (0,8.88e+05] 68 2 70 2.857143 0.1463656 0.08288438 0.56866111
10 <NA> 45887 2339 48226 4.850081 98.7688069 96.93327808 0.01875895
2 (8.88e+05,1.36e+06] 66 6 72 8.333333 0.1420607 0.24865313 -0.55980414
4 (1.81e+06,2.26e+06] 65 7 72 9.722222 0.1399083 0.29009532 -0.72922230
8 (4.89e+06,7.96e+06] 65 7 72 9.722222 0.1399083 0.29009532 -0.72922230
7 (3.61e+06,4.89e+06] 64 8 72 11.111111 0.1377559 0.33153751 -0.87825787
3 (1.36e+06,1.81e+06] 62 10 72 13.888889 0.1334510 0.41442188 -1.13315012
5 (2.26e+06,2.77e+06] 62 10 72 13.888889 0.1334510 0.41442188 -1.13315012
6 (2.77e+06,3.61e+06] 61 11 72 15.277778 0.1312986 0.45586407 -1.24472083
9 (7.96e+06,9.31e+08] 59 13 72 18.055556 0.1269937 0.53874845 -1.44511133
IV Index Bin
1 0.03609931 1.765901 1
10 0.03443259 1.018936 2
2 0.05967086 1.750330 3
4 0.10951972 2.073467 4
8 0.10951972 2.073467 4
7 0.17019025 2.406703 6
3 0.31838219 3.105424 7
5 0.31838219 3.105424 7
6 0.40399344 3.471965 9
9 0.59503146 4.242324 10
Basically, Bin is rank(BR) with ties.method = "min", because the default method gives tied values their mean rank. Because of the ties, the values 5 and 8 are missing from Bin. How can I avoid these gaps?
Thank you in advance.
Upvotes: 0
Views: 76
Reputation: 887851
You can use order to sort the rows by Bin:
SALARY1 <- SALARY[order(SALARY$Bin),]
row.names(SALARY1) <- NULL
head(SALARY1)
# salary X0 X1 total BR GDis BDis WOE
#1 237 65 1 66 1.515152 0.1397128 0.04128819 1.219012607
#2 NA 21469 566 22035 2.568641 46.1460751 23.36911643 0.680396572
#3 234 196 7 203 3.448276 0.4212879 0.28901734 0.376829847
#4 235 200 8 208 3.846154 0.4298857 0.33030553 0.263501162
#5 232 1372 70 1442 4.854369 2.9490156 2.89017341 0.020154903
#6 233 496 26 522 4.980843 1.0661164 1.07349298 -0.006895275
# IV Index Bin
#1 1.199809e-01 3.383845 1
#2 1.549736e+01 1.974661 2
#3 4.984351e-02 1.457656 3
#4 2.623948e-02 1.301479 4
#5 1.185958e-03 1.020359 5
#6 5.086346e-05 1.006919 6
Regarding the second question, suppose your Bin is
Bin <- c(1,2,3,4,4,6,7,7,9,10)
cumsum(c(TRUE,diff(Bin)>0))
#[1] 1 2 3 4 4 5 6 6 7 8
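For your dataset OUTSTAND_AMOUNT_MRTG1 (whose rows are already sorted by Bin), assign the result back to the Bin column rather than to the whole data frame:
OUTSTAND_AMOUNT_MRTG1$Bin <- cumsum(c(TRUE, diff(OUTSTAND_AMOUNT_MRTG1$Bin) > 0))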
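The cumsum/diff approach relies on the rows already being sorted by Bin. If they are not, one possible alternative is to compute a gap-free ("dense") rank directly from BR with match against the sorted unique values. This is only a sketch: the BR vector below is a made-up, rounded toy example, and dense_bin is a hypothetical name, not something from the question.

```r
# Toy BR values with ties, deliberately NOT sorted (illustrative data only)
BR <- c(9.72, 2.86, 13.89, 9.72, 4.85, 13.89)

# rank() with ties.method = "min" skips a rank after each tie
min_bin <- rank(BR, ties.method = "min")

# Dense rank: position of each value among the sorted unique values,
# so tied values share a rank and no rank is skipped
dense_bin <- match(BR, sort(unique(BR)))

print(min_bin)
print(dense_bin)
```

With the real data, the same idea would be OUTSTAND_AMOUNT_MRTG1$Bin <- match(OUTSTAND_AMOUNT_MRTG1$BR, sort(unique(OUTSTAND_AMOUNT_MRTG1$BR))), assuming Bin is meant to rank BR in ascending order.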
Upvotes: 1