Reputation: 7181
I get a totally unexpected result from ifelse() and would be grateful for an explanation of why. See the reproducible data at the bottom.
split_ratio = 0.8
target_label = "DV"
training.index <- caret::createDataPartition(dataset[[target_label]], p = split_ratio, list = FALSE)
training.set <- dataset[training.index, ]
The following works as expected:
testing.set <- if (split_ratio != 1.0) dataset[-training.index, ] else NULL
testing.set
# A tibble: 2 x 6
DV nn ee oo aa cc
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 89 87 135 112 118 139
2 80 82 134 111 136 128
The supposedly equivalent code, written with ifelse(), returns a list containing a single vector:
testing.set <- ifelse(split_ratio != 1.0, dataset[-training.index, ], NULL)
testing.set
[[1]]
[1] 89 80
Is subsetting a dataframe/tibble not allowed in ifelse, or is there another reason? It is not really understandable from the help page...
Here's the reproducible content for the variable dataset used above:
structure(list(DV = c(98, 89, 93, 80, 80, 65, 92, 85, 80, 90,
77, 80, 80, 75, 90, 88, 88, 90, 90, 88), nn = c(65, 87, 61, 82,
67, 75, 79, 56, 82, 80, 63, 77, 68, 82, 82, 87, 83, 73, 60, 60
), ee = c(149, 135, 149, 134, 153, 143, 129, 167, 168, 121, 138,
136, 129, 141, 116, 135, 142, 122, 134, 145), oo = c(118, 112,
109, 111, 79, 118, 101, 107, 134, 112, 125, 120, 108, 125, 110,
116, 94, 93, 108, 104), aa = c(129, 118, 140, 136, 99, 123, 119,
122, 122, 124, 89, 123, 120, 162, 116, 126, 140, 122, 129, 123
), cc = c(162, 139, 155, 128, 137, 126, 120, 154, 155, 143, 137,
137, 136, 138, 99, 119, 135, 138, 145, 147)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -20L), na.action = structure(c(`3` = 3L,
`11` = 11L, `15` = 15L, `17` = 17L, `19` = 19L, `20` = 20L, `29` = 29L,
`40` = 40L, `48` = 48L, `52` = 52L, `70` = 70L, `77` = 77L, `88` = 88L,
`119` = 119L, `124` = 124L, `152` = 152L, `163` = 163L, `169` = 169L,
`182` = 182L, `192` = 192L, `219` = 219L, `225` = 225L, `242` = 242L,
`244` = 244L, `247` = 247L, `253` = 253L, `265` = 265L, `267` = 267L,
`274` = 274L, `309` = 309L, `317` = 317L, `324` = 324L, `330` = 330L,
`341` = 341L, `364` = 364L, `366` = 366L, `386` = 386L, `411` = 411L,
`421` = 421L, `426` = 426L, `430` = 430L, `437` = 437L, `440` = 440L,
`450` = 450L, `454` = 454L, `460` = 460L, `462` = 462L, `476` = 476L,
`483` = 483L, `505` = 505L, `506` = 506L, `515` = 515L, `533` = 533L,
`535` = 535L, `540` = 540L, `552` = 552L, `563` = 563L, `578` = 578L,
`584` = 584L, `589` = 589L, `594` = 594L, `596` = 596L, `597` = 597L,
`604` = 604L, `609` = 609L, `614` = 614L, `671` = 671L, `683` = 683L,
`688` = 688L, `701` = 701L, `702` = 702L, `713` = 713L, `715` = 715L,
`719` = 719L, `752` = 752L, `773` = 773L, `793` = 793L, `794` = 794L,
`795` = 795L, `799` = 799L, `800` = 800L, `817` = 817L, `823` = 823L,
`829` = 829L, `834` = 834L, `849` = 849L, `850` = 850L, `851` = 851L,
`854` = 854L, `875` = 875L, `882` = 882L, `891` = 891L, `892` = 892L,
`895` = 895L, `910` = 910L, `924` = 924L, `925` = 925L, `936` = 936L,
`955` = 955L, `958` = 958L, `968` = 968L, `972` = 972L, `984` = 984L,
`989` = 989L, `991` = 991L, `992` = 992L, `994` = 994L, `1007` = 1007L,
`1018` = 1018L, `1029` = 1029L, `1030` = 1030L, `1049` = 1049L,
`1065` = 1065L, `1084` = 1084L, `1085` = 1085L, `1086` = 1086L,
`1095` = 1095L, `1096` = 1096L, `1097` = 1097L, `1100` = 1100L,
`1102` = 1102L, `1110` = 1110L, `1117` = 1117L, `1125` = 1125L,
`1127` = 1127L, `1145` = 1145L, `1160` = 1160L, `1161` = 1161L,
`1164` = 1164L, `1166` = 1166L, `1171` = 1171L, `1187` = 1187L,
`1191` = 1191L, `1194` = 1194L, `1212` = 1212L, `1215` = 1215L,
`1239` = 1239L, `1254` = 1254L, `1262` = 1262L, `1263` = 1263L,
`1274` = 1274L, `1297` = 1297L, `1308` = 1308L, `1325` = 1325L,
`1328` = 1328L, `1331` = 1331L, `1337` = 1337L, `1338` = 1338L,
`1340` = 1340L, `1342` = 1342L, `1348` = 1348L, `1354` = 1354L,
`1361` = 1361L, `1367` = 1367L, `1373` = 1373L, `1379` = 1379L,
`1389` = 1389L, `1406` = 1406L, `1411` = 1411L, `1422` = 1422L,
`1423` = 1423L, `1436` = 1436L, `1439` = 1439L, `1441` = 1441L,
`1446` = 1446L, `1449` = 1449L, `1476` = 1476L, `1480` = 1480L,
`1481` = 1481L, `1483` = 1483L, `1503` = 1503L, `1511` = 1511L,
`1516` = 1516L, `1521` = 1521L, `1524` = 1524L, `1527` = 1527L,
`1544` = 1544L, `1550` = 1550L, `1567` = 1567L, `1580` = 1580L,
`1582` = 1582L, `1586` = 1586L, `1595` = 1595L, `1601` = 1601L,
`1609` = 1609L, `1612` = 1612L, `1619` = 1619L), class = "omit"))
Upvotes: 3
Views: 205
Reputation: 389225
As a general rule, if you have only one condition to test, use if/else instead of ifelse. The issue here is not specific to any dataset and can easily be reproduced. Consider mtcars:
df <- mtcars
num <- 1
index <- 1:4
if(num == 1) df[index, ] else NULL
# mpg cyl disp hp drat wt qsec vs am gear carb
#Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
But if we use ifelse:
ifelse(num == 1, df[index, ], NULL)
#[[1]]
#[1] 21.0 21.0 22.8 21.4
The reason is given in ?ifelse, under Value:
A vector of the same length and attributes (including dimensions and "class") as test
So if your test (num == 1) has length 1, ifelse returns a result of length 1 as well (here, only the first column), and the data frame loses its dimensions. If you change num to
num <- c(1, 1)
ifelse(num == 1, df[index, ], NULL)
#[[1]]
#[1] 21.0 21.0 22.8 21.4
#[[2]]
#[1] 6 6 4 6
it now returns two columns, one for each element of the test.
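To make the mechanism a little more concrete, here is a minimal sketch using the built-in mtcars data. It assumes that ifelse() recycles yes to the length of test in the same way rep() does, which is what the outputs above suggest; the variable names yes, recycled and res are only illustrative.

```r
# Sketch: what happens when `yes` is a data frame and `test` has length 1.
df <- mtcars
index <- 1:4
yes <- df[index, ]

# A data frame is a list of columns, so recycling it to length 1
# keeps only its first column.
recycled <- rep(yes, length.out = 1)

res <- ifelse(TRUE, yes, NULL)

is.list(res)                       # TRUE: a plain list, not a data frame
is.data.frame(res)                 # FALSE: the data frame class is gone
length(res)                        # 1: same length as the test
identical(res[[1]], yes$mpg)       # TRUE: the only element is the first column
identical(recycled[[1]], yes$mpg)  # TRUE: rep() keeps the same column
```

With a test of length 2, the same recycling keeps the first two columns, which matches the output shown above.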
Upvotes: 2
Reputation: 17299
ifelse is a vectorized function. See ?ifelse:
ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE.
Since split_ratio != 1.0 is a vector of length one whose only value is TRUE, the first (and only) element of the return value is taken from the first element of dataset[-training.index, ], that is, dataset[-training.index, ][1], its first column. Therefore, you get a list of length one.
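The "same shape as test" rule can be checked with a small example. This is only an illustrative sketch; the objects test, res and short are made up for the demonstration.

```r
# The result of ifelse() takes its shape from `test`, not from `yes` or `no`.
test <- matrix(c(TRUE, FALSE, FALSE, TRUE), nrow = 2)
res <- ifelse(test, "yes", "no")
dim(res)   # 2 2: the same dimensions as test

# A length-one test gives a length-one result, however long `yes` is.
short <- ifelse(TRUE, 1:10, 0L)
length(short)   # 1
short           # 1: only the first element of yes is used
```

This is the behaviour you usually want when test is a vector or matrix of conditions, and it is the reason a whole data frame cannot pass through ifelse() intact.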
ifelse and if {} else {} are not equivalent. The documentation explicitly recommends if {} else {} for your case:
Further note that if(test) yes else no is much more efficient and often much preferable to ifelse(test, yes, no) whenever test is a simple true/false result, i.e., when length(test) == 1.
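As a rough sketch of how this recommendation might look in a reusable form, the hypothetical helper below (split_rows is not part of caret or base R) splits a data frame with if/else. To stay self-contained it draws the training rows with base sample() rather than caret::createDataPartition(), so the split is not stratified by the target variable.

```r
# Hypothetical helper: split a data frame into training and testing rows.
# Uses base sample() as a stand-in for caret::createDataPartition().
split_rows <- function(data, split_ratio) {
  stopifnot(split_ratio > 0, split_ratio <= 1)
  n_train <- ceiling(split_ratio * nrow(data))  # at least one training row
  training_index <- sample(nrow(data), n_train)

  training_set <- data[training_index, , drop = FALSE]
  # `if` returns a value, so the whole data frame is assigned unchanged.
  testing_set <- if (split_ratio != 1) {
    data[-training_index, , drop = FALSE]
  } else {
    NULL
  }
  list(training = training_set, testing = testing_set)
}

parts <- split_rows(mtcars, 0.8)
nrow(parts$training)            # 26 of the 32 rows
is.data.frame(parts$testing)    # TRUE: still a data frame

full <- split_rows(mtcars, 1)
is.null(full$testing)           # TRUE: no testing set when everything is used
```

Because if/else evaluates a single condition and returns the chosen branch as-is, the testing set keeps its class, columns and row names.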
Upvotes: 3