Agile Bean

Reputation: 7181

ifelse does not perform subset on dataframe

I get a totally unexpected result from ifelse() and would be grateful for an explanation why. See reproducible data at bottom.

split_ratio = 0.8
target_label = "DV"
training.index <- caret::createDataPartition(dataset[[target_label]], p = split_ratio, list = FALSE)
training.set <- dataset[training.index, ]

The following works as expected:

testing.set <- if (split_ratio != 1.0) dataset[-training.index, ] else NULL
testing.set
# A tibble: 2 x 6
     DV    nn    ee    oo    aa    cc
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1    89    87   135   112   118   139
2    80    82   134   111   136   128

The supposedly equivalent code, wrapped in ifelse(), returns a list containing a single vector:

testing.set <- ifelse(split_ratio != 1.0, dataset[-training.index, ], NULL)
testing.set
[[1]]
[1] 89 80

Is subsetting a dataframe/tibble not allowed in ifelse, or is there another reason? It is not really understandable from the help page...

Here's the reproducible content for variable dataset above:

structure(list(DV = c(98, 89, 93, 80, 80, 65, 92, 85, 80, 90, 
77, 80, 80, 75, 90, 88, 88, 90, 90, 88), nn = c(65, 87, 61, 82, 
67, 75, 79, 56, 82, 80, 63, 77, 68, 82, 82, 87, 83, 73, 60, 60
), ee = c(149, 135, 149, 134, 153, 143, 129, 167, 168, 121, 138, 
136, 129, 141, 116, 135, 142, 122, 134, 145), oo = c(118, 112, 
109, 111, 79, 118, 101, 107, 134, 112, 125, 120, 108, 125, 110, 
116, 94, 93, 108, 104), aa = c(129, 118, 140, 136, 99, 123, 119, 
122, 122, 124, 89, 123, 120, 162, 116, 126, 140, 122, 129, 123
), cc = c(162, 139, 155, 128, 137, 126, 120, 154, 155, 143, 137, 
137, 136, 138, 99, 119, 135, 138, 145, 147)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -20L), na.action = structure(c(`3` = 3L, 
`11` = 11L, `15` = 15L, `17` = 17L, `19` = 19L, `20` = 20L, `29` = 29L, 
`40` = 40L, `48` = 48L, `52` = 52L, `70` = 70L, `77` = 77L, `88` = 88L, 
`119` = 119L, `124` = 124L, `152` = 152L, `163` = 163L, `169` = 169L, 
`182` = 182L, `192` = 192L, `219` = 219L, `225` = 225L, `242` = 242L, 
`244` = 244L, `247` = 247L, `253` = 253L, `265` = 265L, `267` = 267L, 
`274` = 274L, `309` = 309L, `317` = 317L, `324` = 324L, `330` = 330L, 
`341` = 341L, `364` = 364L, `366` = 366L, `386` = 386L, `411` = 411L, 
`421` = 421L, `426` = 426L, `430` = 430L, `437` = 437L, `440` = 440L, 
`450` = 450L, `454` = 454L, `460` = 460L, `462` = 462L, `476` = 476L, 
`483` = 483L, `505` = 505L, `506` = 506L, `515` = 515L, `533` = 533L, 
`535` = 535L, `540` = 540L, `552` = 552L, `563` = 563L, `578` = 578L, 
`584` = 584L, `589` = 589L, `594` = 594L, `596` = 596L, `597` = 597L, 
`604` = 604L, `609` = 609L, `614` = 614L, `671` = 671L, `683` = 683L, 
`688` = 688L, `701` = 701L, `702` = 702L, `713` = 713L, `715` = 715L, 
`719` = 719L, `752` = 752L, `773` = 773L, `793` = 793L, `794` = 794L, 
`795` = 795L, `799` = 799L, `800` = 800L, `817` = 817L, `823` = 823L, 
`829` = 829L, `834` = 834L, `849` = 849L, `850` = 850L, `851` = 851L, 
`854` = 854L, `875` = 875L, `882` = 882L, `891` = 891L, `892` = 892L, 
`895` = 895L, `910` = 910L, `924` = 924L, `925` = 925L, `936` = 936L, 
`955` = 955L, `958` = 958L, `968` = 968L, `972` = 972L, `984` = 984L, 
`989` = 989L, `991` = 991L, `992` = 992L, `994` = 994L, `1007` = 1007L, 
`1018` = 1018L, `1029` = 1029L, `1030` = 1030L, `1049` = 1049L, 
`1065` = 1065L, `1084` = 1084L, `1085` = 1085L, `1086` = 1086L, 
`1095` = 1095L, `1096` = 1096L, `1097` = 1097L, `1100` = 1100L, 
`1102` = 1102L, `1110` = 1110L, `1117` = 1117L, `1125` = 1125L, 
`1127` = 1127L, `1145` = 1145L, `1160` = 1160L, `1161` = 1161L, 
`1164` = 1164L, `1166` = 1166L, `1171` = 1171L, `1187` = 1187L, 
`1191` = 1191L, `1194` = 1194L, `1212` = 1212L, `1215` = 1215L, 
`1239` = 1239L, `1254` = 1254L, `1262` = 1262L, `1263` = 1263L, 
`1274` = 1274L, `1297` = 1297L, `1308` = 1308L, `1325` = 1325L, 
`1328` = 1328L, `1331` = 1331L, `1337` = 1337L, `1338` = 1338L, 
`1340` = 1340L, `1342` = 1342L, `1348` = 1348L, `1354` = 1354L, 
`1361` = 1361L, `1367` = 1367L, `1373` = 1373L, `1379` = 1379L, 
`1389` = 1389L, `1406` = 1406L, `1411` = 1411L, `1422` = 1422L, 
`1423` = 1423L, `1436` = 1436L, `1439` = 1439L, `1441` = 1441L, 
`1446` = 1446L, `1449` = 1449L, `1476` = 1476L, `1480` = 1480L, 
`1481` = 1481L, `1483` = 1483L, `1503` = 1503L, `1511` = 1511L, 
`1516` = 1516L, `1521` = 1521L, `1524` = 1524L, `1527` = 1527L, 
`1544` = 1544L, `1550` = 1550L, `1567` = 1567L, `1580` = 1580L, 
`1582` = 1582L, `1586` = 1586L, `1595` = 1595L, `1601` = 1601L, 
`1609` = 1609L, `1612` = 1612L, `1619` = 1619L), class = "omit"))

Upvotes: 3

Views: 205

Answers (2)

Ronak Shah

Reputation: 389225

As a general rule, if you have only one condition to test, use if/else instead of ifelse(). The issue is not specific to your dataset and is easy to reproduce. Consider mtcars:

df <- mtcars
num <- 1
index <- 1:4
if(num == 1) df[index, ] else NULL

#                mpg cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4      21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag  21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710     22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive 21.4   6  258 110 3.08 3.215 19.44  1  0    3    1

But if we use ifelse():

ifelse(num == 1, df[index, ], NULL)
#[[1]]
#[1] 21.0 21.0 22.8 21.4

The reason is given in ?ifelse, under Value:

A vector of the same length and attributes (including dimensions and "class") as test

So if your test (num == 1) has length 1, the output also has length 1: only the first element of yes, i.e. the first column of the data frame, is kept, and the data frame loses its dimensions and class. If you change num to

num <- c(1, 1)
ifelse(num == 1, df[index, ], NULL)

#[[1]]
#[1] 21.0 21.0 22.8 21.4

#[[2]]
#[1] 6 6 4 6

the test has length 2, so the result now contains two columns.
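A quick way to check this is to compare the class and size of both results. This is only a minimal sketch on the same mtcars objects as above; the names res_if and res_ifelse are illustrative.

```r
df <- mtcars
num <- 1
index <- 1:4

# if/else returns the chosen object untouched, so it is still a data frame
res_if <- if (num == 1) df[index, ] else NULL
class(res_if)        # "data.frame"
dim(res_if)          # 4 11

# ifelse() shapes its result after `test`, which has length 1 here,
# so only the first column survives, wrapped in a plain list
res_ifelse <- ifelse(num == 1, df[index, ], NULL)
class(res_ifelse)    # "list"
length(res_ifelse)   # 1
identical(res_ifelse[[1]], df[index, "mpg"])  # TRUE
```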

Upvotes: 2

mt1022

Reputation: 17299

ifelse() is a vectorized function. See ?ifelse:

ifelse returns a value with the same shape as test which is filled with elements selected from either yes or no depending on whether the element of test is TRUE or FALSE.

Since split_ratio != 1.0 is a vector of length one whose only value is TRUE, the result also has length one, and its single element is taken from the first element of dataset[-training.index, ]. A data frame is a list of columns, so that first element is the first column, DV. That is why you get a list of length one containing 89 and 80.
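The steps can be reproduced by hand. The sketch below is a simplified imitation of the yes branch, not the actual source of ifelse(); mtcars[1:4, ] stands in for dataset[-training.index, ], and the names test, yes, recycled and ans are illustrative.

```r
test <- 0.8 != 1.0        # logical vector of length 1: TRUE
yes  <- mtcars[1:4, ]     # stand-in for dataset[-training.index, ]

# A data frame is a list of columns, so recycling it to length(test)
# keeps only its first column
recycled <- rep(yes, length.out = length(test))
length(recycled)          # 1

# The selected elements are then copied into a result shaped like `test`;
# assigning a list element turns the logical vector into a list
ans <- test
ans[which(test)] <- recycled[which(test)]
identical(ans, ifelse(test, yes, NULL))   # TRUE
```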

ifelse() and if () {} else {} are not equivalent. The documentation explicitly recommends if () {} else {} for your case:

Further note that if(test) yes else no is much more efficient and often much preferable to ifelse(test, yes, no) whenever test is a simple true/false result, i.e., when length(test) == 1.
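In practice that could look like the sketch below. get_testing_set is a hypothetical helper, not part of caret, and training_index here is just 1:16 rather than a real partition from createDataPartition().

```r
# Hypothetical helper: returns the held-out rows, or NULL when nothing is held out
get_testing_set <- function(data, training_index, split_ratio) {
  if (split_ratio != 1.0) data[-training_index, , drop = FALSE] else NULL
}

dataset_demo   <- mtcars[1:20, ]   # illustrative data, 20 rows
training_index <- 1:16             # illustrative; not a stratified partition

testing_set <- get_testing_set(dataset_demo, training_index, split_ratio = 0.8)
dim(testing_set)                   # 4 11

is.null(get_testing_set(dataset_demo, 1:20, split_ratio = 1.0))  # TRUE

# ifelse() remains the right tool for element-wise choices on vectors
labels <- ifelse(mtcars$mpg[1:4] > 21, "high", "low")
labels                             # "low" "low" "high" "high"
```

The scalar condition goes through if/else, so the data frame comes back intact; ifelse() is kept for cases where test really is a vector.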

Upvotes: 3
