kompprograms

Reputation: 11

Getting a "NA/NaN/Inf in foreign function call" error when attempting to implement KNN using the class library

I tried to implement knn using the class package with the following code:

KNN_build <- knn(train = who_training, test = who_validation, 
                 cl = who_training$Status, k = 5  )

The errors I receive are the following:

Warning in knn(train = who_training, test = who_validation, cl = who_training$Status,  :
  NAs introduced by coercion
Warning in knn(train = who_training, test = who_validation, cl = who_training$Status,  :
  NAs introduced by coercion
Error in knn(train = who_training, test = who_validation, cl = who_training$Status,  : 
  NA/NaN/Inf in foreign function call (arg 6)

Here is the structure of the training dataset, from str(who_training):

 $ Country                        : chr  "Kiribati" "Afghanistan" "El Salvador" "Guyana" ...
 $ Year                           : num [1:2938, 1] -1.196 -0.546 1.405 -0.979 -0.979 ...
 $ Status                         : Factor w/ 2 levels "Developed","Developing": 2 2 2 2 2 2 2 2 2 2 ...
 $ Life.expectancy                : num [1:2938, 1] -0.492 -1.262 0.427 -0.418 -1.104 ...
 $ Adult.Mortality                : num [1:2938, 1] 0.526 1.188 0.204 0.705 1.654 ...
 $ infant.deaths                  : num [1:2938, 1] -0.439 1.498 -0.393 -0.416 -0.279 ...
 $ Alcohol                        : num [1:2938, 1] -1.036 -1.157 -0.516 0.871 -1.019 ...
 $ percentage.expenditure         : num [1:2938, 1] -0.368 -0.435 0.208 -0.421 -0.43 ...
 $ Hepatitis.B                    : num [1:2938, 1] -0.206 -0.793 0.426 -3.366 0.336 ...
 $ Measles                        : num [1:2938, 1] -0.3043 0.0316 -0.3043 -0.3043 -0.2068 ...
 $ BMI                            : num [1:2938, 1] 1.571 -1.213 0.8537 -0.0292 -1.2581 ...
 $ under.five.deaths              : num [1:2938, 1] -0.447 1.493 -0.414 -0.43 -0.282 ...
 $ Polio                          : num [1:2938, 1] 0.597 -2.11 0.383 0.241 0.526 ...
 $ Total.expenditure              : num [1:2938, 1] 1.4574 1.2784 0.4148 0.0569 -1.0483 ...
 $ Diphtheria                     : num [1:2938, 1] -0.703 -1.994 0.452 0.385 0.385 ...
 $ HIV.AIDS                       : num [1:2938, 1] -0.406 -0.406 -0.364 0.442 0.357 ...
 $ GDP                            : num [1:2938, 1] -0.465 -0.552 -0.12 -0.448 -0.53 ...
 $ Population                     : num [1:2938, 1] -0.3631 -0.3547 -0.0603 -0.3306 -0.1846 ...
 $ thinness..1.19.years           : num [1:2938, 1] -1.153 -0.319 -0.777 0.353 1.401 ...
 $ thinness.5.9.years             : num [1:2938, 1] -1.148 -0.318 -0.773 0.298 1.396 ...
 $ Income.composition.of.resources: num [1:2938, 1] -3.0728 -1.1425 0.2225 -0.0944 -3.0728 ...
 $ Schooling                      : num [1:2938, 1] -0.14 -1.388 0.352 -0.337 -2.439 ...
 $ disease                        : num  252 1478 280 111 657 ...

And the validation dataset:

str(who_validation)
'data.frame':   439 obs. of  23 variables:
 $ Country                        : chr  "Niger" "Estonia" "Saudi Arabia" "Saudi Arabia" ...
 $ Year                           : num [1:439, 1] -0.546 0.104 1.621 0.971 -0.112 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ Status                         : Factor w/ 2 levels "Developed","Developing": 2 2 2 2 2 2 2 2 2 2 ...
 $ Life.expectancy                : num [1:439, 1] -1.642 0.522 0.553 0.511 0.933 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ Adult.Mortality                : num [1:439, 1] 1.0719 0.0785 -0.6285 -1.3355 -1.2908 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ infant.deaths                  : num [1:439, 1] 0.815 -0.439 -0.279 -0.256 -0.416 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ Alcohol                        : num [1:439, 1] -1.134 -0.199 -0.2 -1.139 -0.103 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ percentage.expenditure         : num [1:439, 1] -0.433 -0.219 -0.437 -0.246 0.29 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ Hepatitis.B                    : num [1:439, 1] 0.381 0.471 0.652 0.652 0.426 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ Measles                        : num [1:439, 1] 0.262 -0.304 -0.248 -0.228 -0.304 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ BMI                            : num [1:439, 1] -1.153 0.919 1.496 1.365 0.854 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ under.five.deaths              : num [1:439, 1] 1.427 -0.447 -0.315 -0.299 -0.43 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ Polio                          : num [1:439, 1] -2.965 0.526 0.668 0.739 0.811 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ Total.expenditure              : num [1:439, 1] 0.5849 0.3388 -0.0404 -0.8872 -1.979 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ Diphtheria                     : num [1:439, 1] -2.878 0.52 0.724 0.724 0.385 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ HIV.AIDS                       : num [1:439, 1] 0.23 -0.406 -0.406 -0.406 -0.406 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ GDP                            : num [1:439, 1] -0.5524 -0.3486 -0.2572 -0.2789 0.0101 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ Population                     : num [1:439, 1] 0.298 -0.367 -0.3 -0.3 -0.3 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ thinness..1.19.years           : num [1:439, 1] 1.993 -0.669 0.89 0.81 -0.293 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ thinness.5.9.years             : num [1:439, 1] 1.959 -0.639 0.834 0.78 -0.344 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ Income.composition.of.resources: num [1:439, 1] -1.718 0.998 1.046 0.915 0.603 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ Schooling                      : num [1:439, 1] -2.833 1.305 1.305 0.779 1.272 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : NULL
 $ disease                        : num  297 284 512 588 285 ...

I can provide further details if needed, such as the dput for my training dataframe:

dput(head(who_training))
structure(list(Country = structure(c(89L, 1L, 54L, 72L, 56L, 
193L), .Label = c("Afghanistan", "Albania", "Algeria", "Angola", 
"Antigua and Barbuda", "Argentina", "Armenia", "Australia", "Austria", 
"Azerbaijan", "Bahamas", "Bahrain", "Bangladesh", "Barbados", 
"Belarus", "Belgium", "Belize", "Benin", "Bhutan", "Bolivia (Plurinational State of)", 
"Bosnia and Herzegovina", "Botswana", "Brazil", "Brunei Darussalam", 
"Bulgaria", "Burkina Faso", "Burundi", "Cabo Verde", "Cambodia", 
"Cameroon", "Canada", "Central African Republic", "Chad", "Chile", 
"China", "Colombia", "Comoros", "Congo", "Cook Islands", "Costa Rica", 
"Côte d'Ivoire", "Croatia", "Cuba", "Cyprus", "Czechia", "Democratic People's Republic of Korea", 
"Democratic Republic of the Congo", "Denmark", "Djibouti", "Dominica", 
"Dominican Republic", "Ecuador", "Egypt", "El Salvador", "Equatorial Guinea", 
"Eritrea", "Estonia", "Ethiopia", "Fiji", "Finland", "France", 
"Gabon", "Gambia", "Georgia", "Germany", "Ghana", "Greece", "Grenada", 
"Guatemala", "Guinea", "Guinea-Bissau", "Guyana", "Haiti", "Honduras", 
"Hungary", "Iceland", "India", "Indonesia", "Iran (Islamic Republic of)", 
"Iraq", "Ireland", "Israel", "Italy", "Jamaica", "Japan", "Jordan", 
"Kazakhstan", "Kenya", "Kiribati", "Kuwait", "Kyrgyzstan", "Lao People's Democratic Republic", 
"Latvia", "Lebanon", "Lesotho", "Liberia", "Libya", "Lithuania", 
"Luxembourg", "Madagascar", "Malawi", "Malaysia", "Maldives", 
"Mali", "Malta", "Marshall Islands", "Mauritania", "Mauritius", 
"Mexico", "Micronesia (Federated States of)", "Monaco", "Mongolia", 
"Montenegro", "Morocco", "Mozambique", "Myanmar", "Namibia", 
"Nauru", "Nepal", "Netherlands", "New Zealand", "Nicaragua", 
"Niger", "Nigeria", "Niue", "Norway", "Oman", "Pakistan", "Palau", 
"Panama", "Papua New Guinea", "Paraguay", "Peru", "Philippines", 
"Poland", "Portugal", "Qatar", "Republic of Korea", "Republic of Moldova", 
"Romania", "Russian Federation", "Rwanda", "Saint Kitts and Nevis", 
"Saint Lucia", "Saint Vincent and the Grenadines", "Samoa", "San Marino", 
"Sao Tome and Principe", "Saudi Arabia", "Senegal", "Serbia", 
"Seychelles", "Sierra Leone", "Singapore", "Slovakia", "Slovenia", 
"Solomon Islands", "Somalia", "South Africa", "South Sudan", 
"Spain", "Sri Lanka", "Sudan", "Suriname", "Swaziland", "Sweden", 
"Switzerland", "Syrian Arab Republic", "Tajikistan", "Thailand", 
"The former Yugoslav republic of Macedonia", "Timor-Leste", "Togo", 
"Tonga", "Trinidad and Tobago", "Tunisia", "Turkey", "Turkmenistan", 
"Tuvalu", "Uganda", "Ukraine", "United Arab Emirates", "United Kingdom of Great Britain and Northern Ireland", 
"United Republic of Tanzania", "United States of America", "Uruguay", 
"Uzbekistan", "Vanuatu", "Venezuela (Bolivarian Republic of)", 
"Viet Nam", "Yemen", "Zambia", "Zimbabwe"), class = "factor"), 
    Year = structure(c(-1.19612277260829, -0.545905298957761, 
    1.40474712199383, -0.97938361472478, -0.97938361472478, 0.104312174692768
    ), .Dim = c(6L, 1L)), Status = structure(c(2L, 2L, 2L, 2L, 
    2L, 2L), .Label = c("Developed", "Developing"), class = "factor"), 
    Life.expectancy = structure(c(-0.491703688946568, -1.26227192428072, 
    0.426644755903723, -0.417813584188498, -1.10393598551343, 
    -2.22284328613562), .Dim = c(6L, 1L)), Adult.Mortality = structure(c(0.525991300297776, 
    1.18826519708731, 0.203803999156919, 0.704984245376029, 1.65364685429077, 
    -0.12733294923785), .Dim = c(6L, 1L)), infant.deaths = structure(c(-0.438727256213326, 
    1.49835585218564, -0.39314883013335, -0.415938043173338, 
    -0.279202764933411, 0.244949134986309), .Dim = c(6L, 1L)), 
    Alcohol = structure(c(-1.03648468049828, -1.15698267513806, 
    -0.516035895139232, 0.870972936778229, -1.01853817065831, 
    -0.24940203465972), .Dim = c(6L, 1L)), percentage.expenditure = structure(c(-0.368092282523513, 
    -0.435250731904734, 0.207934359477078, -0.420640377004664, 
    -0.429901306980893, -0.416415384868822), .Dim = c(6L, 1L)), 
    Hepatitis.B = structure(c(-0.20594618442662, -0.792822896206842, 
    0.426074889798235, -3.3660515555509, 0.33578616490897, -0.38652363420515
    ), .Dim = c(6L, 1L)), Measles = structure(c(-0.304292225859768, 
    0.0316262810969904, -0.304292225859768, -0.304292225859768, 
    -0.206834387421696, -0.304292225859768), .Dim = c(6L, 1L)), 
    BMI = structure(c(1.57101747462593, -1.21297832599103, 0.853699637710209, 
    -0.029153084647603, -1.25812420383887, -0.49064428042555), .Dim = c(6L, 
    1L)), under.five.deaths = structure(c(-0.446748080417218, 
    1.49318542883105, -0.413867851446909, -0.430307965932063, 
    -0.28234693556567, 0.309497185899902), .Dim = c(6L, 1L)), 
    Polio = structure(c(0.596897326350385, -2.10995531198027, 
    0.383198433850596, 0.240732505517404, 0.525664362183789, 
    -0.898994921148134), .Dim = c(6L, 1L)), Total.expenditure = structure(c(1.45740571947119, 
    1.27842521470575, 0.414844279212532, 0.0568832696816624, 
    -1.0483213472449, -0.39504250485106), .Dim = c(6L, 1L)), 
    Diphtheria = structure(c(-0.702868620889384, -1.9941482705615, 
    0.452486855133033, 0.384524768308185, 0.384524768308185, 
    -0.83879279453908), .Dim = c(6L, 1L)), HIV.AIDS = structure(c(-0.405940295110872, 
    -0.405940295110872, -0.363540899930249, 0.442047608501586, 
    0.35724881814034, -0.405940295110872), .Dim = c(6L, 1L)), 
    GDP = structure(c(-0.465047724667165, -0.552420239231645, 
    -0.12024411417243, -0.447578456170917, -0.529794471763561, 
    -0.519666432315853), .Dim = c(6L, 1L)), Population = structure(c(-0.363065262988643, 
    -0.354733142686027, -0.0602998996500711, -0.330582727476206, 
    -0.184602200482603, 0.295425496705686), .Dim = c(6L, 1L)), 
    thinness..1.19.years = structure(c(-1.15296800498538, -0.319469152262016, 
    -0.776549168271602, 0.352707341869728, 1.40130267271525, 
    0.890448537175124), .Dim = c(6L, 1L)), thinness.5.9.years = structure(c(-1.14791740179065, 
    -0.317661129245177, -0.772962956124954, 0.298335460062757, 
    1.39641633665516, 0.887549588965998), .Dim = c(6L, 1L)), 
    Income.composition.of.resources = structure(c(-3.07284300635863, 
    -1.14245031797641, 0.222473805122128, -0.0943835805971753, 
    -3.07284300635863, -1.02058209269975), .Dim = c(6L, 1L)), 
    Schooling = structure(c(-0.140241143049229, -1.38811570864962, 
    0.352340922319346, -0.337273969196659, -2.43895744810258, 
    -0.797017230207329), .Dim = c(6L, 1L)), disease = c(252.1, 
    1478.1, 280.2, 111.1, 656.9, 245.5)), row.names = c(1392L, 
11L, 820L, 1119L, 863L, 2930L), class = "data.frame")

and the dput for my validation dataframe:

dput(head(who_validation))
structure(list(Country = structure(c(110L, 52L, 132L, 132L, 40L, 
60L), .Label = c("Afghanistan", "Albania", "Algeria", "Angola", 
"Antigua and Barbuda", "Argentina", "Armenia", "Australia", "Austria", 
"Azerbaijan", "Bahrain", "Bangladesh", "Barbados", "Belarus", 
"Belgium", "Belize", "Benin", "Bhutan", "Bolivia (Plurinational State of)", 
"Botswana", "Brazil", "Brunei Darussalam", "Bulgaria", "Burkina Faso", 
"Burundi", "Cabo Verde", "Cambodia", "Cameroon", "Canada", "Central African Republic", 
"Chad", "Chile", "China", "Colombia", "Comoros", "Congo", "Costa Rica", 
"Côte d'Ivoire", "Croatia", "Cuba", "Cyprus", "Czechia", "Democratic People's Republic of Korea", 
"Democratic Republic of the Congo", "Denmark", "Dominican Republic", 
"Ecuador", "Egypt", "El Salvador", "Equatorial Guinea", "Eritrea", 
"Estonia", "Ethiopia", "Finland", "Gabon", "Gambia", "Georgia", 
"Germany", "Ghana", "Greece", "Grenada", "Guatemala", "Guinea", 
"Guinea-Bissau", "Haiti", "Honduras", "Hungary", "Iceland", "India", 
"Indonesia", "Iran (Islamic Republic of)", "Iraq", "Ireland", 
"Israel", "Italy", "Jamaica", "Japan", "Jordan", "Kazakhstan", 
"Kenya", "Kiribati", "Kuwait", "Kyrgyzstan", "Lao People's Democratic Republic", 
"Latvia", "Lesotho", "Liberia", "Libya", "Lithuania", "Luxembourg", 
"Malawi", "Malaysia", "Maldives", "Mali", "Malta", "Mauritania", 
"Mauritius", "Mexico", "Micronesia (Federated States of)", "Mongolia", 
"Montenegro", "Morocco", "Mozambique", "Myanmar", "Namibia", 
"Nepal", "Netherlands", "New Zealand", "Nicaragua", "Niger", 
"Nigeria", "Norway", "Oman", "Pakistan", "Palau", "Panama", "Papua New Guinea", 
"Paraguay", "Peru", "Philippines", "Poland", "Portugal", "Qatar", 
"Republic of Moldova", "Romania", "Russian Federation", "Rwanda", 
"Saint Lucia", "Saint Vincent and the Grenadines", "Samoa", "Sao Tome and Principe", 
"Saudi Arabia", "Senegal", "Serbia", "Seychelles", "Sierra Leone", 
"Singapore", "Slovakia", "Slovenia", "Solomon Islands", "Somalia", 
"South Africa", "South Sudan", "Spain", "Sri Lanka", "Sudan", 
"Suriname", "Swaziland", "Sweden", "Switzerland", "Syrian Arab Republic", 
"Tajikistan", "Thailand", "The former Yugoslav republic of Macedonia", 
"Timor-Leste", "Tonga", "Trinidad and Tobago", "Tunisia", "Turkey", 
"Turkmenistan", "Uganda", "Ukraine", "United Arab Emirates", 
"United Kingdom of Great Britain and Northern Ireland", "United Republic of Tanzania", 
"United States of America", "Uruguay", "Uzbekistan", "Vanuatu", 
"Venezuela (Bolivarian Republic of)", "Yemen", "Zambia", "Zimbabwe"
), class = "factor"), Year = structure(c(-0.545905298957761, 
0.104312174692768, 1.62148627987734, 0.971268806226806, -0.112426983190742, 
-0.329166141074251), .Dim = c(6L, 1L), .Dimnames = list(NULL, 
    NULL)), Status = structure(c(2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Developed", 
"Developing"), class = "factor"), Life.expectancy = structure(c(-1.64227817732222, 
0.521646319164099, 0.553313506917557, 0.511090589912945, 0.933319759959055, 
1.1022114279775), .Dim = c(6L, 1L), .Dimnames = list(NULL, NULL)), 
    Adult.Mortality = structure(c(1.07191978278645, 0.0785089376021415, 
    -0.62851319545696, -1.33553532851606, -1.2907870922465, -0.72695931525
    ), .Dim = c(6L, 1L), .Dimnames = list(NULL, NULL)), infant.deaths = structure(c(0.814679460986005, 
    -0.438727256213326, -0.279202764933411, -0.256413551893423, 
    -0.415938043173338, -0.438727256213326), .Dim = c(6L, 1L), .Dimnames = list(
        NULL, NULL)), Alcohol = structure(c(-1.1339085910581, 
    -0.199408185819812, -0.200049132599811, -1.13903616529809, 
    -0.103266168819988, 1.25297721765753), .Dim = c(6L, 1L), .Dimnames = list(
        NULL, NULL)), percentage.expenditure = structure(c(-0.432962903502602, 
    -0.218689034593455, -0.43659516554427, -0.246467036861867, 
    0.290217084455785, -0.128158007506263), .Dim = c(6L, 1L), .Dimnames = list(
        NULL, NULL)), Hepatitis.B = structure(c(0.380930527353603, 
    0.471219252242868, 0.651796702021398, 0.651796702021398, 
    0.426074889798235, 0.471219252242868), .Dim = c(6L, 1L), .Dimnames = list(
        NULL, NULL)), Measles = structure(c(0.261533469114463, 
    -0.304292225859768, -0.247528218897167, -0.228088490485318, 
    -0.304292225859768, -0.304292225859768), .Dim = c(6L, 1L), .Dimnames = list(
        NULL, NULL)), BMI = structure(c(-1.15278382219391, 0.918910350157092, 
    1.49577434487953, 1.36535291998576, 0.853699637710209, 1.14463973939631
    ), .Dim = c(6L, 1L), .Dimnames = list(NULL, NULL)), under.five.deaths = structure(c(1.42742497089043, 
    -0.446748080417218, -0.31522716453598, -0.298787050050825, 
    -0.430307965932063, -0.430307965932063), .Dim = c(6L, 1L), .Dimnames = list(
        NULL, NULL)), Polio = structure(c(-2.96475088197942, 
    0.525664362183789, 0.668130290516981, 0.739363254683577, 
    0.810596218850173, 0.739363254683577), .Dim = c(6L, 1L), .Dimnames = list(
        NULL, NULL)), Total.expenditure = structure(c(0.584875758739695, 
    0.338777564687222, -0.0404373797845428, -0.887238892956006, 
    -1.97901997202516, 1.56479402233045), .Dim = c(6L, 1L), .Dimnames = list(
        NULL, NULL)), Diphtheria = structure(c(-2.87765539928452, 
    0.520448941957881, 0.724335202432426, 0.724335202432426, 
    0.384524768308185, 0.724335202432426), .Dim = c(6L, 1L), .Dimnames = list(
        NULL, NULL)), HIV.AIDS = structure(c(0.230050632598472, 
    -0.405940295110872, -0.405940295110872, -0.405940295110872, 
    -0.405940295110872, -0.405940295110872), .Dim = c(6L, 1L), .Dimnames = list(
        NULL, NULL)), GDP = structure(c(-0.552410469161336, -0.348597444406105, 
    -0.257188085587563, -0.278877759218448, 0.0101370026695019, 
    -0.284633785448075), .Dim = c(6L, 1L), .Dimnames = list(NULL, 
        NULL)), Population = structure(c(0.298357417598302, -0.366680744764259, 
    -0.299596727877204, -0.299596727877204, -0.299596727877204, 
    -0.361842293185169), .Dim = c(6L, 1L), .Dimnames = list(NULL, 
        NULL)), thinness..1.19.years = structure(c(1.99281798755118, 
    -0.669000929210523, 0.890448537175124, 0.809787357879315, 
    -0.292582092496746, -0.99164564639376), .Dim = c(6L, 1L), .Dimnames = list(
        NULL, NULL)), thinness.5.9.years = structure(c(1.95884800515371, 
    -0.63905065410149, 0.833984668156613, 0.780419747347227, 
    -0.34444358964987, -1.01400509976719), .Dim = c(6L, 1L), .Dimnames = list(
        NULL, NULL)), Income.composition.of.resources = structure(c(-1.71766834128222, 
    0.997555717881655, 1.04630300799232, 0.91468532469353, 0.602702667985292, 
    1.07067665304765), .Dim = c(6L, 1L), .Dimnames = list(NULL, 
        NULL)), Schooling = structure(c(-2.83302310039744, 1.30466624869859, 
    1.30466624869859, 0.779245378972111, 1.27182744434069, 1.3703438574144
    ), .Dim = c(6L, 1L), .Dimnames = list(NULL, NULL)), disease = c(297.1, 
    284.1, 512.1, 588.1, 285.1, 290.1)), row.names = c(1888L, 
874L, 2234L, 2237L, 666L, 1036L), class = "data.frame")

I expected to train the KNN algorithm with my dataset so I could predict a specific set of data points.

Before processing the data, I imputed missing values with the following code:

who_validation$disease <- ifelse(is.na(who_validation$disease), median(who_validation$disease, na.rm = TRUE), who_validation$disease) 

who_training$disease <- ifelse(is.na(who_training$disease), median(who_training$disease, na.rm = TRUE), who_training$disease) 

I've also tried converting the relevant variables to factors, in case they were interfering with the function:

who_training$Status <- as.factor(who_training$Status)
who_validation$Status <- as.factor(who_validation$Status)

who_training$Country <- as.factor(who_training$Country)
who_validation$Country <- as.factor(who_validation$Country)

This doesn't seem to work. I'm trying to figure out what part of my dataframe is causing this error. How can I troubleshoot the issue?

Upvotes: 1

Views: 50

Answers (2)

kompprograms

Reputation: 11

I found a solution. knn() works on numeric data only, so text fields such as Country cannot be passed to it. I also wasn't supposed to keep the class label (Status) in my training and validation datasets. I extracted the class labels into a separate variable, removed the label and text columns from both tables, and passed only the numeric columns to knn().
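A minimal sketch of that fix, using small made-up data frames in place of who_training and who_validation. The toy values, the reduced set of columns, and the helper name numeric_predictors are assumptions for illustration; they are not taken from the original data.

```r
library(class)

# Toy stand-ins for who_training / who_validation (values are made up)
who_training <- data.frame(
  Country   = c("A", "B", "C", "D", "E", "F"),
  Status    = factor(c("Developed", "Developed", "Developed",
                       "Developing", "Developing", "Developing")),
  GDP       = c(1.2, 1.0, 0.9, -1.1, -0.8, -1.2),
  Schooling = c(1.1, 0.8, 1.3, -0.9, -1.2, -1.0)
)
who_validation <- data.frame(
  Country   = c("G", "H"),
  Status    = factor(c("Developed", "Developing"),
                     levels = c("Developed", "Developing")),
  GDP       = c(1.1, -1.0),
  Schooling = c(0.9, -1.1)
)

# Hypothetical helper: keep only numeric columns, which drops the
# class label (a factor) and text fields such as Country
numeric_predictors <- function(df) {
  df[, sapply(df, is.numeric), drop = FALSE]
}

train_x <- numeric_predictors(who_training)
test_x  <- numeric_predictors(who_validation)
train_y <- who_training$Status  # labels are passed separately via cl

KNN_build <- knn(train = train_x, test = test_x, cl = train_y, k = 3)

# Compare predictions against the held-out labels
table(predicted = KNN_build, actual = who_validation$Status)
```

Selecting columns with is.numeric, rather than by name, is just one way to do it; it also keeps the one-column matrix columns shown in the str() output, since is.numeric() returns TRUE for them.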

Upvotes: 0

Klara

Reputation: 16

The error complains about NA, NaN and Inf values, not only NAs. Did you check for them with is.nan() and is.infinite()?
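One way to run that check across a whole data frame is sketched below. The helper name count_non_finite and the toy data are made up for illustration; is.finite() is used because it returns FALSE for NA, NaN, Inf and -Inf alike, so it covers is.na(), is.nan() and is.infinite() in a single pass.

```r
# Hypothetical helper: count NA, NaN and Inf/-Inf values per numeric column
count_non_finite <- function(df) {
  num_cols <- df[, sapply(df, is.numeric), drop = FALSE]
  sapply(num_cols, function(col) sum(!is.finite(col)))
}

# Made-up example data containing one NA, one NaN and one Inf
toy <- data.frame(
  a = c(1, NA, 3),
  b = c(NaN, 2, Inf),
  c = c("x", "y", "z")
)

count_non_finite(toy)
```

Any column with a non-zero count would need cleaning or imputing before being passed to knn(). Non-numeric columns (like c here) are skipped by this helper, so they have to be checked and removed separately.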

Also, this answer suggests removing the labels from the training and testing data: Error: NA/NaN/Inf in foreign function call (arg 6) in knn function written in R.

If that does not help, please provide a minimal example so we can test it.

Upvotes: 0
