jsguy

Reputation: 2179

Why does R say that there are rows containing missing values?

I run the following R script on this dataset: http://pastebin.com/HA42b8QV

require(ggplot2)
data <- read.table("funcExp.txt", sep = "\t", header = TRUE)
data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$insTime <- strtoi(data$insTime)
ggplot(data, aes(n, insTime, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$decTime <- strtoi(data$decTime)
ggplot(data, aes(n, decTime, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$delTime <- strtoi(data$delTime)
ggplot(data, aes(n, delTime, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$insComp <- strtoi(data$insComp)
ggplot(data, aes(n, insComp, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")


data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$decComp <- strtoi(data$decComp)
ggplot(data, aes(n, decComp, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$delComp <- strtoi(data$delComp)
ggplot(data, aes(n, delComp, color = alg)) + 
  geom_point() +
  stat_summary(fun.y=median, geom="line")

and I get the following warnings:

Loading required package: ggplot2
Loading required package: methods
Warning messages:
1: Removed 26 rows containing missing values (stat_summary). 
2: Removed 26 rows containing missing values (geom_point). 
Warning messages:
1: Removed 30 rows containing missing values (stat_summary). 
2: Removed 30 rows containing missing values (geom_point). 
Warning messages:
1: Removed 22 rows containing missing values (stat_summary). 
2: Removed 22 rows containing missing values (geom_point). 
Warning messages:
1: Removed 36 rows containing missing values (stat_summary). 
2: Removed 36 rows containing missing values (geom_point). 
Warning messages:
1: Removed 36 rows containing missing values (stat_summary). 
2: Removed 36 rows containing missing values (geom_point). 
Warning messages:
1: Removed 25 rows containing missing values (stat_summary). 
2: Removed 25 rows containing missing values (geom_point). 

I searched online but could not find the reason. Most posts suggest that the dataset contains null values. Nothing is missing from my dataset, though, so I can't see why R would treat some values as missing.

Thank you.

Upvotes: 1

Views: 4726

Answers (1)

Jens

Reputation: 2449

Your data is fine right after it is read in; the conversions you apply afterwards are what break it.

If you leave out

data$alg <- factor(data$alg)
data$n <- strtoi(data$n)
data$insTime <- strtoi(data$insTime)

then the plots work out nicely.
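As a rough sketch of what that looks like for one of the plots: the data frame below is an invented stand-in for funcExp.txt (its values are hypothetical), and the columns are left numeric. Note that in ggplot2 3.3.0 and later the argument is called `fun`; `fun.y`, as used in the question, is deprecated there.

```r
library(ggplot2)

# Hypothetical stand-in for read.table("funcExp.txt", ...); values are invented.
data <- data.frame(
  alg     = rep(c("aheap", "fibheap", "pheap"), times = 4),
  n       = rep(c(2, 4, 8, 16), each = 3),
  insTime = c(408.2, 867.5, 1332.1, 400.0, 1031.4, 1500.9,
              450.3, 1200.7, 1800.2, 500.6, 1400.8, 2100.5)
)

# No strtoi() calls: n and insTime stay numeric, so no NAs are introduced.
p <- ggplot(data, aes(n, insTime, color = alg)) +
  geom_point() +
  stat_summary(fun = median, geom = "line")  # use fun.y on ggplot2 < 3.3.0

print(p)
```

The other five plots would follow the same pattern with a different y column.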

The structure of the data, as read by read.table, already shows that every column has a sensible type:

 > str(data)
 'data.frame':   60 obs. of  8 variables:
 $ alg    : Factor w/ 3 levels "aheap","fibheap",..: 1 3 2 1 3 2 1 3 2 1 ...
 $ n      : int  2 2 2 4 4 4 8 8 8 16 ...
 $ insTime: num  408 867 1332 400 1031 ...
 $ decTime: num  359 738 1079 411 856 ...
 $ delTime: num  325 750 1242 416 931 ...
 $ insComp: num  0.9 1.5 2.5 1.9 3.5 6.5 5.8 11.6 18.6 12 ...
 $ decComp: num  0.5 1.1 5.1 1.7 3.6 11.6 3 7 23 11.6 ...
 $ delComp: num  0 0 1 3.6 7.6 14.8 16.8 38 67.6 57 ...

And summary() does not show any NAs:

 > summary(data)
      alg           n              insTime             decTime            delTime             insComp       
 aheap  :20   Min.   :      2   Min.   :      400   Min.   :     359   Min.   :3.250e+02   Min.   :      1  
 fibheap:20   1st Qu.:     56   1st Qu.:     4518   1st Qu.:    3262   1st Qu.:8.420e+03   1st Qu.:     87  
 pheap  :20   Median :   1536   Median :   110041   Median :   67643   Median :2.743e+05   Median :   3095  
              Mean   : 104858   Mean   :  8304522   Mean   : 5866098   Mean   :9.325e+07   Mean   : 258807  
              3rd Qu.:  40960   3rd Qu.:  2416198   3rd Qu.: 1556492   3rd Qu.:1.132e+07   3rd Qu.:  92170  
              Max.   :1048576   Max.   :142359000   Max.   :88428500   Max.   :2.088e+09   Max.   :3735370  
    decComp           delComp         
 Min.   :      0   Min.   :        0  
 1st Qu.:     89   1st Qu.:      608  
 Median :   2790   Median :    46142  
 Mean   : 226980   Mean   :  7884811  
 3rd Qu.:  75944   3rd Qu.:  2085385  
 Max.   :3983010   Max.   :138010000  
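For a quicker check than reading the whole summary table, the NAs can be counted per column. A small sketch, using an invented three-row data frame (the name `df` and its values are hypothetical, with one NA planted on purpose for contrast):

```r
# Hypothetical miniature of the data; decTime has one NA planted deliberately.
df <- data.frame(
  n       = c(2L, 4L, 8L),
  insTime = c(408.2, 867.5, 1332.1),
  decTime = c(359.4, NA, 1079.8)
)

# Number of missing values in each column; 0 means the column is complete.
na_counts <- colSums(is.na(df))
print(na_counts)

# Row positions of the missing values in one column.
na_rows <- which(is.na(df$decTime))
print(na_rows)
```

Run against the freshly read data, every count should be 0, which matches the summary above.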

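Once you apply strtoi, NAs appear. strtoi parses character strings that hold integers; your timing columns are numeric (num), so each value is first turned into text, and any value whose text is not a plain integer, for example one with a decimal part, becomes NA. In the output below, insTime had evidently been converted the same way before this call, which is why it shows NAs as well: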

> data$decTime <- strtoi(data$decTime)
> summary(data)
      alg           n              insTime             decTime            delTime             insComp       
 aheap  :20   Min.   :      2   Min.   :     2175   Min.   :     498   Min.   :3.250e+02   Min.   :      1  
 fibheap:20   1st Qu.:     56   1st Qu.:   222651   1st Qu.:  264344   1st Qu.:8.420e+03   1st Qu.:     87  
 pheap  :20   Median :   1536   Median :  1545575   Median : 1596015   Median :2.743e+05   Median :   3095  
              Mean   : 104858   Mean   : 14642987   Mean   :11713536   Mean   :9.325e+07   Mean   : 258807  
              3rd Qu.:  40960   3rd Qu.: 10317432   3rd Qu.: 9105678   3rd Qu.:1.132e+07   3rd Qu.:  92170  
              Max.   :1048576   Max.   :142359000   Max.   :88428500   Max.   :2.088e+09   Max.   :3735370  
                                NA's   :26          NA's   :30                                              
    decComp           delComp         
 Min.   :      0   Min.   :        0  
 1st Qu.:     89   1st Qu.:      608  
 Median :   2790   Median :    46142  
 Mean   : 226980   Mean   :  7884811  
 3rd Qu.:  75944   3rd Qu.:  2085385  
 Max.   :3983010   Max.   :138010000 
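The effect can be reproduced on a few made-up numbers. This is only a sketch: the vector `x` is hypothetical and not taken from funcExp.txt, and the rounding at the end is one possible alternative if whole numbers are really needed, not something the original script does.

```r
# Hypothetical values resembling the timing columns.
x <- c(408, 867.3, 100000)

# strtoi() works on character strings, so numeric input is converted to text first.
txt <- as.character(x)     # "408" "867.3" "1e+05"

# A string that is not entirely a base-10 integer yields NA.
converted <- strtoi(txt)   # 408 NA NA
print(converted)

# If whole numbers are wanted, round the numeric values instead of parsing text.
rounded <- round(x)        # 408 867 100000
print(rounded)
```

For plotting, no conversion is needed at all, since ggplot2 handles numeric columns directly.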

Hope that helps.

Upvotes: 3
