Reputation: 13123
This question perhaps has been answered earlier, but I did not see an answer.
I have a data set that consists of numbers and missing values. One row is a percentage. Below is a small set of fake data where AA, BB and CC are the column names. The third row in this data set is the percentage.
AA BB CC
234 432 78
1980 3452 2323
91.1 90 93.3
34 123 45
In this case, when I read the data set, AA and CC are numeric and BB is integer. I guess somewhere 90.0 was rounded to 90. If I do not specify that BB is numeric, could this cause problems with basic arithmetic?
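A minimal sketch that reproduces this, assuming the data are read with `read.table` (the post does not say which reader was used) and using the hypothetical names `txt`, `d1` and `d2`. It shows the inferred column types and one way to force every column to numeric with `colClasses`:

```r
# Fake data from the post, read from an inline string (assumed layout).
txt <- "AA BB CC
234 432 78
1980 3452 2323
91.1 90 93.3
34 123 45"

# Default: each column's type is inferred from its contents.
d1 <- read.table(text = txt, header = TRUE)
sapply(d1, class)   # AA and CC are "numeric", BB is "integer"

# Force every column to numeric while reading.
d2 <- read.table(text = txt, header = TRUE, colClasses = "numeric")
sapply(d2, class)   # all "numeric"

# Division gives the same result for either storage type.
all.equal(d1$BB / 2, d2$BB / 2)
```

BB comes back as integer simply because every value in that column can be parsed as a whole number, not because anything was rounded.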
I believe that if dd = 1 and ee = 2 and both are integer then the C language says dd / ee = 0, while R says dd / ee = 0.5.
Below is a series of simple mathematical operations that all seem to suggest answers in R are not changed regardless of whether the data are numeric or integer. Nevertheless, I keep thinking that it would be smart to specify that all variables are numeric when reading the data. Using Google I have found an example or two where the data type did seem to make a difference, but not below.
aa <- c(1,2,3,4,5,6,7)
bb <- 2
str(aa)
str(bb)
cc <- as.integer(aa)
dd <- as.integer(bb)
str(cc)
str(dd)
aa/bb
cc/dd
aa/dd
cc/bb
ee <- aa * aa
str(ee)
sum(ee/2)
ff <- cc * cc
str(ff)
sum(ff/2)
gg <- 4.14
hh <- ((aa * aa) * gg) / 2
hh
ii <- ((cc * cc) * gg) / 2
ii
jj <- (aa * aa) / gg
jj
kk <- (cc * cc) / gg
kk
jj == kk
mm <- as.integer(1)
nn <- as.integer(2)
mm/nn
I guess I am hoping for reassurance that this is not likely an issue with simple math, but I suspect it can be. I keep thinking there is a fundamental rule of programming here, but I am not sure what that is. (I am aware of the concept of double precision.)
Thanks for any advice with what is surely a basic issue.
Upvotes: 7
Views: 17047
Reputation: 263499
Division using the / operator will always return a "numeric", i.e. the equivalent of a C "double". The numerators and denominators are first coerced to numeric and then the division is done. If you want to use integer division you can use %/%. If you want a whole-number result you can use trunc, or floor, or round(x, 0), or as.integer. The first, second and fourth of those give the same value for non-negative input (floor differs from the other two for negative numbers). Of the four, only as.integer returns integer type; trunc, floor and round return "numeric" when given numeric input, even though the printed representation appears integer. I do not think you need to worry as long as you will be happy with "double"/"numeric" results. Heck, we even allow division by 0.
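As a small illustration of the points above (the values in `x` are made up for the example):

```r
mm <- 1L
nn <- 2L

class(mm / nn)     # "numeric": / returns a double even for integer inputs
mm / nn            # 0.5
class(mm %/% nn)   # "integer": integer division of two integers
mm %/% nn          # 0

x <- c(-2.7, 2.7)
trunc(x)           # -2  2  (toward zero)
floor(x)           # -3  2  (toward -Inf)
as.integer(x)      # -2  2  (toward zero, stored as integer)
round(x, 0)        # -3  3

class(trunc(x))       # "numeric"
class(as.integer(x))  # "integer"

1 / 0              # Inf, no error
```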
Your 'aa' variable was classed as "numeric" despite being entered as a bunch of integers, but it would have been integer had you used:
aa <- 1:8 # sequences are integer class.
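A short sketch of how the storage type is decided, and of one case where it can matter. The `L` suffix and the overflow example are not from the post above; `big` is a hypothetical name.

```r
class(c(1, 2, 3))      # "numeric": plain literals are doubles
class(1:8)             # "integer": the : operator gives integers here
class(c(1L, 2L, 3L))   # "integer": the L suffix marks an integer literal

class(1:8 * 2L)        # "integer": integer times integer stays integer
class(1:8 * 2)         # "numeric": mixing with a double promotes to double

# Integer arithmetic can overflow; double arithmetic on the same value does not.
big <- .Machine$integer.max
big + 1L               # NA, with a warning
big + 1                # 2147483648
```

So reading everything as numeric mainly protects against integer overflow in sums and products of large values; ordinary division behaves the same either way.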
It sounds as though you will not be too surprised by R FAQ 7.31.
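For reference, that FAQ entry is about why two doubles that look equal may not compare equal with ==. A minimal sketch, using the usual textbook values rather than anything from the question:

```r
0.1 + 0.2 == 0.3                    # FALSE: neither side is exactly representable
isTRUE(all.equal(0.1 + 0.2, 0.3))   # TRUE: comparison within a small tolerance

sqrt(2)^2 == 2                      # FALSE
isTRUE(all.equal(sqrt(2)^2, 2))     # TRUE
```

This applies to doubles whether or not the data started out as integer, so a check like jj == kk is safer written with all.equal when the values come from different computations.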
Upvotes: 7