Reputation: 303
I am working in R. I have a series of coordinates in decimal degrees, and I would like to sort these coordinates by how many decimal places these numbers have (i.e. I will want to discard coordinates that have too few decimal places).
Is there a function in R that can return the number of decimal places a number has, that I would be able to incorporate into function writing?
Example of input:
AniSom4 -17.23300000 -65.81700
AniSom5 -18.15000000 -63.86700
AniSom6 1.42444444 -75.86972
AniSom7 2.41700000 -76.81700
AniLac9 8.6000000 -71.15000
AniLac5 -0.4000000 -78.00000
I would ideally write a script that would discard AniLac9 and AniLac5 because those coordinates were not recorded with enough precision. I would like to discard coordinates for which both the longitude and the latitude have fewer than 3 non-zero decimal values.
Upvotes: 30
Views: 40813
Reputation: 1253
Late to the party but here's my vectorised solution.
library(cpp11)
cpp_function('SEXP num_decimals(SEXP x, double tol){
int size = Rf_length(x);
double *p_x = REAL(x);
SEXP out = Rf_protect(Rf_allocVector(INTSXP, size));
int *p_out = INTEGER(out);
for (int i = 0; i < size; ++i){
int n = 0;
double y = p_x[i];
while (std::fabs(y - std::round(y)) >= tol){
y = y * 10.0;
++n;
}
p_out[i] = n;
}
Rf_unprotect(1);
return out;
}')
tol <- sqrt(.Machine$double.eps) * 10
num_decimals(0, tol)
#> [1] 0
num_decimals(1.123, tol)
#> [1] 3
num_decimals(c(0, 1, 1.123, 1.12345678, pi), tol)
#> [1] 0 0 3 8 15
Created on 2023-11-20 with reprex v2.0.2
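For anyone who would rather not compile C++, the same multiply-by-ten loop can be sketched in plain R. This is only an illustration of the idea, not a drop-in replacement: the name num_decimals_r and the max_iter cap are my own additions (the C++ version above has no cap), and it assumes finite, non-missing input.

```r
# Pure-R sketch of the same loop: keep multiplying by 10 until the value is
# within `tol` of a whole number. `num_decimals_r` and `max_iter` are
# hypothetical names, not part of the answer above.
num_decimals_r <- function(x, tol = sqrt(.Machine$double.eps) * 10, max_iter = 20L) {
  vapply(x, function(y) {
    n <- 0L
    # Assumes y is finite and not NA; the cap guards against endless loops.
    while (abs(y - round(y)) >= tol && n < max_iter) {
      y <- y * 10
      n <- n + 1L
    }
    n
  }, integer(1))
}

num_decimals_r(c(0, 1, 1.123, 0.25))
```

It will be much slower than the compiled version on long vectors, since vapply() calls the inner function once per element.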
Upvotes: 1
Reputation: 8863
as.character uses scientific notation for numbers that are between -1e-4 and 1e-4 but not zero:
> as.character(0.0001)
[1] "1e-04"
You can use format(scientific=F) instead:
> format(0.0001,scientific=F)
[1] "0.0001"
Then do this:
nchar(sub("^-?\\d*\\.?","",format(x,scientific=F)))
Or in vectorized form:
> nplaces=function(x)sapply(x,function(y)nchar(sub("^-?\\d*\\.?","",format(y,scientific=F))))
> nplaces(c(0,-1,1.1,0.123,1e-8,-1e-8))
[1] 0 0 1 3 8 8
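The sapply() wrapper matters here because format() pads all elements of a vector to a common number of decimals. A small sketch of what goes wrong if you call format() on the whole vector at once (the helper name count is just for illustration):

```r
x <- c(1.1, 0.123)

# format() on the whole vector pads to a common width: "1.100" "0.123"
padded <- format(x, scientific = FALSE)

# Formatting one element at a time keeps each value's own digits: "1.1" "0.123"
per_element <- vapply(x, function(v) format(v, scientific = FALSE), character(1))

# Same regex as above: strip the sign, the integer part and the decimal point.
count <- function(s) nchar(sub("^-?\\d*\\.?", "", s))

count(padded)       # 3 3  (1.1 is over-counted)
count(per_element)  # 1 3
```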
Upvotes: 2
Reputation: 303
I have tested some solutions and I found this one robust to the bugs reported in the others.
countDecimalPlaces <- function(x) {
if ((x %% 1) != 0) {
strs <- strsplit(as.character(format(x, scientific = F)), "\\.")
n <- nchar(strs[[1]][2])
} else {
n <- 0
}
return(n)
}
# example to prove the function with some values
xs <- c(1000.0, 100.0, 10.0, 1.0, 0, 0.1, 0.01, 0.001, 0.0001)
sapply(xs, FUN = countDecimalPlaces)
Upvotes: 4
Reputation: 3045
If someone here needs a vectorized version of the function provided by Gergely Daróczi above:
decimalplaces <- function(x) {
ifelse(abs(x - round(x)) > .Machine$double.eps^0.5,
nchar(sub('^-?\\d+\\.', '', sub('0+$', '', as.character(x)))),  # -? also strips the sign of negative numbers
0)
}
decimalplaces(c(234.1, 3.7500, 1.345, 3e-15))
#> 1 2 3 0
Upvotes: 4
Reputation: 6649
Not sure why this simple approach was not used above (load the pipe from tidyverse/magrittr).
count_decimals = function(x) {
#length zero input
if (length(x) == 0) return(numeric())
#count decimals
x_nchr = x %>% abs() %>% as.character() %>% nchar() %>% as.numeric()
x_int = floor(x) %>% abs() %>% nchar()
x_nchr = x_nchr - 1 - x_int
x_nchr[x_nchr < 0] = 0
x_nchr
}
> #tests
> c(1, 1.1, 1.12, 1.123, 1.1234, 1.1, 1.10, 1.100, 1.1000) %>% count_decimals()
[1] 0 1 2 3 4 1 1 1 1
> c(1.1, 12.1, 123.1, 1234.1, 1234.12, 1234.123, 1234.1234) %>% count_decimals()
[1] 1 1 1 1 2 3 4
> seq(0, 1000, by = 100) %>% count_decimals()
[1] 0 0 0 0 0 0 0 0 0 0 0
> c(100.1234, -100.1234) %>% count_decimals()
[1] 4 4
> c() %>% count_decimals()
numeric(0)
So R does not seem to distinguish internally between an input of 1.000 and an input of 1. If you have a vector of decimal numbers, the maximum count across the vector therefore tells you how many decimal places the data were recorded with (at least).
Edited: fixed bugs
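To illustrate the point about 1.000 and 1: once the value is numeric the trailing zeros are gone, so if the recorded precision matters you would have to keep the values as text (for example by reading the file with colClasses = "character"). A rough sketch, assuming the raw strings are plain decimal notation with no exponent; the variable names are made up for the example:

```r
# Numeric literals with trailing zeros are the same double.
identical(1.000, 1)  # TRUE

# Hypothetical raw strings, as they might appear in the input file.
raw <- c("8.6000000", "-71.15000", "1.42444444")

# Digits after the decimal point as written (trailing zeros included).
frac <- sub("^[^.]*\\.?", "", raw)
written_places <- nchar(frac)

# Digits after the decimal point once trailing zeros are dropped.
nonzero_places <- nchar(sub("0+$", "", frac))

written_places  # 7 5 8
nonzero_places  # 1 2 8
```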
Upvotes: 4
Reputation: 1
Vector solution based on daroczig's function (can also deal with dirty columns containing strings and numerics):
decimalplaces_vec <- function(x) {
vector <- c()
for (i in seq_along(x)){
if(!is.na(as.numeric(x[i]))){
if ((as.numeric(x[i]) %% 1) != 0) {
vector <- c(vector, nchar(strsplit(sub('0+$', '', as.character(x[i])), ".", fixed=TRUE)[[1]][[2]]))
}else{
vector <- c(vector, 0)
}
}else{
vector <- c(vector, NA)
}
}
return(max(vector))
}
Upvotes: 0
Reputation: 2425
Another contribution, keeping fully as numeric representations without converting to character:
countdecimals <- function(x)
{
n <- 0
while (!isTRUE(all.equal(floor(x),x)) && n <= 1e6) { x <- x*10; n <- n+1 }
return (n)
}
Upvotes: 1
Reputation: 168
Don't mean to hijack the thread, just posting it here as it might help someone to deal with the task I tried to accomplish with the proposed code.
Unfortunately, even the updated @daroczig's solution didn't work for me to check if a number has fewer than 8 decimal digits.
@daroczig's code:
decimalplaces <- function(x) {
if (abs(x - round(x)) > .Machine$double.eps^0.5) {
nchar(strsplit(sub('0+$', '', as.character(x)), ".", fixed = TRUE)[[1]][[2]])
} else {
return(0)
}
}
In my case it produced the following results:
NUMBER / NUMBER OF DECIMAL DIGITS AS PRODUCED BY THE CODE ABOVE
[1] "0.0000437 7"
[1] "0.000195 6"
[1] "0.00025 20"
[1] "0.000193 6"
[1] "0.000115 6"
[1] "0.00012501 8"
[1] "0.00012701 20"
etc.
So far I was able to accomplish the required tests with the following clumsy code:
if (abs(x*10^8 - floor(as.numeric(as.character(x*10^8)))) > .Machine$double.eps*10^8)
{
print("The number has more than 8 decimal digits")
}
PS: I might be missing something by not taking the square root of .Machine$double.eps, so please use this with caution.
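The same test can be written a bit more compactly and vectorised. This is only a sketch: exceeds_places and its tol default are my own naming and choice, and the absolute tolerance assumes values of moderate magnitude (for very large numbers the rounding error in x * 10^places can exceed it).

```r
# TRUE where x has more than `places` decimal digits, judged by whether
# x * 10^places is (nearly) a whole number. Hypothetical helper.
exceeds_places <- function(x, places = 8, tol = sqrt(.Machine$double.eps)) {
  scaled <- abs(x) * 10^places
  abs(scaled - round(scaled)) > tol
}

exceeds_places(c(0.0000437, 0.00025, 0.00012701))  # all FALSE
exceeds_places(c(0.000125012, 0.123456789))        # both TRUE
```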
Upvotes: 1
Reputation: 28632
You could write a small function for the task with ease, e.g.:
decimalplaces <- function(x) {
if ((x %% 1) != 0) {
nchar(strsplit(sub('0+$', '', as.character(x)), ".", fixed=TRUE)[[1]][[2]])
} else {
return(0)
}
}
And run:
> decimalplaces(23.43234525)
[1] 8
> decimalplaces(334.3410000000000000)
[1] 3
> decimalplaces(2.000)
[1] 0
Update (Apr 3, 2018) to address @owen88's report on error due to rounding double precision floating point numbers -- replacing the x %% 1 check:
decimalplaces <- function(x) {
if (abs(x - round(x)) > .Machine$double.eps^0.5) {
nchar(strsplit(sub('0+$', '', as.character(x)), ".", fixed = TRUE)[[1]][[2]])
} else {
return(0)
}
}
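To tie this back to the question, here is a rough sketch of the filtering step on the example data. The column names (id, lat, lon) and the helper count_places are assumptions of mine, since the question does not label its columns; the helper formats each value on its own with up to 15 significant digits rather than reusing the function above, so that it works on whole columns.

```r
# Example coordinates from the question; column names are assumed.
coords <- data.frame(
  id  = c("AniSom4", "AniSom5", "AniSom6", "AniSom7", "AniLac9", "AniLac5"),
  lat = c(-17.233, -18.15, 1.42444444, 2.417, 8.6, -0.4),
  lon = c(-65.817, -63.867, -75.86972, -76.817, -71.15, -78),
  stringsAsFactors = FALSE
)

# Hypothetical vectorised helper: count digits after the decimal point.
count_places <- function(x) {
  vapply(x, function(v) {
    s <- format(v, scientific = FALSE, digits = 15)
    if (grepl(".", s, fixed = TRUE)) nchar(sub("^[^.]*\\.", "", s)) else 0L
  }, integer(1))
}

# Discard rows where BOTH coordinates have fewer than 3 decimal places.
too_coarse <- count_places(coords$lat) < 3 & count_places(coords$lon) < 3
kept <- coords[!too_coarse, ]
kept$id
```

With this rule AniLac9 and AniLac5 are dropped, while AniSom5 is kept because its second coordinate still has three decimals.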
Upvotes: 52
Reputation: 3534
Interesting question. Here is another tweak on the above respondents' work, vectorized and extended to handle the digits to the left of the decimal point. It is tested against negative numbers, which would give an incorrect result with the previous strsplit() approach.
If you only want to count the digits to the right, set the trailingonly argument to TRUE.
nd1 <- function(xx,places=15,trailingonly=F) {
xx<-abs(xx);
if(length(xx)>1) {
fn<-sys.function();
return(sapply(xx,fn,places=places,trailingonly=trailingonly))};
if(xx %in% 0:9) return(!trailingonly+0);
mtch0<-round(xx,nds <- 0:places);
out <- nds[match(TRUE,mtch0==xx)];
if(trailingonly) return(out);
mtch1 <- floor(xx*10^-nds);
out + nds[match(TRUE,mtch1==0)]
}
Here is the strsplit() version.
nd2 <- function(xx,trailingonly=F,...) if(length(xx)>1) {
fn<-sys.function();
return(sapply(xx,fn,trailingonly=trailingonly))
} else {
sum(c(nchar(strsplit(as.character(abs(xx)),'\\.')[[1]][ifelse(trailingonly, 2, T)]),0),na.rm=T);
}
The string version cuts off at 15 digits. (I'm not sure why the other one's places argument is off by one. The reason it can exceed places, though, is that it counts digits in both directions, so the total could reach twice that size if the number is sufficiently large.) There is probably some formatting option to as.character() that would give nd2() an equivalent of the places argument of nd1().
nd1(c(1.1,-8.5,-5,145,5,10.15,pi,44532456.345243627,0));
# 2 2 1 3 1 4 16 17 1
nd2(c(1.1,-8.5,-5,145,5,10.15,pi,44532456.345243627,0));
# 2 2 1 3 1 4 15 15 1
nd1() is faster.
rowSums(replicate(10,system.time(replicate(100,nd1(c(1.1,-8.5,-5,145,5,10.15,pi,44532456.345243627,0))))));
rowSums(replicate(10,system.time(replicate(100,nd2(c(1.1,-8.5,-5,145,5,10.15,pi,44532456.345243627,0))))));
Upvotes: 1
Reputation: 4464
For the common application, here's a modification of daroczig's code to handle vectors:
decimalplaces <- function(x) {
y = x[!is.na(x)]
if (length(y) == 0) {
return(0)
}
if (any((y %% 1) != 0)) {
info = strsplit(sub('0+$', '', as.character(y)), ".", fixed=TRUE)
info = info[sapply(info, FUN=length) == 2]
# each remaining element is c(integer part, decimal part); measure the decimal part
dec = vapply(info, function(parts) nchar(parts[2]), integer(1))
return(max(dec, na.rm=T))
} else {
return(0)
}
}
In general, there can be issues with how a floating point number is stored as binary. Try this:
> sprintf("%1.128f", 0.00000000001)
[1] "0.00000000000999999999999999939458150688409432405023835599422454833984375000000000000000000000000000000000000000000000000000000000"
How many decimals do we now have?
Upvotes: 1
Reputation: 72731
Following up on Roman's suggestion:
num.decimals <- function(x) {
stopifnot(is.character(x) || is.numeric(x))  # the example passes a string, so accept both
x <- sub("0+$","",x)
x <- sub("^.+[.]","",x)
nchar(x)
}
x <- "5.2300000"
num.decimals(x)
If your data isn't guaranteed to be of the proper form, you should do more checking to ensure other characters aren't sneaking in.
Upvotes: 9
Reputation: 6761
Here is one way. It checks the first 20 places after the decimal point, but you can adjust the number 20 if you have something else in mind.
x <- pi
match(TRUE, round(x, 1:20) == x)
Here is another way.
nchar(strsplit(as.character(x), "\\.")[[1]][2])
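The rounding approach can be wrapped into a vectorised helper. Note that starting the sequence at 1 reports one decimal place for whole numbers, so this sketch starts at 0 instead. The function name and the max_places default are assumptions for illustration; values that need more than max_places digits come back as NA.

```r
# Smallest number of places at which rounding reproduces the value exactly.
# Hypothetical helper built on the match()/round() idea above.
decimals_by_rounding <- function(x, max_places = 15L) {
  places <- 0:max_places
  vapply(x, function(v) {
    # match() returns NA if no rounding level reproduces v
    places[match(TRUE, round(v, places) == v)]
  }, integer(1))
}

decimals_by_rounding(c(5, 1.5, 0.125, -2.75))
```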
Upvotes: 16
Reputation: 23550
In R there is no difference between 2.30000 and 2.3: both are stored as the same number, so neither is more precise than the other, if that is what you want to check. If that is not what you meant and you really want to count decimal places, you can try n = 0, 1, 2, ... in turn: 1) multiply by 10^n, 2) apply floor(), 3) divide by 10^n, 4) check for equality with the original. The first n for which they match is the number of decimal places. (However, be aware that comparing floats for equality is bad practice, so make sure this is really what you want.)
Upvotes: 1