Jennifer Boylan

Reputation: 123

R: Round the contents within each row so that the row total is equal to a number I specify

I have 170 rows of numbers with decimals that need to be rounded to whole numbers. However, each row's total has to equal a number I specify.

As a very basic illustration, let’s say I have a 1x4 matrix with cell contents (1.2, 3.4, 7.7, 5.3). These numbers represent individuals, so I need to round them to whole numbers such that the group populations add up to a total population of 18 individuals. If I simply round the matrix contents, which gives me (1, 3, 8, 5), my total population is 17, but I need it to equal 18 (see the R commands below).

m <- c(1.2, 3.4, 7.7, 5.3)
m.2 <- round(m)
m.2
# [1] 1 3 8 5
sum(m.2)
# [1] 17

After the numbers are rounded, I need R to choose the number that came closest to being rounded up (here, 3.4) and round it to 4 instead of 3.

This would give me a matrix of (1, 4, 8, 5) = 18.
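The adjustment described above is essentially the "largest remainder" idea: round everything down, then hand the missing units to the values with the largest fractional parts. Below is a minimal non-recursive sketch of that swapped-in technique. `round_to_total()` is a hypothetical helper name (not base R and not the function below), and the sketch assumes `target` is a whole number that can be reached by rounding each value either down or up.

```r
# Largest-remainder rounding (a sketch; round_to_total is a hypothetical name).
# Assumes target is a whole number reachable by rounding each value down or up.
round_to_total <- function(x, target) {
  base <- floor(x)                  # start with every value rounded down
  short <- target - sum(base)       # how many values must be rounded up instead
  if (short < 0 || short > length(x)) {
    stop("target cannot be reached by rounding each value down or up")
  }
  frac <- x - base                  # fractional parts, in [0, 1)
  # round up the 'short' values whose fractional parts are largest
  up <- order(frac, decreasing = TRUE)[seq_len(short)]
  base[up] <- base[up] + 1
  base
}

m <- c(1.2, 3.4, 7.7, 5.3)
round_to_total(m, 18)
# [1] 1 4 8 5
```

Starting from floor() means every adjustment goes in one direction (up). For targets that plain rounding can reach, this should normally pick the same values as the recursive approach below, but it is worth checking on your own data.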

Dr. John Fox helped me out with a simple recursive function that solves the problem:

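Round <- function(x, target) {
  r.x <- round(x)
  diff.x <- r.x - x                  # > 0: rounded up, < 0: rounded down
  if ((s <- sum(r.x)) == target) return(r.x)
  if (s > target) {
    # too large: take one unit back from the value that was rounded up the most
    select <- seq_along(x)[diff.x > 0]
    if (length(select) == 0) stop("target cannot be reached")  # avoids endless recursion
    i <- select[which.max(diff.x[select])]
    x[i] <- r.x[i] - 1
  } else {
    # too small: add one unit to the value that was rounded down the most
    select <- seq_along(x)[diff.x < 0]
    if (length(select) == 0) stop("target cannot be reached")  # avoids endless recursion
    i <- select[which.min(diff.x[select])]
    x[i] <- r.x[i] + 1
  }
  Round(x, target)  # x[i] is now a whole number, so it is never picked again
}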

This works well for an individual row, but I have 170 rows in my dataset, which means repeating a process like the one below 170 times:

paste(STATA[['b']], collapse=", ")

B <- c(46.8310012817383, 19.9720001220703, 265.837005615234, 95.0400009155273, 6.88700008392334, 190.768997192383, 22.7269992828369, 764.453002929688, 53.0299987792969, 333.329010009766, 55.0960006713867, 84.0210037231445, 28.2369995117188, 2207.27099609375, 86.7760009765625, 50045.46875, 103.304000854492, 413.217987060547, 4.13199996948242, 2.75500011444092, 183.88200378418, 65.4260025024414, 0.689000010490417, 2248.59204101562, 0, 1.37699997425079, 16.5289993286133, 4.13199996948242, 4.13199996948242, 2.75500011444092, 4.13199996948242, 1.37699997425079, 0, 39.9440002441406, 2.75500011444092, 28.2369995117188, 0, 0, 5.51000022888184, 0, 48.8969993591309, 17.9060001373291, 485.531005859375, 1.37699997425079, 59.9169998168945, 221.759994506836, 28.2369995117188, 4.13199996948242, 65.4260025024414, 11.0190000534058, 38.5670013427734, 3.44300007820129, 8.95300006866455, 2.75500011444092, 23.4160003662109, 4.13199996948242, 50.5750015258789, 11.7080001831055, 19.2830009460449, 48.8969993591309, 0, 13.7740001678467, 92.9739990234375)

varB <- (Round(B, 58701))

ROUND2012$varB <- varB

In this case, I had transposed my dataset in Excel because I found it easier to attach columns to a dataset in R than to attach rows. Ideally, I wouldn't have to do this: rows would be my territories and columns would be the population data for each group identity. Here, 'b' is the name of the column I am calling, and 58701 is the total that the numbers need to add up to after they are rounded.

In short, I'm looking for a function that works on an entire dataset rather than on individual rows. Ideally, I'd be able to specify the columns with the numbers to be rounded, as well as the column with the population totals that the rounded numbers need to add up to.

Updated Info

As a more illustrative example, let's say I have two racial groups in my population.

B

       race1 race2 total
place1   1.2   2.1   3.4
place2   3.4   3.6   7.0
place3   7.7   0.8   8.5
place4   5.3   1.4   6.7

I need these numbers to match my total registered voter population. The totals are 3.4, 7.0, 8.5, and 6.7, but I need the contents of each place row rounded so that the place totals (place1 to place4) are 4.0, 7.0, 8.0, and 7.0. For place1, I need 1.2 to become 2.0 and 2.1 to become 2.0, which gives 4.0, my registered voter population. For place2, the total is already 7, so rounding 3.4 to 3.0 and 3.6 to 4.0 is enough. For place3, 7.7 becomes 7.0 and 0.8 becomes 1.0, giving 8 in total. Finally, for place4, 5.3 rounds to 5.0 and 1.4 rounds to 2.0, giving 7 in total. What I want is:

B

       race1 race2 total
place1   2.0   2.0   4.0
place2   3.0   4.0   7.0
place3   7.0   1.0   8.0
place4   5.0   2.0   7.0

Currently, the Round function pasted above handles one series of numbers at a time, and I have to enter manually the total it needs to round to. I am looking for a function that does this for all rows simultaneously: I want to specify all the race columns to be rounded and a column containing all the necessary population totals. (Note: in practice I had transposed the matrix in Excel and re-imported it into R because, as a fairly new R user, I found attaching new columns to a dataset easier than attaching new rows. But I do not need that step and would prefer to avoid it.)
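One way to do this for every row at once, without a loop and without transposing, is to apply the largest-remainder idea to the whole matrix: round every cell down, work out how many units each row is short of its total, and round up that many cells per row, picking those with the largest fractional parts. This is a sketch of a swapped-in technique, not the Round function above; `round_rows_to_totals()` is a hypothetical name, and it assumes at least two group columns and whole-number totals that each row can reach.

```r
# Vectorised row-wise largest-remainder rounding (a sketch; hypothetical name).
# m: numeric matrix or data frame, one row per place, at least two columns.
# totals: one whole-number target per row.
round_rows_to_totals <- function(m, totals) {
  m <- as.matrix(m)
  base <- floor(m)                         # round every cell down
  short <- totals - rowSums(base)          # units each row still needs
  if (any(short < 0 | short > ncol(m))) {
    stop("some row totals cannot be reached by rounding cells down or up")
  }
  frac <- m - base
  # rank the fractional parts within each row, largest first
  rk <- t(apply(-frac, 1, rank, ties.method = "first"))
  # round up the cells whose rank is within the row's shortfall; 'short' has
  # one value per row, so recycling compares row i of rk with short[i]
  base + (rk <= short)
}

dat <- data.frame(race1 = c(1.2, 3.4, 7.7, 5.3),
                  race2 = c(2.1, 3.6, 0.8, 1.4),
                  total = c(4, 7, 8, 7),
                  row.names = paste0("place", 1:4))

round_rows_to_totals(dat[c("race1", "race2")], dat$total)
#        race1 race2
# place1     2     2
# place2     3     4
# place3     7     1
# place4     5     2
```

The race columns and the totals column are passed separately, so rows stay as territories and no transposing is needed.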

Upvotes: 5

Views: 3172

Answers (3)

GKi

Reputation: 39657

An alternative way to round values so that their total equals a given number, which also works for the case shown in the follow-up question.

You can choose whether the adjustment is made on the:

  • closest numbers
  • largest numbers
  • randomly distributed

and you can also choose the number of decimal places.

# Round x so that the rounded values sum to a given total
# x:       numeric vector
# target:  required sum of the rounded x; if NA, target = round(sum(x), digits)
# digits:  number of decimal places
# closest: if TRUE, adjust the values closest to rounding the other way;
#          if FALSE, prefer values far from ref (with ref = 0, the largest values)
# ref:     reference level used to compute the probability of adjustment;
#          if ref is NA, every value of x is equally likely to be adjusted
# random:  if TRUE, choose the adjustments by random sampling instead of
#          deterministically
roundt <- function(x, target=NA, digits = 0, closest=TRUE, ref=0, random=FALSE) {
  if(is.na(target)) {target <- round(sum(x), digits)}
  if(all(x == 0)) {
    if(target == 0) {return(x)}
    x <- x + 1
  }
  xr <- round(x, digits)
  if(target == sum(xr)) {return(xr)}
  if(is.na(ref)) {
    wgt <- rep(1/length(x), length(x))
  } else {
    if(closest) {
      tt <- (x - xr) * sign(target - sum(xr)) + 10^-digits / 2
      wgt <- tt / sum(tt)
    } else {wgt <- abs(x-ref)/sum(abs(x-ref))}
  }
  if(random) {
    adj <- table(sample(factor(1:length(x)), size=abs(target - sum(xr))*10^digits,
                        replace = TRUE, prob=wgt)) * sign(target - sum(xr)) * 10^-digits
  } else {adj <- diff(c(0,round(cumsum((target - sum(xr)) * wgt), digits)))}
  xr + adj
}

dat <- read.table(text="
race1 race2 total
1.2  2.1  4
3.4  3.6  7
7.7  0.8  8
5.3  1.4  7
3.4  3.6  5
7.7  0.8  12
-5  5  1
0    0    3
0    0    0
", header=TRUE)

apply(dat, 1, function(x) roundt(x[1:2], x[3]))  # Default: round to target
apply(dat[1:6,], 1, function(x) roundt(x[1:2]*x[3]/sum(x[1:2])))  # Pre-adjust to target by multiplication (rows 7 to 9 sum to 0, so they are skipped)
apply(dat, 1, function(x) roundt(x[1:2] + (x[3]-sum(x[1:2]))/2))  # Pre-adjust to target by addition
apply(dat, 1, function(x) roundt(x[1:2], x[3], closest=FALSE))  # Prefer adjusting large numbers
apply(dat, 1, function(x) roundt(x[1:2], x[3], ref=NA))  # Give all values the same probability of adjustment
apply(dat, 1, function(x) roundt(x[1:2], x[3], digits=1))  # Use one decimal place
apply(dat, 1, function(x) roundt(x[1:2], x[3], digits=1, random=TRUE))  # Make the adjustment by random sampling
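The non-random branch of roundt() relies on a compact trick: multiply the total correction by the weights, round the running (cumulative) sum, and take differences, so the individual adjustments always add up to exactly the correction. A standalone sketch of just that step, where `spread()` is a hypothetical name and the correction is assumed to be a whole number with positive weights:

```r
# Split a whole-number correction d into whole-number pieces roughly
# proportional to the positive weights w. Rounding the running total (rather
# than each piece separately) makes the pieces sum to exactly d.
spread <- function(d, w) {
  diff(c(0, round(cumsum(d * w / sum(w)))))
}

spread(3, c(2, 1, 1))
# [1] 2 0 1
```

In roundt(), the correction is target - sum(xr), and the weights come either from how close each value was to rounding the other way (closest = TRUE) or from its distance to ref.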

Upvotes: 0

rawr

Reputation: 20811

There are several ways you could do this, but taking my comment from above:

Round <- function(x, target) {
  r.x <- round(x)
  diff.x <- round(x) - x
  if ((s <- sum(r.x)) == target) {
    return(r.x)
  } else if (s > target) {
    select <- seq(along=x)[diff.x > 0]
    which <- which.max(diff.x[select])
    x[select[which]] <- r.x[select[which]] - 1
    Round(x, target)
  } else {
    select <- seq(along=x)[diff.x < 0]
    which <- which.min(diff.x[select])
    x[select[which]] <- r.x[select[which]] + 1
    Round(x, target)
  }
}

dat <- read.table(header = TRUE, row.names = paste0('place', 1:4),
                  text="race1 race2 total
                        1.2  2.1  3.4
                        3.4  3.6  7.0
                        7.7  0.8  8.5
                        5.3  1.4  6.7")

totals <- c(4.0, 7.0, 8.0, 7.0)

Both examples below simply apply Round to each row, pairing the two race columns of dat with the corresponding value in totals.

lapply returns a list, so to turn the output back into a matrix/data frame, we rbind everything back together with do.call:

do.call(rbind, lapply(1:nrow(dat), function(x) Round(dat[x, -3], totals[x])))

#        race1 race2
# place1     2     2
# place2     3     4
# place3     7     1
# place4     5     2

apply() returns one column per row of dat, so its output is transposed relative to what you want; we wrap it in t(). First, store the target totals in the third column:

dat[3] <- totals

t(apply(dat, 1, function(x) Round(x[1:2], x[3])))

#        race1 race2
# place1     2     2
# place2     3     4
# place3     7     1
# place4     5     2

Alternatively, you could probably come up with something more clever using Map()/mapply() or by Vectorize()-ing Round to avoid these explicit loops, but your data doesn't seem to be very large.
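As a sketch of the Map() route: split the race columns into one vector per place and let Map() pair each vector with its total. To keep the block self-contained it uses a hypothetical largest-remainder helper, `round_to_total()`, in place of Round; Round could be passed to Map() in the same way.

```r
# Hypothetical helper: round x so that it sums to target by rounding down
# first and then rounding up the values with the largest fractional parts.
round_to_total <- function(x, target) {
  base <- floor(x)
  k <- target - sum(base)                 # number of values to round up
  stopifnot(k >= 0, k <= length(x))
  up <- order(x - base, decreasing = TRUE)[seq_len(k)]
  base[up] <- base[up] + 1
  base
}

dat <- data.frame(race1 = c(1.2, 3.4, 7.7, 5.3),
                  race2 = c(2.1, 3.6, 0.8, 1.4),
                  total = c(4, 7, 8, 7),
                  row.names = paste0("place", 1:4))

races <- as.matrix(dat[c("race1", "race2")])
rows <- split(races, row(races))          # list with one numeric vector per place
res <- do.call(rbind, Map(round_to_total, rows, dat$total))
dimnames(res) <- dimnames(races)          # restore the place and race labels
res
#        race1 race2
# place1     2     2
# place2     3     4
# place3     7     1
# place4     5     2
```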

Upvotes: 2

Kai Sun

Reputation: 1

I came up with a relatively straightforward but lazy method to solve your problem. The basic idea is to: 1. check how many numbers need to be rounded a second time; 2. dynamically work out which numbers should preferably be rounded the second time (those closest to having been rounded up).

I used the vector B quoted in the question, whose rounded sum is 58701, and set the designated sum to 58711.

raw <- B
round <- round(B)
data <- data.frame(raw, round)
calc_sum = sum(data$round)
desig_sum = 58711
data$residual = abs(data$raw - data$round)
data$above = ifelse(data$round > data$raw, 1, 0)
data$round2 = 0
data1 <- data[order(data$residual),]

if (calc_sum < desig_sum) {
    diff = desig_sum - calc_sum
    count = 0
    while (count < diff) {
        for (i in 1:nrow(data1)) {
            if (count >= diff) break    # stop once enough values have been rounded up
            data_tmp <- subset(data1, round2 == 0 & above == 0)
            # Find the next number for its second rounding: rounded down, not yet
            # adjusted, and with the largest remaining residual
            if (data1[i,4] == 0 && data1[i,5] == 0 && data1[i,3] == max(data_tmp$residual)) {
                data1[i,5] = data1[i,2] + 1
                count = count + 1
            }
        }
    }
}

# Revert to the original order
data2 <- data1[order(as.numeric(rownames(data1))),]

# Use the second rounding where one was made, otherwise the first rounding
# (not the raw value)
data2$output <- ifelse(data2$round2 != 0, data2$round2, data2$round)

data_final = data2$output

I have not yet written the code for the case where calc_sum > desig_sum, but it should not differ much from the code above.

Also, if there are not enough numbers that can be rounded a second time to reach your designated sum (for example, desig_sum = 5 in the case above), the code won't work.
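A compact sketch that covers both directions (calc_sum below or above desig_sum) and stops with an error instead of looping when there are not enough candidates. It follows the same idea of preferring the values whose first rounding was closest to going the other way. `round_to_sum()` is a hypothetical name, and values that are already whole numbers (such as the zeros in B) are never changed, which is an assumption about what you want.

```r
# Adjust round(x) so that it sums to desig_sum (a sketch; hypothetical name).
round_to_sum <- function(x, desig_sum) {
  r <- round(x)
  d <- desig_sum - sum(r)            # > 0: units to add, < 0: units to remove
  if (d == 0) return(r)
  # adding: candidates were rounded down; removing: candidates were rounded up
  cand <- if (d > 0) which(r < x) else which(r > x)
  if (length(cand) < abs(d)) stop("not enough values to reach desig_sum")
  # change the values with the largest rounding residual first
  pick <- cand[order(abs(x[cand] - r[cand]), decreasing = TRUE)][seq_len(abs(d))]
  r[pick] <- r[pick] + sign(d)
  r
}

m <- c(1.2, 3.4, 7.7, 5.3)
round_to_sum(m, 18)
# [1] 1 4 8 5
round_to_sum(m, 16)
# [1] 1 3 7 5
```

Each value is changed by at most one unit, so the reachable range is limited by how many values were rounded in the needed direction.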

Upvotes: 0
