Reputation: 171
I have a dataframe my_df and I would like to add an additional column, my_new_column, and populate it with random integer numbers that add up to a given sum.
Here is some reproducible code:
library(dplyr)
library(magrittr)
my_df <- as.data.frame(matrix(nrow = 10, ncol = 2))
colnames(my_df) <- c("Cat", "MarksA")
my_df$Cat <- LETTERS[1:nrow(my_df)]
my_df$MarksA <- sample(1:100, size = nrow(my_df))
In Tidyverse style, I tried the following:
my_df %<>% mutate(my_new_column=sample(n()))
However, this gives me a column which sums up to an arbitrary number. How can I tweak my code to achieve this task?
Upvotes: 2
Views: 719
Reputation: 1695
Since the sum of all numbers between 1 and n is equal to n(n + 1)/2, you may try something like this:
nb <- nrow(my_df)
my_df %<>% mutate(my_new_column = sample(nb * (nb + 1)/2))
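As a quick check of the identity mentioned above, here is a minimal sketch in base R. It assumes nb = 10 rows, as in the question's example; the variable name perm is only illustrative. Because sample(nb) returns a permutation of 1..nb, its sum is the same on every draw, which would explain why the column in the question always adds up to the same value rather than to a chosen one.

```r
nb <- 10                    # assumed row count, matching the question's my_df

perm <- sample(nb)          # a random permutation of 1..nb
fixed_sum <- nb * (nb + 1) / 2

# The order of the values changes between draws, but the sum does not.
stopifnot(sum(perm) == fixed_sum)
stopifnot(length(perm) == nb)
```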
Upvotes: 0
Reputation: 2506
Since you didn't specify a specific distribution, would this work? I pulled my answer mostly from this post which has more details and more options: Generate non-negative (or positive) random integers that sum to a fixed value
my_df %>%
  # rmultinom() returns a matrix, so as.vector() turns it into a plain integer column
  mutate(int_sample = as.vector(rmultinom(n = 1, size = 1000, prob = rep(1 / n(), n()))))
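To make the behaviour easier to verify, here is a small sketch in base R, outside of any pipeline. The names n_rows, target_sum, non_negative and strictly_positive are hypothetical, and the values 10 and 1000 are just the ones used above. It checks that a single multinomial draw sums to the target, and shows one possible way to get strictly positive integers, by reserving 1 per row and distributing the remainder. This assumes equal probabilities for every row, which the question did not specify.

```r
set.seed(42)        # assumed seed, only so the draw is reproducible
n_rows <- 10        # hypothetical number of rows
target_sum <- 1000  # hypothetical total the column should add up to

# Non-negative integers: one multinomial draw with equal cell probabilities.
# Zeros are possible, especially when target_sum is small relative to n_rows.
non_negative <- as.vector(
  rmultinom(n = 1, size = target_sum, prob = rep(1 / n_rows, n_rows))
)

# Strictly positive integers: give every row 1, then distribute the rest.
strictly_positive <- 1L + as.vector(
  rmultinom(n = 1, size = target_sum - n_rows, prob = rep(1 / n_rows, n_rows))
)

stopifnot(sum(non_negative) == target_sum)
stopifnot(sum(strictly_positive) == target_sum)
```

Either vector can be assigned directly as a column, for example my_df$my_new_column <- strictly_positive, provided n_rows equals nrow(my_df).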
Upvotes: 3