Jessica Rapson

Reputation: 3

How to replace specific values in a dataset with randomized numbers?

I have a data column that contains a bunch of ranges as strings (e.g. "2 to 4", "5 to 6", "7 to 8", etc.). I'm trying to create a new column that converts each of these values to a random integer within the given range. How can I use conditional logic within my function to solve this problem?

I think the function should be something along the lines of:

df<-mutate(df, c2=ifelse(df$c=="2 to 4", sample(2:4, 1, replace=TRUE), "NA"))

This should produce a new column in my dataset in which every "2 to 4" is replaced with a random integer between 2 and 4. However, it is not working: every value is replaced with "NA".
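Even when the comparison does match, sample(2:4, 1, replace = TRUE) is evaluated only once, and ifelse() recycles that single number to every matching row, so all "2 to 4" rows would get the same value. Using NA instead of the string "NA" also keeps the result numeric rather than character. A minimal sketch, assuming a character column named c1 (the name used in the answers):

```r
# Hypothetical example data; the column name c1 is an assumption.
df <- data.frame(c1 = c("2 to 4", "2 to 4", "5 to 6"), stringsAsFactors = FALSE)

set.seed(1)
# A single draw: ifelse() recycles it to every "2 to 4" row.
one_draw <- ifelse(df$c1 == "2 to 4", sample(2:4, 1), NA)

# One draw per row: row i keeps the i-th draw when it matches.
per_row <- ifelse(df$c1 == "2 to 4", sample(2:4, nrow(df), replace = TRUE), NA)
```

This only handles the single "2 to 4" case; covering every range this way would need one ifelse() per range, which is why the answers parse the endpoints out of the string instead.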

Ideally, for a column containing:

c("2 to 4", "2 to 4", "5 to 6")

I would add a new column c2 with one random draw per row, for example:

c(3, 2, 5)

Does anyone have any idea how to do this?

Upvotes: 0

Views: 364

Answers (2)

akrun

Reputation: 886938

We can do this easily with sub. Replace " to " with ":" so that each string becomes an R expression such as 2:4, evaluate it to get the sequence, and then draw a sample of 1 from it.

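# sub() turns "2 to 4" into the expression "2:4"; eval(parse()) builds the
# sequence and sample() draws one value from it
df$c2 <- sapply(sub(" to ", ":", df$c1), function(x) 
                sample(eval(parse(text = x)), 1))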
df
#      c1 c2
#1 2 to 4  4
#2 2 to 4  3
#3 5 to 6  5

Or with gsubfn

library(gsubfn)
# x and y are the two captured numbers; each match is replaced by one draw from x:y
as.numeric(gsubfn("(\\d+) to (\\d+)",
                  ~ sample(seq(as.numeric(x), as.numeric(y), by = 1), 1),
                  df$c1))

Or, in base R, with read.csv (a wrapper around read.table) and Map

# read.csv splits each "lo,hi" line into columns V1 and V2; Map applies `:` row-wise
sapply(do.call(Map, c(f = `:`, read.csv(text = sub(" to ", ",", df$c1),
         header = FALSE))), sample, 1)

data

df <- structure(list(c1 = c("2 to 4", "2 to 4", "5 to 6")), 
 class = "data.frame", row.names = c(NA, -3L))
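Each approach above loops over the rows with sapply(). If the column is large, a fully vectorized alternative is to parse the lower and upper bounds once and shift a uniform draw into each row's range. This is a swapped-in technique (runif() plus floor()), not part of the answer above, and it assumes every entry has the exact form "lo to hi":

```r
df <- data.frame(c1 = c("2 to 4", "2 to 4", "5 to 6"), stringsAsFactors = FALSE)

# Split each "lo to hi" string into a two-column character matrix
# (assumes every entry has exactly this form).
parts <- do.call(rbind, strsplit(df$c1, "\\s+to\\s+"))
lo <- as.integer(parts[, 1])
hi <- as.integer(parts[, 2])

# runif() lies strictly between 0 and 1, so floor(u * width) is in 0..width-1
# and lo + that offset is a uniform integer in [lo, hi] for every row at once.
df$c2 <- as.integer(lo + floor(runif(nrow(df)) * (hi - lo + 1)))
df
```

Because the offset is computed arithmetically, a degenerate range such as "5 to 5" simply yields 5.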

Upvotes: 0

Ronak Shah

Reputation: 388807

We can split each string on "to", convert the two pieces to integers, create the range between them, and then use sample to select one number from that range.

df$c2 <- sapply(strsplit(df$c1, "\\s+to\\s+"), function(x) {
         vals <- as.integer(x)          # c("2", "4") -> c(2L, 4L)
         sample(vals[1]:vals[2], 1)     # one value from the range
})

df
#      c1 c2
#1 2 to 4  2
#2 2 to 4  3
#3 5 to 6  5

data

df <- data.frame(c1 = c("2 to 4","2 to 4","5 to 6"), stringsAsFactors = FALSE)
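Both answers pass the range straight to sample(). As documented in ?sample, when x is a single number greater than or equal to 1, sample(x, 1) draws from 1:x, so a degenerate range such as "5 to 5" could come back as any value from 1 to 5. The ranges in the question never have equal endpoints, but if the real data might, a minimal sketch that indexes by position instead (sample_one() is a hypothetical helper, not part of base R):

```r
# Hypothetical helper: pick one element of x by position, so a length-one x is
# returned as-is instead of triggering sample()'s 1:x behavior.
sample_one <- function(x) x[sample.int(length(x), 1)]

df <- data.frame(c1 = c("2 to 4", "5 to 5"), stringsAsFactors = FALSE)
df$c2 <- sapply(strsplit(df$c1, "\\s+to\\s+"), function(x) {
  vals <- as.integer(x)
  sample_one(vals[1]:vals[2])
})
df
```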

Upvotes: 1
