Reputation: 983
I have team standings data in which one column has entries of the form 'Xth of Y'. I need to convert these to numeric values on a 0 to 1 scale, where 1st is 1, last is 0, and the positions in between fall on a linear scale. I have considered strsplit(), but don't know how to handle the varying suffixes ('1st', '2nd', etc.). As an example, my data looks like this:
x = as.factor(c('2nd of 6', '5th of 5', '4th of 5', '3rd of 5', '5th of 5', '4th of 7'))
Note: '2nd of 6' should convert to 0.8, i.e. (6 - 2)/(6 - 1), and not 0.6666667, which is 1 - 2/6.
Upvotes: 2
Views: 114
Reputation: 887068
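The expected output was not stated in the original question, so this answer has been updated based on the comments on the other post. Each position is used as an index into an evenly spaced sequence from 0 to 1 whose length is the group size: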
df1 <- read.csv(text= gsub("\\D+", ",", x), header = FALSE)
1 - unlist(Map(function(x, y) seq(0, 1, length.out = y)[x], df1$V1, df1$V2))
#[1] 0.80 0.00 0.25 0.50 0.00 0.50
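The same values can also be computed without building a sequence, because seq(0, 1, length.out = Y)[X] equals (X - 1)/(Y - 1), so the score is (Y - X)/(Y - 1). A minimal sketch, rebuilding x and df1 so the block runs on its own; the name score is only illustrative, and a group with a single team (Y = 1) would give 0/0, i.e. NaN, which this sketch does not handle.

```r
x <- as.factor(c('2nd of 6', '5th of 5', '4th of 5', '3rd of 5', '5th of 5', '4th of 7'))

# Replace every run of non-digits with a comma, e.g. "2nd of 6" -> "2,6",
# then read the two numbers into columns V1 (position) and V2 (group size).
df1 <- read.csv(text = gsub("\\D+", ",", x), header = FALSE)

# Closed form: 1st -> 1, last -> 0, linear in between.
score <- (df1$V2 - df1$V1) / (df1$V2 - 1)
score
#[1] 0.80 0.00 0.25 0.50 0.00 0.50
```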
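The earlier versions below compute 1 - X/Y instead, which does not match the expected output in the question (for example, '2nd of 6' gives 0.6666667 rather than 0.8). With base R, this can be done in a single line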
1- Reduce(`/`, read.csv(text= gsub("\\D+", ",", x), header = FALSE))
#[1] 0.6666667 0.0000000 0.2000000 0.4000000 0.0000000 0.4285714
Or with strsplit
m1 <- sapply(strsplit(as.character(x), "\\D+"), as.numeric)
1 - m1[1,]/m1[2,]
Or with fread
library(data.table)
fread(text=gsub("\\D+", ",", x))[, 1- Reduce(`/`, .SD)]
#[1] 0.6666667 0.0000000 0.2000000 0.4000000 0.0000000 0.4285714
Or using tidyverse
library(tidyverse)
x %>%
  str_replace("\\D+", ",") %>%
  tibble(col1 = .) %>%
  separate(col1, into = c('col1', 'col2'), convert = TRUE) %>%
  reduce(`/`) %>%
  {1 - .}   # 1 - col1/col2
#[1] 0.6666667 0.0000000 0.2000000 0.4000000 0.0000000 0.4285714
Upvotes: 1
Reputation: 388962
We can extract the two numbers from each string, build a sequence from 0 to 1 whose length is given by the second number, and pick the element at the position given by the first number.
sapply(strsplit(sub("^(\\d+)(?:st|nd|rd|th) of (\\d+).*", "\\1-\\2", x), "-"),
function(x) 1 - seq(0, 1, length.out = as.integer(x[2]))[as.integer(x[1])])
#[1] 0.80 0.00 0.25 0.50 0.00 0.50
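If the column may contain entries that do not fit the 'Xth of Y' pattern, or groups with only one team, the conversion could be wrapped in a small function that returns NA for those cases. This is only a sketch under those assumptions: position_score() is a hypothetical helper name, not something from the answers above, and it uses the closed form (Y - X)/(Y - 1) rather than seq().

```r
# position_score() is a hypothetical helper, introduced here for illustration.
position_score <- function(x) {
  x <- as.character(x)
  # Pull out every run of digits: "2nd of 6" -> c("2", "6").
  nums <- regmatches(x, gregexpr("[0-9]+", x))
  vapply(nums, function(p) {
    if (length(p) != 2) return(NA_real_)   # not of the form 'Xth of Y'
    pos <- as.numeric(p[1])
    n <- as.numeric(p[2])
    # The score is undefined for a single-team group or an out-of-range position.
    if (n < 2 || pos < 1 || pos > n) return(NA_real_)
    (n - pos) / (n - 1)
  }, numeric(1))
}

# Values: 0.8, 0, NA, NA
position_score(c('2nd of 6', '5th of 5', '1st of 1', 'n/a'))
```

Returning NA rather than stopping with an error keeps the result the same length as the input, so it can be assigned straight back to a column.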
Upvotes: 2