Usman Khaliq

Reputation: 363

How to sort a string in R that contains comma-separated numbers

I have the following dataframe:

library(tidyverse)
library(tibble)
data_frame <-
  tribble(
    ~a,                                                 ~b,
    "2,29,3,30,31,4,5,2,28,29,3,30,4,5",                "x",
    "12,13,14,15,18,19,20,12,13,14,15,18,19,20,21" ,     "y"
  )

I want to add a new column to this data frame, called c, that contains the numbers from a sorted in ascending numerical order. Ideally, c would look like the following:

library(tidyverse)
library(tibble)
data_frame <-
  tribble(
    ~a,                                                 ~b,            ~c,
    "2,29,3,30,31,4,5,2,28,29,3,30,4,5",                "x",           "2,2,3,3,4,4,5,5,28,29,29,30,30,31",
    "12,13,14,15,18,19,20,12,13,14,15,18,19,20,21" ,     "y",          "12,12,13,13,14,14,15,15,18,18,19,19,20,20,21"
  )

How can I achieve this? I have tried using str_sort from the stringr package, and also the mixedsort function from the gtools library. Thank you.

Upvotes: 2

Views: 1530

Answers (3)

akrun

Reputation: 887118

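In the tidyverse, we can use separate_rows with convert = TRUE to split 'a' on the delimiter into one row per number, arrange the rows by 'b' and 'a', group by 'b', paste the elements of 'a' back into a single string, and bind that new column to the original dataset. Note that toString joins with ", ", so the result has a space after each comma. Because bind_cols matches rows by position, this relies on 'b' being unique per row and the original rows already being in 'b' order.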

library(dplyr)
library(tidyr)
data_frame %>%
  # split a by the delimiter and expand the rows, converting the type
  separate_rows(a, convert = TRUE) %>%
  # order the rows by b, then by a
  arrange(b, a) %>%
  # group by b
  group_by(b) %>%
  # paste the elements of a
  # toString => paste(..., collapse = ", ")
  summarise(c = toString(a)) %>%
  # select the column c
  select(c) %>%
  # bind with the original dataset
  bind_cols(data_frame, .)
# A tibble: 2 x 3
#  a                                            b     c                                                         
#  <chr>                                        <chr> <chr>                                                     
#1 2,29,3,30,31,4,5,2,28,29,3,30,4,5            x     2, 2, 3, 3, 4, 4, 5, 5, 28, 29, 29, 30, 30, 31            
#2 12,13,14,15,18,19,20,12,13,14,15,18,19,20,21 y     12, 12, 13, 13, 14, 14, 15, 15, 18, 18, 19, 19, 20, 20, 21

Or use strsplit with map_chr: split the string 'a' on ",", loop over the resulting list, convert each element to numeric, sort it, and paste it back into a single string.

library(purrr)
data_frame %>% 
   mutate(c = map_chr(strsplit(a, ","), ~ 
                  toString(sort(as.numeric(.x)))))
# A tibble: 2 x 3
#  a                                            b     c                                                         
#  <chr>                                        <chr> <chr>                                                     
#1 2,29,3,30,31,4,5,2,28,29,3,30,4,5            x     2, 2, 3, 3, 4, 4, 5, 5, 28, 29, 29, 30, 30, 31          
#2 12,13,14,15,18,19,20,12,13,14,15,18,19,20,21 y     12, 12, 13, 13, 14, 14, 15, 15, 18, 18, 19, 19, 20, 20, 21           
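
Both outputs above are joined by toString, so each comma is followed by a space, whereas the expected column in the question has none. If the exact format matters, something like the following base R sketch may help. Here sort_csv_numbers is a hypothetical helper name, not part of any package, and the code assumes every token parses as a number.

```r
# Hypothetical helper: sort the comma-separated numbers in each string.
# Assumes every token is a valid number (no blanks or stray text).
sort_csv_numbers <- function(x, sep = ",") {
  vapply(
    strsplit(x, sep, fixed = TRUE),
    # convert to numeric, sort, and rejoin without spaces
    function(tokens) paste(sort(as.numeric(tokens)), collapse = sep),
    character(1)
  )
}

a <- c("2,29,3,30,31,4,5,2,28,29,3,30,4,5",
       "12,13,14,15,18,19,20,12,13,14,15,18,19,20,21")
sorted <- sort_csv_numbers(a)
```

Inside a pipeline this could be used as mutate(c = sort_csv_numbers(a)).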

Upvotes: 3

Tim Biegeleisen

Reputation: 521259

For a base R option, we can use strsplit to generate a vector of numbers, then sort that vector and finally collapse back to a character string.

x <- "2,29,3,30,31,4,5,2,28,29,3,30,4,5"
nums <- sort(as.numeric(strsplit(x, ",")[[1]]))
output <- paste(nums, collapse=",")
output

[1] "2,2,3,3,4,4,5,5,28,29,29,30,30,31"
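
The as.numeric step is what makes the order numerical. Sorting the split tokens as text compares them character by character, which is probably why a plain string sort does not give the expected result. A small illustration on a shortened input (method = "radix" is used here only to make the text ordering independent of locale):

```r
tokens <- strsplit("2,29,3,30,31,4,5", ",")[[1]]

# Character sort: compares character by character, so "29" lands before "3".
as_text <- sort(tokens, method = "radix")

# Numeric sort: compares the values themselves.
as_numbers <- sort(as.numeric(tokens))
```

Here as_text keeps "29" ahead of "3", while as_numbers runs from 2 up to 31.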

For a version which operates on entire data frames:

nums <- c("2,29,3,30,31,4,5,2,28,29,3,30,4,5",
          "12,13,14,15,18,19,20,12,13,14,15,18,19,20,21")
df <- data.frame(v1=nums, stringsAsFactors=FALSE)
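# vapply returns a plain character vector (lapply would create a list column)
df$v2 <- vapply(df$v1, function(y) paste(sort(as.numeric(strsplit(y, ",")[[1]])), collapse=","),
                character(1), USE.NAMES=FALSE)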
df

                                            v1
1            2,29,3,30,31,4,5,2,28,29,3,30,4,5
2 12,13,14,15,18,19,20,12,13,14,15,18,19,20,21
                                            v2
1            2,2,3,3,4,4,5,5,28,29,29,30,30,31   # sorted
2 12,12,13,13,14,14,15,15,18,18,19,19,20,20,21   # sorted

Upvotes: 1

Ronak Shah

Reputation: 388982

Using mixedsort from gtools:

data_frame$c <- sapply(strsplit(data_frame$a, ','), function(x) 
                       toString(gtools::mixedsort(x)))

This can be written in the tidyverse as:

library(tidyverse)
data_frame %>%
   mutate(c = str_split(a, ','), 
          c = map_chr(c, ~toString(gtools::mixedsort(.x))))
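
The mixedsort calls above sort the split tokens as character strings, so the text of each number is carried through unchanged. Where gtools is not available, a base R sketch along similar lines is to reorder the original tokens with order(). This is not a reimplementation of mixedsort; sort_tokens_numerically is a hypothetical helper name, and the code assumes every token parses as a number.

```r
# Hypothetical helper: reorder the original tokens by numeric value,
# keeping each token's text exactly as it was written.
sort_tokens_numerically <- function(x, sep = ",") {
  vapply(
    strsplit(x, sep, fixed = TRUE),
    function(tokens) paste(tokens[order(as.numeric(tokens))], collapse = sep),
    character(1)
  )
}

# Leading zeros survive because the tokens are never rewritten as numbers.
padded <- sort_tokens_numerically("10,007,2")
```

Because only the ordering is computed from the numeric values, a token such as "007" stays as written instead of becoming "7".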

Upvotes: 1
