Chrys
Chrys

Reputation: 313

Splitting several columns of a dataframe with 'separate' (tidyr) in R

Good morning everyone, I have read several things about columns splitting with R but I could not find how to fix my case.

I would like to use the 'separate' function from tidyr R package to split columns of a dataframe in two columns each, according to a separator.

I have this data frame:

dat1 AIN5997 AIN7452 AIN8674 AIN9655 001 01/02 02/02 02/02 01/02 002 01/02 01/01 02/02 02/02 003 01/02 01/02 01/01 02/02 004 01/02 01/01 02/02 01/02 005 01/01 01/01 02/02 02/02 006 01/02 01/02 01/01 02/02 ...

And I would like to separate each column into two according to "/", and if possible while keeping column names (e.g. : AIN5997 would become AIN5997.1 and AIN5997.2)

I think it is possible with 'separate' but I could not extend the procedure to each column of my frame when trying with an 'apply' (probably because separate works with dataframes and a single column at a time). This must be actually very easy but my R skills are quite bad!

There are many threads explaining how to split one column into two, like this one: Split a column of a data frame to multiple columns

but I cannot find how to expand the procedure for several columns simultaneously.

Thank you very much for your help,

All the best :)

Upvotes: 4

Views: 5320

Answers (3)

AnilGoyal
AnilGoyal

Reputation: 26238

New function in tidyr is exactly for this purpose

dt = data.frame(id = 1:2,
                AIN5997  = c("01/02", "01/02"),
                AIN7452  = c("02/02", NA),
                AIN8674 = c("02/02","02/02"), stringsAsFactors = F)
library(tidyr)
dt %>% 
  separate_wider_delim(-id, delim = "/", names_sep = ".")
#> # A tibble: 2 × 7
#>      id AIN5997.1 AIN5997.2 AIN7452.1 AIN7452.2 AIN8674.1 AIN8674.2
#>   <int> <chr>     <chr>     <chr>     <chr>     <chr>     <chr>    
#> 1     1 01        02        02        02        02        02       
#> 2     2 01        02        <NA>      <NA>      02        02

Created on 2024-04-21 with reprex v2.1.0

Upvotes: 0

AntoniosK
AntoniosK

Reputation: 16121

The trick is to create the new names in the right order, so make sure the columns you want to separate are ordered in advance.

The problem with NA values is that the process cannot split them. So, the trick is to replace them with something you can split. Check this:

library(dplyr)
library(tidyr)

# example dataset
dt = data.frame(id = 1:2,
                AIN5997  = c("01/02", "01/02"),
                AIN7452  = c("02/02", NA),
                AIN8674 = c("02/02","02/02"), stringsAsFactors = F)

# specify columns you want to separate (specify column positions)
input_names = names(dt)[2:4]

# create new names (you want each name twice)
new_names = expand.grid(input_names, 1:2) %>% 
  unite(v, Var1, Var2, sep=".") %>% 
  pull(v) %>% 
  sort()

dt %>%
  unite_("v", input_names) %>%                  # unite columns of interest
  mutate(v = gsub("NA", "NA/NA", v)) %>%        # replace NAs with something that can be separated
  separate(v, new_names, convert = F)           # separate elements and give new names

#   id AIN5997.1 AIN5997.2 AIN7452.1 AIN7452.2 AIN8674.1 AIN8674.2
# 1  1        01        02        02        02        02        02
# 2  2        01        02        NA        NA        02        02

I'm also adding a better solution. It automatically treats NA values and you don't have to worry about column names and their order.

library(dplyr)
library(tidyr)
library(purrr)

# example dataset
dt = data.frame(id = 1:2,
                AIN5997  = c("01/02", "01/02"),
                AIN7452  = c("02/02", NA),
                AIN8674 = c("02/02","02/02"), stringsAsFactors = F)

# separate a given column of your initial dataset
f = function(x) { dt %>% select_("id", x) %>% separate_(x, paste0(x, c(".1",".2"))) }


names(dt)[2:4] %>%             # get names of columns you want to separate
  map(f) %>%                   # apply the function above to each name (will create a list of dataframes)
  reduce(left_join, by="id")   # join dataframes iteratively

#   id AIN5997.1 AIN5997.2 AIN7452.1 AIN7452.2 AIN8674.1 AIN8674.2
# 1  1        01        02        02        02        02        02
# 2  2        01        02      <NA>      <NA>        02        02

Upvotes: 3

nghauran
nghauran

Reputation: 6768

You can also use tstrsplit().

# example dataset
df <- data.frame(AIN5997  = c("01/02", "01/02"),
                 AIN7452  = c("02/02","01/01"),
                 AIN8674 = c("02/02","02/02"), stringsAsFactors = F)
df
df2 <- as.data.frame(unlist(lapply(df, data.table::tstrsplit, "/"),
                            recursive = FALSE))
df2
colnames(df2) # change colnames
colnames(df2) <- paste(substr(colnames(df2), 1, nchar(colnames(df2))-1),
                       substr(colnames(df2), nchar(colnames(df2)), nchar(colnames(df2))),
                       sep = ".")
df2

Upvotes: 1

Related Questions