Reputation: 313
Good morning everyone, I have read several things about columns splitting with R but I could not find how to fix my case.
I would like to use the 'separate' function from tidyr R package to split columns of a dataframe in two columns each, according to a separator.
I have this data frame:
dat1
AIN5997 AIN7452 AIN8674 AIN9655
001 01/02 02/02 02/02 01/02
002 01/02 01/01 02/02 02/02
003 01/02 01/02 01/01 02/02
004 01/02 01/01 02/02 01/02
005 01/01 01/01 02/02 02/02
006 01/02 01/02 01/01 02/02
...
And I would like to separate each column into two according to "/", and if possible while keeping column names (e.g. : AIN5997 would become AIN5997.1 and AIN5997.2)
I think it is possible with 'separate' but I could not extend the procedure to each column of my frame when trying with an 'apply' (probably because separate works with dataframes and a single column at a time). This must be actually very easy but my R skills are quite bad!
There are many threads explaining how to split one column into two, like this one: Split a column of a data frame to multiple columns
but I cannot find how to expand the procedure for several columns simultaneously.
Thank you very much for your help,
All the best :)
Upvotes: 4
Views: 5320
Reputation: 26238
New function in tidyr
is exactly for this purpose
dt = data.frame(id = 1:2,
AIN5997 = c("01/02", "01/02"),
AIN7452 = c("02/02", NA),
AIN8674 = c("02/02","02/02"), stringsAsFactors = F)
library(tidyr)
dt %>%
separate_wider_delim(-id, delim = "/", names_sep = ".")
#> # A tibble: 2 × 7
#> id AIN5997.1 AIN5997.2 AIN7452.1 AIN7452.2 AIN8674.1 AIN8674.2
#> <int> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 1 01 02 02 02 02 02
#> 2 2 01 02 <NA> <NA> 02 02
Created on 2024-04-21 with reprex v2.1.0
Upvotes: 0
Reputation: 16121
The trick is to create the new names in the right order, so make sure the columns you want to separate are ordered in advance.
The problem with NA
values is that the process cannot split them. So, the trick is to replace them with something you can split. Check this:
library(dplyr)
library(tidyr)
# example dataset
dt = data.frame(id = 1:2,
AIN5997 = c("01/02", "01/02"),
AIN7452 = c("02/02", NA),
AIN8674 = c("02/02","02/02"), stringsAsFactors = F)
# specify columns you want to separate (specify column positions)
input_names = names(dt)[2:4]
# create new names (you want each name twice)
new_names = expand.grid(input_names, 1:2) %>%
unite(v, Var1, Var2, sep=".") %>%
pull(v) %>%
sort()
dt %>%
unite_("v", input_names) %>% # unite columns of interest
mutate(v = gsub("NA", "NA/NA", v)) %>% # replace NAs with something that can be separated
separate(v, new_names, convert = F) # separate elements and give new names
# id AIN5997.1 AIN5997.2 AIN7452.1 AIN7452.2 AIN8674.1 AIN8674.2
# 1 1 01 02 02 02 02 02
# 2 2 01 02 NA NA 02 02
I'm also adding a better solution. It automatically treats NA
values and you don't have to worry about column names and their order.
library(dplyr)
library(tidyr)
library(purrr)
# example dataset
dt = data.frame(id = 1:2,
AIN5997 = c("01/02", "01/02"),
AIN7452 = c("02/02", NA),
AIN8674 = c("02/02","02/02"), stringsAsFactors = F)
# separate a given column of your initial dataset
f = function(x) { dt %>% select_("id", x) %>% separate_(x, paste0(x, c(".1",".2"))) }
names(dt)[2:4] %>% # get names of columns you want to separate
map(f) %>% # apply the function above to each name (will create a list of dataframes)
reduce(left_join, by="id") # join dataframes iteratively
# id AIN5997.1 AIN5997.2 AIN7452.1 AIN7452.2 AIN8674.1 AIN8674.2
# 1 1 01 02 02 02 02 02
# 2 2 01 02 <NA> <NA> 02 02
Upvotes: 3
Reputation: 6768
You can also use tstrsplit()
.
# example dataset
df <- data.frame(AIN5997 = c("01/02", "01/02"),
AIN7452 = c("02/02","01/01"),
AIN8674 = c("02/02","02/02"), stringsAsFactors = F)
df
df2 <- as.data.frame(unlist(lapply(df, data.table::tstrsplit, "/"),
recursive = FALSE))
df2
colnames(df2) # change colnames
colnames(df2) <- paste(substr(colnames(df2), 1, nchar(colnames(df2))-1),
substr(colnames(df2), nchar(colnames(df2)), nchar(colnames(df2))),
sep = ".")
df2
Upvotes: 1