Reputation: 1245
I have a data frame where I am looking to separate a column (var) into 2 columns by the first ".". The values in var currently have 2 periods. I would like to call the new names "First" and "Second. I have added my data frame as well as a screen shot of it. At the end I show my attempt.
df = structure(list(Date.Dates = c("42370", "42373", "42374", "42375",
"42376", "42377"), var = c("SHOP US Equity.30DAY_IMPVOL_100.0%MNY_DF",
"SHOP US Equity.30DAY_IMPVOL_100.0%MNY_DF", "SHOP US Equity.30DAY_IMPVOL_100.0%MNY_DF",
"SHOP US Equity.30DAY_IMPVOL_100.0%MNY_DF", "SHOP US Equity.30DAY_IMPVOL_100.0%MNY_DF",
"SHOP US Equity.30DAY_IMPVOL_100.0%MNY_DF"), val = c("59.835999999999999",
"67.208600000000004", "61.522599999999997", "63.298900000000003",
"66.243200000000002", "69.282899999999998")), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
library(tidyverse)
df %>%
gather(var, val, -Date.Dates) %>%
separate(var, c("First", "Second", sep = "/^([^.]+)/"))
We should have a new column called "First" that holds the value "SHOP US Equity". A second column called "Second" that contains the value "30DAY_IMPVOL..."
Thank you
Upvotes: 3
Views: 3959
Reputation: 887531
We can use extract
to capture the characters that are not a .
([^.]+
) from the start (^
) of the string as a group, followed by the .
(\\.
) and the rest of the characters in second group for the "Second" column
library(dplyr)
library(tidyr)
df %>%
extract(var, into = c("First", "Second"), "^([^.]+)\\.(.*)")
Or just use [.]
as sep
. It would split into two columns with a warning as there are multiple .
and this will split at the first occurence of .
df %>%
separate(var, into = c("First", "Second"), "[.]")
# A tibble: 6 x 4
# Date.Dates First Second val
# <chr> <chr> <chr> <chr>
#1 42370 SHOP US Equity 30DAY_IMPVOL_100 59.835999999999999
#2 42373 SHOP US Equity 30DAY_IMPVOL_100 67.208600000000004
#3 42374 SHOP US Equity 30DAY_IMPVOL_100 61.522599999999997
#4 42375 SHOP US Equity 30DAY_IMPVOL_100 63.298900000000003
#5 42376 SHOP US Equity 30DAY_IMPVOL_100 66.243200000000002
#6 42377 SHOP US Equity 30DAY_IMPVOL_100 69.282899999999998
If we don't need the warningss, another option is to use a regex lookaround
df %>%
separate(var, into = c("First", "Second"), "(?<=[a-z])\\.(?=[0-9])")
Or using base R
df[c("First", "Second")] <- read.csv(text = sub("\\.", ",", df$var), header = FALSE)
Upvotes: 10
Reputation: 389155
We can use str_split
/str_split_fixed
from stringr
where we can specify in how many parts we want to split the string (n
). Since here we want only 2 columns we can specify n = 2
.
stringr::str_split_fixed(df$var, "\\.", 2) %>%
as.data.frame() %>%
setNames(c('First', 'Second'))
# First Second
#1 SHOP US Equity 30DAY_IMPVOL_100.0%MNY_DF
#2 SHOP US Equity 30DAY_IMPVOL_100.0%MNY_DF
#3 SHOP US Equity 30DAY_IMPVOL_100.0%MNY_DF
#4 SHOP US Equity 30DAY_IMPVOL_100.0%MNY_DF
#5 SHOP US Equity 30DAY_IMPVOL_100.0%MNY_DF
#6 SHOP US Equity 30DAY_IMPVOL_100.0%MNY_DF
Upvotes: 3
Reputation: 79288
You need to use the extra
parameter as shown below
df %>%
separate(var, c("First","second"), "\\.",extra = "merge")
# A tibble: 6 x 4
Date.Dates First second val
<chr> <chr> <chr> <chr>
1 42370 SHOP US Equity 30DAY_IMPVOL_100.0%MNY_DF 59.835999999999999
2 42373 SHOP US Equity 30DAY_IMPVOL_100.0%MNY_DF 67.208600000000004
3 42374 SHOP US Equity 30DAY_IMPVOL_100.0%MNY_DF 61.522599999999997
4 42375 SHOP US Equity 30DAY_IMPVOL_100.0%MNY_DF 63.298900000000003
5 42376 SHOP US Equity 30DAY_IMPVOL_100.0%MNY_DF 66.243200000000002
6 42377 SHOP US Equity 30DAY_IMPVOL_100.0%MNY_DF 69.282899999999998
Upvotes: 2