Jigar Mehta

Reputation: 23

Split a data frame column into multiple columns in R

My data frame is as follows:

User  
JohnLenon03041965  
RogerFederer12021954  
RickLandsman01041975  

and I am trying to get the output as

Name     Lastname    Birthdate  
John     Lenon       03041965      
Roger    Federer     12021954  
Rick     Landsman    01041975  

I tried the following code:

a <- gsub('([[:upper:]])', ' \\1', df$User)
a <- as.data.frame(a)
library(tidyr)
a <- separate(a, a, into = c("Name", "Last"), sep = " (?=[^ ]+$)")

I get the following:

Name  Last  
John  Lenon03041965  
Roger Federer12021954  
Rick  Landsman01041975  

I then tried to split the second column with a lookahead such as (?=[0-9]) as the separator, but I get an error:

c <-separate(c, c, into = c("last", "date"), sep = '(?=[0-9])')

Error in if (!after) c(values, x) else if (after >= lengx) c(x, values) else c(x[1L:after], : argument is of length zero

Upvotes: 1

Views: 53

Answers (1)

akrun

Reputation: 887951

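We can pass a regex lookaround as sep, so that the split happens either between a lower case letter and an upper case letter ((?<=[a-z])(?=[A-Z])) or (|) between a lower case letter and a digit ((?<=[a-z])(?=[0-9]+)). Lookarounds match a position rather than characters, so nothing is removed at the split points.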

library(tidyr)  # provides separate() and extract(), and re-exports %>%

df1 %>%
   separate(User, into = c("Name", "LastName", "Birthdate"),
         sep = "(?<=[a-z])(?=[A-Z])|(?<=[a-z])(?=[0-9]+)")
#   Name LastName Birthdate
#1  John    Lenon  03041965
#2 Roger  Federer  12021954
#3  Rick Landsman  01041975
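
To see where that sep pattern actually splits, here is a minimal base R sketch that marks each boundary with a "|" instead of splitting. The variable names (users, boundary, marked) are only illustrative, and the trailing + in the second lookahead is dropped on the assumption that a single digit is enough to mark the boundary.

```r
# Base R only; no packages assumed.
users <- c("JohnLenon03041965", "RogerFederer12021954", "RickLandsman01041975")

# Same lookaround alternation as the sep above:
# lower case followed by upper case, or lower case followed by a digit.
boundary <- "(?<=[a-z])(?=[A-Z])|(?<=[a-z])(?=[0-9])"

# Lookarounds are zero-width, so gsub() inserts "|" at each boundary
# without consuming any characters. perl = TRUE is needed for lookbehind.
marked <- gsub(boundary, "|", users, perl = TRUE)
print(marked)
# [1] "John|Lenon|03041965"    "Roger|Federer|12021954" "Rick|Landsman|01041975"
```

Each string gets exactly two markers, which is why separate() can fill the three columns named in into.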

Another option is extract, which captures characters as groups by placing each pattern inside parentheses ((...)). Here, the 1st capture group matches an upper case letter followed by one or more lower case letters (([A-Z][a-z]+)) from the start (^) of the string, the 2nd captures one or more characters that are not digits (([^0-9]+)), and the 3rd captures the rest of the characters ((.*)).

df1 %>% 
    extract(User, into = c("Name", "LastName", "Birthdate"),
           "^([A-Z][a-z]+)([^0-9]+)(.*)")
#   Name LastName Birthdate
#1  John    Lenon  03041965
#2 Roger  Federer  12021954
#3  Rick Landsman  01041975
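
The same three capture groups can also be tried without tidyr. The following is a rough base R sketch, not what extract does internally; it assumes R 3.3.0 or later (for the perl argument of regexec), and the names users, pattern, groups, parts and result are made up for the illustration.

```r
# Base R only; no packages assumed.
users <- c("JohnLenon03041965", "RogerFederer12021954", "RickLandsman01041975")

# Same pattern as in the extract() call: three capture groups.
pattern <- "^([A-Z][a-z]+)([^0-9]+)(.*)"

# regexec() records the position of the full match and of each group;
# regmatches() turns those positions into strings, one vector per input.
groups <- regmatches(users, regexec(pattern, users, perl = TRUE))

# Drop the first element of each vector (the full match) and
# bind the three remaining groups row-wise into a character matrix.
parts <- do.call(rbind, lapply(groups, function(g) g[-1]))

result <- data.frame(Name = parts[, 1], LastName = parts[, 2],
                     Birthdate = parts[, 3], stringsAsFactors = FALSE)
print(result)
```

The Birthdate column stays a character vector here, so the leading zeros in values such as 03041965 are preserved.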

data

df1 <- structure(list(User = c("JohnLenon03041965", "RogerFederer12021954", 
"RickLandsman01041975")), .Names = "User", class = "data.frame", row.names = c(NA, 
-3L))

Upvotes: 1
