Diya
Diya

Reputation: 1

Splitting data with one column into more columns

I a fresh R user and I can't find how to properly spit my data into 5 columns (name, surname, title, area_code and phone_number).

df=read.table("school.txt")

df <- data.frame(
    stringsAsFactors = FALSE,
    
V1= c("Lebel, MarieStudent 1st year216 132-3789",
           "Lachance, PaulTeacher 2nd year567 990-345 ext 1811",
           "Smith, AnnieStudent 1st yearNot available")

I was able to separate the data into 2 columns to get the names by doing this:

df1= data.frame(str_split_fixed(df$V1, ",", 2)) 

Thank you in advance

Upvotes: 0

Views: 34

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388797

You can use regex to separate out the data into different columns. Using tidyr::extract :

tidyr::extract(df, V1,
       c("surname", "name", "title", "year","area_code",  "phone_number"), 
       '(\\w+),\\s([A-Za-z]+)(Teacher|Student)\\s(\\w+\\syear)(\\d+)?\\s?(.*)?')

#   surname  name   title     year area_code     phone_number
#1    Lebel Marie Student 1st year       216         132-3789
#2 Lachance  Paul Teacher 2nd year       567 990-345 ext 1811
#3    Smith Annie Student 1st year              Not available

Upvotes: 1

jsv
jsv

Reputation: 740

Most likely possible to do it in fewer steps...

df %>% 
  separate(V1,into=c("name","step1"),sep=",") %>% 
  mutate(title = case_when(str_detect(step1,pattern = "Student") ~ "Student",
                           str_detect(step1,pattern = "Teacher") ~ "Teacher",
                           TRUE ~ NA_character_
                           )) %>% 
  mutate(step2 = str_replace(step1,title,replacement = "")) %>% 
  separate(step2,into=c("surname","step3"),"[0-9]+(st|nd|rd|th)+",remove = FALSE) %>% 
  mutate(step3 = str_replace(step2,surname,"")) %>% 
  mutate(year = str_extract(step3,"[0-9](st|nd|rd|th) year")) %>% 
  mutate(step4 = str_replace(step3,year,"")) %>% 
  mutate(area_code = str_extract(step4,"[0-9]+\\s")) %>% 
  mutate(phone_number = str_replace(step4,area_code,"")) %>% 
  dplyr::select(-step1,-step2,-step3,-step4)

Upvotes: 0

Related Questions