Reputation: 185

Splitting one cell into multiple columns in R

So this is what one of my column looks like :

Infos
NAME: ANGELA SURNAME:SMITH AGE:22 CITY: LA
NAME: ANDREW SURNAME: D'ONOFRIO AGE:47 CITY: NYC

I'd like to create four columns :

NAME	SURNAME	AGE	CITY
ANGELA	SMITH	22	LA
ANDREW	D'ONOFRIO	47	NYC

I read that we can use "separate" from tidyverse, and this what i tried.

library(tidyr)
library(tidyverse)

df <- infos %>% separate(Infos, c("NAME", "SURNAME","AGE","CITY"))

But this is the output :

NAME	SURNAME	AGE	CITY
NAME	ANGELA	SURNAME	SMITH
NAME	ANDREW	SURNAME	D'ONOFRIO

Then i'd like to understand how to make R knows what it has to separate. Maybe that this exact topic have been treated here before (but i didn't find it) so feel free to redirect me if necessary !

Upvotes: 3

Answers (5)

Ronak Shah

Reputation: 389325

Base R option using strcapture -

strcapture('NAME:\\s*(.*)\\s*SURNAME:\\s*(.*)\\s*AGE:\\s*(.*)\\s*CITY:\\s*(.*)', 
           infos$Infos, proto = list(NAME = character(), 
           SURNAME = character(), AGE = numeric(), CITY = character()))

#    NAME    SURNAME  AGE CITY
#1 ANGELA      SMITH   22   LA
#2 ANDREW  D'ONOFRIO   47  NYC

Upvotes: 2

G. Grothendieck

Reputation: 270298

1) extract Use extract with the pattern shown. The test data did not have any spaces within the contents of the fields but even if it did this should work.

library(dplyr)
library(tidyr)

pat <- "NAME: *(.*) SURNAME: *(.*) AGE: *(.*) CITY: *(.*)"
dat %>% 
  extract(Infos, c("NAME", "SURNAME", "AGE", "CITY"), pat, convert = TRUE)
##     NAME   SURNAME AGE CITY
## 1 ANGELA     SMITH  22   LA
## 2 ANDREW D'ONOFRIO  47  NYC

2) Base R Alternately using only base R we get this general solution which will continue to work even if the number of columns or their names change. This should also work if there are spaces in the contents of the fields. It works by converting Infos into dcf format followed by read.dcf.

dat |>
  with(gsub("(\\w+:)", "\n\\1", Infos)) |>
  textConnection() |>
  read.dcf() |>
  as.data.frame() |>
  type.convert(as.is = TRUE)
##     NAME   SURNAME AGE CITY
## 1 ANGELA     SMITH  22   LA
## 2 ANDREW D'ONOFRIO  47  NYC

Note

dat in reproducible form:

dat <-
structure(list(Infos = c("NAME: ANGELA SURNAME:SMITH AGE:22 CITY: LA", 
"NAME: ANDREW SURNAME: D'ONOFRIO AGE:47 CITY: NYC")), class = "data.frame", row.names = c(NA, 
-2L))

Upvotes: 6

AnilGoyal

Reputation: 26238

One more strategy

df <- structure(list(Infos = c("NAME: ANGELA SURNAME:SMITH AGE:22 CITY: LA", 
                           "NAME: ANDREW SURNAME: D'ONOFRIO AGE:47 CITY: NYC")), class = "data.frame", row.names = c(NA, 
                                                                                                                     -2L))
library(tidyverse)

df %>%
  mutate(Infos = gsub('\\:\\s*', ':', Infos)) %>%
  separate_rows(Infos, sep = '\\s') %>%
  separate(Infos, into = c('N', 'V'), sep = ':') %>%
  pivot_wider(names_from = N, values_from = V, values_fn = list) %>%
  unnest(everything())
#> # A tibble: 2 x 4
#>   NAME   SURNAME   AGE   CITY 
#>   <chr>  <chr>     <chr> <chr>
#> 1 ANGELA SMITH     22    LA   
#> 2 ANDREW D'ONOFRIO 47    NYC

^{Created on 2021-07-15 by the reprex package (v2.0.0)}

Upvotes: 2

TarJae

Reputation: 79286

Here is another solution using str_squish, str_replace_all and separate

library(dplyr)
library(stringr)
df %>% 
  mutate(Infos = str_squish(str_replace_all(Infos, ":", " "))) %>% 
  separate(Infos, c("helper1", "Name", "helper2", "Surname", "helper3", "Age", "helper4","City"), sep = " ") %>%
  select(-starts_with("helper"))

Output:

  Name   Surname   Age   City 
  <chr>  <chr>     <chr> <chr>
1 ANGELA SMITH     22    LA   
2 ANDREW D'ONOFRIO 47    NYC

Upvotes: 1

M.Viking

Reputation: 5418

You can insert dummy columns, then delete them.

tibble(dat=c("NAME: ANGELA SURNAME:SMITH AGE:22 CITY: LA", 
             "NAME: ANDREW SURNAME: DONOFRIO AGE:47 CITY: NYC")) %>% 
    separate(dat, c("DEL1", "NAME", "DEL2", "SURNAME", "DEL3", "AGE", "DEL4", "CITY")) %>% 
    select(-DEL1, -DEL2, -DEL3, -DEL4)

 NAME   SURNAME  AGE   CITY 
 ANGELA SMITH    22    LA   
 ANDREW DONOFRIO 47    NYC

Upvotes: 0

Splitting one cell into multiple columns in R

Answers (5)

Note

Related Questions