Sciolism Apparently

Reputation: 355

Using an if/then condition with the |> pipe character

I need to extract the last names of several thousand people. The names are either two or three words long, depending on whether there is a suffix. My approach is to count the number of words in each row, then run a different separate() call depending on how many words there are. The following code does not work but shows my thinking:

customers = data.frame(names=c("Jack Quinn III", "David Powell", "Carrie Green",
           "Steven Miller, Jr.", "Christine Powers", "Amanda Ramirez"))

customers |> 
  mutate(names_count = str_count(names, "\\w+")) |>
  {
  if(names_count == 2,
     separate(name, c("first_name", "last_name") ),
     separate(name, c("first_name", "last_name", "suffix") )
  )
  }

This code does not work, and I don't know how to interpret the error messages. I'm also not sure whether the commas are needed in the if statement, because there are apparently functions that use both forms.

My thought was that I could get the names split into columns by doing

df |> 
  mutate() to count words |> 
  separate() to split columns based on count

but I can't get even the simplest if statement to work.

Upvotes: 0

Views: 50

Answers (3)

akrun

Reputation: 887058

Using str_extract

library(dplyr)
library(stringr)

customers %>%
  # group = 1 returns only the first capture group, i.e. the second word
  mutate(last_name = str_extract(names, "^[A-Za-z]+\\s+([A-Za-z]+)", group = 1))

Output:

               names last_name
1     Jack Quinn III     Quinn
2       David Powell    Powell
3       Carrie Green     Green
4 Steven Miller, Jr.    Miller
5   Christine Powers    Powers
6     Amanda Ramirez   Ramirez
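
Note that the group argument was added in stringr 1.5.0. On an older version, a rough equivalent is str_match(), which returns a matrix whose second column holds the first capture group. This is only a sketch, assuming the customers data frame from the question:

```r
library(stringr)

# Sample data copied from the question
customers <- data.frame(names = c("Jack Quinn III", "David Powell", "Carrie Green",
                                  "Steven Miller, Jr.", "Christine Powers", "Amanda Ramirez"))

# Column 1 of the str_match() matrix is the full match, column 2 is capture group 1
customers$last_name <- str_match(customers$names, "^[A-Za-z]+\\s+([A-Za-z]+)")[, 2]
```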

Upvotes: 1

Jilber Urbina

Reputation: 61154

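You can drop the if entirely and always split into three columns:

library(dplyr)
library(tidyr)

customers %>% 
  # split on an optional comma plus whitespace; fill = "right" pads two-word names with NA
  separate(names, into = c("first_name", "last_name", "suffix"), sep = ",?\\s+", fill = "right") %>% 
  select(last_name)

If you want to avoid extra packages, you can use base R's sub() with a regex: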

> sub("[A-Za-z]+\\s+([A-Za-z]+)\\s?.*", "\\1", customers$names)
[1] "Quinn"   "Powell"  "Green"   "Miller"  "Powers"  "Ramirez"
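
On the comma question: the comma-separated form is the syntax of the vectorised ifelse() (or dplyr's if_else()), while a plain if takes a single TRUE/FALSE and is written if (cond) ... else .... Neither can pick a different separate() call per row, which is why dropping the condition is simpler. A minimal sketch of the vectorised form, assuming the customers data frame from the question and using word() from stringr purely for illustration:

```r
library(dplyr)
library(stringr)

# Sample data copied from the question
customers <- data.frame(names = c("Jack Quinn III", "David Powell", "Carrie Green",
                                  "Steven Miller, Jr.", "Christine Powers", "Amanda Ramirez"))

result <- customers |>
  mutate(
    names_count = str_count(names, "\\w+"),
    # if_else() checks the condition row by row; rows with two words get NA
    suffix = if_else(names_count == 3, word(names, 3), NA_character_)
  )
```

Here the condition is evaluated once per row inside mutate(), so it produces a column of values rather than choosing one branch for the whole data frame.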

Upvotes: 0

harre

Reputation: 7287

We could use word from stringr instead:

library(stringr)
library(dplyr)

customers |>
    mutate(last_name = word(names, 2))

Output:

               names last_name
1     Jack Quinn III     Quinn
2       David Powell    Powell
3       Carrie Green     Green
4 Steven Miller, Jr.   Miller,
5   Christine Powers    Powers
6     Amanda Ramirez   Ramirez
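
Row 4 keeps its trailing comma because word() splits on spaces only. If that matters, the comma can be stripped afterwards. A small sketch, assuming the same customers data frame and that no last name legitimately ends in punctuation:

```r
library(dplyr)
library(stringr)

# Sample data copied from the question
customers <- data.frame(names = c("Jack Quinn III", "David Powell", "Carrie Green",
                                  "Steven Miller, Jr.", "Christine Powers", "Amanda Ramirez"))

cleaned <- customers |>
  # take the second space-separated word, then drop any trailing punctuation
  mutate(last_name = str_remove(word(names, 2), "[[:punct:]]+$"))
```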

Upvotes: 1
