Split a string of names and transpose

Question

I have a list of names (famous directors) in the format First (optional middle) Last, which I need to rearrange to Last, First (optional middle). I can't just split each name at the first space, or even the second, because some last names consist of two words and some names include a middle name and/or middle initial that should stay with the first name.

Here is the dput for the list I'm working with:

> dput(directors.names)
c("Frank Darabont,", "Francis Ford Coppola,", "Francis Ford Coppola,", 
"Christopher Nolan,", "Sidney Lumet,", "Steven Spielberg,", "Peter Jackson,", 
"Quentin Tarantino,", "Sergio Leone,", "Peter Jackson,", "David Fincher,", 
"Robert Zemeckis,", "Christopher Nolan,", "Peter Jackson,", "Irvin Kershner,", 
"Lana Wachowski,", "Martin Scorsese,", "Milos Forman,", "Akira Kurosawa,", 
"David Fincher,", "Jonathan Demme,", "Fernando Meirelles,", "Roberto Benigni,", 
"Frank Capra,", "Steven Spielberg,", "George Lucas,", "Christopher Nolan,", 
"Hayao Miyazaki,", "Frank Darabont,", "Bong Joon Ho,", "Luc Besson,", 
"Masaki Kobayashi,", "Roman Polanski,", "James Cameron,", "Robert Zemeckis,", 
"Bryan Singer,", "Alfred Hitchcock,", "Roger Allers,", "Charles Chaplin,", 
"Tony Kaye,", "Isao Takahata,", "Charles Chaplin,", "Damien Chazelle,", 
"Ridley Scott,", "Martin Scorsese,", "Olivier Nakache,", "Christopher Nolan,", 
"Michael Curtiz,", "Sergio Leone,", "Alfred Hitchcock,", "Giuseppe Tornatore,", 
"Ridley Scott,", "Francis Ford Coppola,", "Christopher Nolan,", 
"Steven Spielberg,", "Charles Chaplin,", "Quentin Tarantino,", 
"Florian Henckel von Donnersmarck,", "Stanley Kubrick,", "Billy Wilder,", 
"Andrew Stanton,", "Anthony Russo,", "Billy Wilder,", "Stanley Kubrick,", 
"Bob Persichetti,", "Stanley Kubrick,", "Hayao Miyazaki,", "Park Chan-Wook,", 
"Todd Phillips,", "Makoto Shinkai,", "Lee Unkrich,", "Christopher Nolan,", 
"James Cameron,", "Sergio Leone,", "Anthony Russo,", "Nadine Labaki,", 
"Wolfgang Petersen,", "Akira Kurosawa,", "Rajkumar Hirani,", 
"John Lasseter,", "Sam Mendes,", "Milos Forman,", "Mel Gibson,", 
"Quentin Tarantino,", "Thomas Kail,", "Gus Van Sant,", "Richard Marquand,", 
"Stanley Kubrick,", "Quentin Tarantino,", "Elem Klimov,", "Fritz Lang,", 
"Aamir Khan,", "Alfred Hitchcock,", "Orson Welles,", "Thomas Vinterberg,", 
"Darren Aronofsky,", "Stanley Donen,", "Alfred Hitchcock,", "Michel Gondry,", 
"Akira Kurosawa,", "Vittorio De Sica,", "David Lean,", "Charles Chaplin,", 
"Stanley Kubrick,", "Nitesh Tiwari,", "Billy Wilder,", "Denis Villeneuve,", 
"Florian Zeller,", "Fritz Lang,", "Billy Wilder,", "Stanley Kubrick,", 
"Martin Scorsese,", "Asghar Farhadi,", "George Roy Hill,", "Brian De Palma,", 
"Satyajit Ray,", "Guy Ritchie,", "Sam Mendes,", "Jean-Pierre Jeunet,", 
"Robert Mulligan,", "Lee Unkrich,", "Sergio Leone,", "Pete Docter,", 
"Steven Spielberg,", "Michael Mann,", "Curtis Hanson,", "T.J. Gnanavel,", 
"Akira Kurosawa,", "John McTiernan,", "Akira Kurosawa,", "Akira Kurosawa,", 
"Peter Farrelly,", "Oliver Hirschbiegel,", "Terry Gilliam,", 
"Joseph L. Mankiewicz,", "Billy Wilder,", "Christopher Nolan,", 
"Clint Eastwood,", "Majid Majidi,", "Hayao Miyazaki,", "Martin Scorsese,", 
"Stanley Kramer,", "John Sturges,", "Paul Thomas Anderson,", 
"Martin Scorsese,", "John Huston,", "Guillermo del Toro,", "Ron Howard,", 
"Juan José Campanella,", "Martin Scorsese,", "Akira Kurosawa,", 
"Roman Polanski,", "Hayao Miyazaki,", "Guy Ritchie,", "Martin Scorsese,", 
"Ethan Coen,", "Charles Chaplin,", "Alfred Hitchcock,", "John Carpenter,", 
"Ingmar Bergman,", "Martin McDonagh,", "Sergio Pablos,", "David Lynch,", 
"M. Night Shyamalan,", "Ingmar Bergman,", "Peter Weir,", "Carol Reed,", 
"Steven Spielberg,", "Denis Villeneuve,", "Bong Joon Ho,", "James McTeigue,", 
"Ridley Scott,", "Danny Boyle,", "Pete Docter,", "David Lean,", 
"Joel Coen,", "Gavin O'Connor,", "Andrew Stanton,", "Quentin Tarantino,", 
"Victor Fleming,", "Yasujirô Ozu,", "Elia Kazan,", "Cagan Irmak,", 
"Damián Szifron,", "Andrei Tarkovsky,", "Michael Cimino,", "Denis Villeneuve,", 
"Costa-Gavras,", "Wes Anderson,", "Buster Keaton,", "Clyde Bruckman,", 
"Clint Eastwood,", "Ingmar Bergman,", "Richard Linklater,", "Adam Elliot,", 
"Steven Spielberg,", "Frank Capra,", "Jim Sheridan,", "Stanley Kubrick,", 
"Lenny Abrahamson,", "David Fincher,", "Mel Gibson,", "Carl Theodor Dreyer,", 
"Sriram Raghavan,", "James Mangold,", "Steve McQueen,", "Ernst Lubitsch,", 
"Joel Coen,", "Peter Weir,", "Ingmar Bergman,", "Dean DeBlois,", 
"George Miller,", "William Wyler,", "David Yates,", "Clint Eastwood,", 
"Henri-Georges Clouzot,", "Park Chan-Wook,", "Rob Reiner,", "Sidney Lumet,", 
"James Mangold,", "Anurag Kashyap,", "Stuart Rosenberg,", "Lasse Hallström,", 
"Mathieu Kassovitz,", "François Truffaut,", "Naoko Yamada,", 
"Oliver Stone,", "Tom McCarthy,", "Pete Docter,", "Alfred Hitchcock,", 
"Terry Jones,", "Terry George,", "Kar-Wai Wong,", "Yavuz Turgul,", 
"Ron Howard,", "Sean Penn,", "John G. Avildsen,", "Alejandro G. Iñárritu,", 
"Hayao Miyazaki,", "Andrei Tarkovsky,", "Frank Capra,", "Richard Linklater,", 
"Ingmar Bergman,", "Hideaki Anno,", "Gillo Pontecorvo,", "Federico Fellini,", 
"Rob Reiner,", "Wim Wenders,", "Krzysztof Kieslowski,", "Ram Kumar,"
)

Some tricky examples: I would need to split "John G. Avildsen" after the "G.", "Bong Joon Ho" after the first space, and "Florian Henckel von Donnersmarck" after the second space (just to point out a few).

I've added a comma to the end of every string so that, once the parts are transposed, the result comes out in Last Name, First (possible middle) format.

I went through my list and found every word that would need to stay with the last-name portion, so that I could try to split those cases first. But it isn't splitting where I need it to; it just puts each string into its own index.

Here is what I have tried most recently:

directors.names <- paste0(directors.1, ",")
directors.names <- strsplit(directors.names, "[[:space:]]+('von'|'Ford'|'Joon'|'De'|'del'|'Van')[[:space:]]+", perl = TRUE)
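A small check of that pattern, as a sketch on three names from the vector above (the variable names `x`, `quoted` and `unquoted` are illustrative only). Inside a regular expression the single quotes are literal characters, so `'Ford'` only matches the word with quote marks around it. That would explain why every string comes back unsplit in its own list element. Removing the quotes makes the pattern match, but `strsplit()` then discards the matched word instead of keeping it with the last name.

```r
x <- c("Francis Ford Coppola,", "Guillermo del Toro,", "Frank Darabont,")

# Quotes are literal inside a regex, so nothing matches and nothing is split:
# each string is returned whole, as its own list element
quoted <- strsplit(x, "[[:space:]]+('Ford'|'del')[[:space:]]+", perl = TRUE)
print(quoted)

# Without the quotes the pattern matches, but strsplit() consumes the match,
# so "Ford" and "del" disappear from the pieces
unquoted <- strsplit(x, "[[:space:]]+(Ford|del)[[:space:]]+", perl = TRUE)
print(unquoted)
```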

Once these are split and transposed correctly, the duplicates need to be removed so that the result can be sorted alphabetically by last name, with each row showing Last Name, First Name (MI or Middle Name).
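The desired result could be sketched roughly as follows, using a short sample of the vector above. This is one possible approach rather than a definitive one. It assumes that the words listed in the attempted pattern (`von`, `Ford`, `Joon`, `De`, `del`, `Van`) always mark the start of the last name, and that every other name ends with a single-word last name. The names `particles`, `clean`, `pattern`, `swapped` and `result` are made up for this example. Instead of splitting, it captures the two parts with `sub()` and swaps them.

```r
# Small sample taken from the question's vector (trailing commas included)
directors.names <- c("Frank Darabont,", "Francis Ford Coppola,",
                     "Francis Ford Coppola,", "John G. Avildsen,",
                     "Bong Joon Ho,", "Florian Henckel von Donnersmarck,",
                     "Gus Van Sant,", "Guillermo del Toro,", "Costa-Gavras,")

# Assumption: these words begin the last name (copied from the attempted regex)
particles <- c("von", "Ford", "Joon", "De", "del", "Van")

# Drop the trailing comma; it is re-inserted between last and first name below
clean <- sub(",$", "", directors.names)

# Group 1 (lazy): first and middle names
# Group 2: an optional particle followed by the final word
pattern <- paste0("^(.*?)\\s+((?:(?:", paste(particles, collapse = "|"),
                  ")\\s+)?\\S+)$")

# Swap the groups; names with no space (e.g. "Costa-Gavras") are left unchanged
swapped <- sub(pattern, "\\2, \\1", clean, perl = TRUE)

# Remove duplicates, then sort (the exact order depends on the locale's collation)
result <- sort(unique(swapped))
print(result)
```

Because the particle must be followed by whitespace, a name such as "Dean DeBlois" is not affected by the `De` entry. Any name that needs a different rule would have to be added to `particles` or handled separately.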
