pssguy

Reputation: 3505

String manipulation in R to create dataframe column

I have an R data frame (df) that includes a factor column, Team:

Team
Baltimore Orioles
Kansas City Chiefs
...

I want to create a new column, Nickname, that holds only the last word of each team name:

Nickname
Orioles
Chiefs

As a first stage, I have tried splitting the factor like this

df$Nickname <- strsplit(as.character(df$Team), " ")

which produces a list of character fields which I can reference thus

>df$Nickname[1]

[[1]]
[1] "Baltimore" "Orioles"

and

>str(df$Nickname[1])

List of 1
 $ : chr [1:2] "Baltimore" "Orioles"

but then I do not know how to proceed. Trying to get the length

length(df$Nickname[1])

gives 1 - which flummoxes me

Upvotes: 1

Views: 3235

Answers (3)

David

Reputation: 104

How about this?

require(plyr)
ldply(df$Nickname)

Upvotes: 0

Andrie

Reputation: 179388

Use a regular expression:

text <- c("Baltimore Orioles","Kansas City Chiefs")

gsub("^.*\\s", "", text)
[1] "Orioles" "Chiefs" 

The regex searches for:

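  • ^ anchors the match at the start of the string
  • .* matches any character, zero or more times; it is greedy, so it runs as far as it can, up to the last white space
  • \\s matches a single white space character (the backslash is doubled because it sits inside an R string)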

gsub finds this pattern and replaces it with an empty string, leaving you with the last word of each string.
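Applied to the data frame from the question, this could look like the sketch below. The df here is a made-up two-row stand-in for the real data, and sub is swapped in for gsub because the anchored pattern can match at most once per string.

```r
# Hypothetical stand-in for the question's data frame; Team is a factor.
df <- data.frame(Team = factor(c("Baltimore Orioles", "Kansas City Chiefs")))

# Remove everything up to and including the last white space character.
# as.character() makes the factor-to-string conversion explicit.
df$Nickname <- sub("^.*\\s", "", as.character(df$Team))

df$Nickname
```

A team name with no white space at all would not match the pattern and so would be returned unchanged.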

Upvotes: 7

tim riffe

Reputation: 5691

You just need to unlist the split strings and take the last one:

    full <- c("Baltimore Orioles","Kansas City Chiefs")
    getlast <- function(x){
        # split one string on spaces and flatten the result to a character vector
        parts <- unlist(strsplit(x, split = " "))
        # return the final word
        parts[length(parts)]
    }
    sapply(full, getlast)
    # Baltimore Orioles Kansas City Chiefs 
    #         "Orioles"           "Chiefs" 
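On the length puzzle in the question: single brackets on a list return a sub-list, so length(df$Nickname[1]) counts one list element, while double brackets return the character vector inside it. A minimal sketch, assuming a made-up two-row df and a helper variable parts that is not in the original post; it also uses vapply instead of sapply so the result is a plain, unnamed character vector that can go straight into the column.

```r
# Hypothetical stand-in for the question's data frame; Team is a factor.
df <- data.frame(Team = factor(c("Baltimore Orioles", "Kansas City Chiefs")))

# One character vector of words per team, held in a list.
parts <- strsplit(as.character(df$Team), " ")

n_single <- length(parts[1])    # `[` keeps the list wrapper: one element
n_double <- length(parts[[1]])  # `[[` extracts the vector: "Baltimore" "Orioles"

# Take the last word of each split; character(1) asserts one string per team.
df$Nickname <- vapply(parts, function(p) p[length(p)], character(1))
```

Here n_single is 1 and n_double is 2, which is why the original length call looked wrong.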

Upvotes: 4
