Reputation: 65

How can I separate one column into two in R so that the all-capital-letter words are in one column?

I have a column like this:

x <- c('WV West Virginia','FL Florida','CA California','SC South Carolina')

# [1] WV West Virginia                  FL Florida                        
# [3] CA California                     SC South Carolina

How can I separate the abbreviation from the full state name? I also want to give the two new columns different headers. I think I can only solve this by splitting off the all-uppercase words.

Upvotes: 1

Answers (6)

Tyler Rinker

Reputation: 109874

Here's a data.table/gsub approach:

x <- c('WV West Virginia','FL Florida','CA California','SC South Carolina')

data.table::data.table(x)[, 
    abb := gsub("(^[A-Z]{2}) (.+)", "\\1", x)][, 
    state := gsub("(^[A-Z]{2}) (.+)", "\\2", x)][]

##                    x abb          state
## 1:  WV West Virginia  WV  West Virginia
## 2:        FL Florida  FL        Florida
## 3:     CA California  CA     California
## 4: SC South Carolina  SC South Carolina
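If the abbreviations aren't guaranteed to be exactly two letters, the same capture-group idea can be sketched in base R without data.table by matching the leading run of capitals ([A-Z]+) instead of [A-Z]{2}. This assumes every element is capitals, whitespace, then the name; sub() leaves non-matching elements unchanged. The column names abb and state are just illustrative:

```r
x <- c('WV West Virginia','FL Florida','CA California','SC South Carolina')

# Leading run of capital letters, whitespace, then everything else.
# Assumption: every element has this shape; sub() returns
# non-matching elements unchanged.
pat <- "^([A-Z]+)[[:space:]]+(.*)$"

out <- data.frame(
  x     = x,
  abb   = sub(pat, "\\1", x),  # first capture group: the abbreviation
  state = sub(pat, "\\2", x),  # second capture group: the full name
  stringsAsFactors = FALSE
)
out
```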

Upvotes: 0

akrun

Reputation: 887148

Based on @rawr's comment, we could split 'x' at the whitespace that follows the first two characters, i.e. the position matched by the regex lookbehind (?<=^.{2}). strsplit returns a list, which we rbind with do.call, convert to a data.frame, and then cbind with the original vector 'x'.

 cbind(x, as.data.frame(do.call(rbind,strsplit(x, '(?<=^.{2})\\s+', perl=TRUE)),
                    stringsAsFactors=FALSE))
 #                x V1             V2
 #1  WV West Virginia WV  West Virginia
 #2        FL Florida FL        Florida
 #3     CA California CA     California
 #4 SC South Carolina SC South Carolina

Or, instead of the regex lookaround, we could use stri_split with n=2 so that each string is split only at its first run of whitespace.

 library(stringi)
 cbind(x,as.data.frame(do.call(rbind,stri_split(x, regex='\\s+', n=2))))
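If you'd rather avoid both the lookbehind and the stringi dependency, the same "split only at the first whitespace" idea can be sketched in base R with regexpr() plus regmatches(invert = TRUE). It assumes every element contains at least one whitespace character; an element without one would come back as a single piece and be recycled into both columns by rbind:

```r
x <- c('WV West Virginia','FL Florida','CA California','SC South Carolina')

# regexpr() locates only the FIRST whitespace run in each string
first_ws <- regexpr("[[:space:]]+", x)

# invert = TRUE returns the text around that match: c(before, after)
parts <- regmatches(x, first_ws, invert = TRUE)

res <- data.frame(x, do.call(rbind, parts), stringsAsFactors = FALSE)
names(res) <- c("x", "V1", "V2")  # match the column names shown above
res
```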

Upvotes: 2

Pierre L

Reputation: 28441

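With tidyr we can use separate() to split the column into two while specifying the new names. The separator defaults to runs of non-alphanumeric characters, and extra = "merge" splits at most as many times as there are new columns, so leftover pieces stay in the last column ("South Carolina" is kept whole instead of being cut to "South" with a warning):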

library(tidyr)
separate(df, x, c("Abb", "State"), extra="merge")
#  Abb          State
#1  WV  West Virginia
#2  FL        Florida
#3  CA     California
#4  SC South Carolina

Data

df <- data.frame(x = c('WV West Virginia', 'FL Florida', 'CA California', 'SC South Carolina'), stringsAsFactors = FALSE)
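To see what extra = "merge" is doing, here is a rough base-R imitation (not tidyr's actual implementation): split on tidyr's default separator, then keep the first piece and glue the rest back together. Rejoining with a single space is an assumption that only reproduces the original text when the separator was one space:

```r
x <- c('WV West Virginia','FL Florida','CA California','SC South Carolina')

# tidyr's default sep is a regex for runs of non-alphanumerics
pieces <- strsplit(x, "[^[:alnum:]]+")

# "SC South Carolina" yields 3 pieces, but only 2 columns are wanted
lengths(pieces)

# "merge"-like behaviour: first piece is the abbreviation, the rest
# is pasted back together (with a single space, an approximation)
Abb   <- vapply(pieces, function(p) p[1], character(1))
State <- vapply(pieces, function(p) paste(p[-1], collapse = " "), character(1))
data.frame(Abb, State, stringsAsFactors = FALSE)
```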

Upvotes: 4

Frank

Reputation: 66819

Use the state.* constants (state.abb, state.name) that ship with R's built-in datasets package:

DF = data.frame(raw=c("WV West Virginia","FL Florida","CA California","SC South Carolina"))

DF$state.abbr <- substr(DF$raw, 1, 2)
DF$state.name <- state.name[ match(DF$state.abbr, state.abb) ]

#                 raw state.abbr     state.name
# 1  WV West Virginia         WV  West Virginia
# 2        FL Florida         FL        Florida
# 3     CA California         CA     California
# 4 SC South Carolina         SC South Carolina

Because the full name is looked up from the abbreviation instead of being copied from the raw string, typos or other oddities in the written state names do not carry through.
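To make that concrete, here is a small sketch on made-up rows: two deliberate typos and a DC row. The extra columns unknown.code and name.differs are illustrative additions, not part of the answer above. Note that state.abb only lists the 50 states, so a code such as DC comes back as NA and needs handling:

```r
# Made-up input: two typos and one code not in state.abb
DF <- data.frame(raw = c("WV West Virgina", "FL Flroida", "DC District of Columbia"),
                 stringsAsFactors = FALSE)

DF$state.abbr <- substr(DF$raw, 1, 2)
DF$state.name <- state.name[match(DF$state.abbr, state.abb)]

# Name as written in the raw string (everything after "XX ")
written <- substr(DF$raw, 4, nchar(DF$raw))

# Flag unknown codes, and rows whose written name differs from the lookup
DF$unknown.code <- is.na(DF$state.name)
DF$name.differs <- !DF$unknown.code & written != DF$state.name
DF
```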

Upvotes: 3

Heroka

Reputation: 13139

Two approaches without external packages:

Approach 1: you could use substr in combination with nchar.

dat <- data.frame(raw=c("WV West Virginia","FL Florida", "CA California","SC South Carolina"),
                 stringsAsFactors=FALSE)

dat$code <- substr(dat$raw,1,2)
dat$state <- substr(dat$raw, 4, nchar(dat$raw))

> dat
                raw code          state
1  WV West Virginia   WV  West Virginia
2        FL Florida   FL        Florida
3     CA California   CA     California
4 SC South Carolina   SC South Carolina

Approach 2: you could use regular expressions to remove the part of each string you don't need:

## approach 2: regex
dat$code <- sub(" .+", "", dat$raw)
dat$state <- sub("^[A-Z]{2} ", "", dat$raw)
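A variant of approach 1 that doesn't hard-code the abbreviation length: find the first space with regexpr() and feed its position to substr(). This assumes every row contains a space (regexpr() returns -1 where there is no match):

```r
dat <- data.frame(raw = c("WV West Virginia", "FL Florida",
                          "CA California", "SC South Carolina"),
                  stringsAsFactors = FALSE)

# Position of the first space in each string (assumes one exists)
sp <- as.integer(regexpr(" ", dat$raw, fixed = TRUE))

dat$code  <- substr(dat$raw, 1, sp - 1)                 # text before the space
dat$state <- substr(dat$raw, sp + 1, nchar(dat$raw))    # text after the space
dat
```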

Upvotes: 3

Ansjovis86

Reputation: 1545

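Use colsplit() from the reshape2 package, which splits each string into as many pieces as there are names (here, at the first space).

    library(reshape2)
    x <- c('WV West Virginia','FL Florida','CA California','SC South Carolina')
    colsplit(x," ",c("Code","State"))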

Output:

  Code          State
1   WV  West Virginia
2   FL        Florida
3   CA     California
4   SC South Carolina
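A base-R alternative that also names the columns in one step is utils::strcapture() (R 3.4.0 and later). The zero-row proto data frame supplies the headers and column types; the pattern below is an assumption about the data (leading capitals, whitespace, then the name), and rows that don't match come back as NA:

```r
x <- c('WV West Virginia','FL Florida','CA California','SC South Carolina')

# proto: its names become the column headers, its column types are
# what each capture group is converted to
proto <- data.frame(Code = character(), State = character(),
                    stringsAsFactors = FALSE)

res <- utils::strcapture("^([A-Z]+)[[:space:]]+(.*)$", x, proto)
res
```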

Upvotes: 2
