ZZ123

Reputation: 25

Separating a string according to fixed lengths in R, to create columns

I have a character vector of strings that I would like to turn into a data frame. I would like to split each string at fixed positions. For example, suppose my vector looks like this:

text = c("ABC      ABC BROWNIES COMPANY            1/31/2009",
         "BCD      BCD BROWNIES COMPANY            1/31/2009")

and I want to turn it into the dataframe that would result if I used:

FINAL <- data.frame(rbind(c("ABC", "ABC BROWNIES COMPANY", "1/31/2009"),
                          c("BCD", "BCD BROWNIES COMPANY", "1/31/2009")),
                    stringsAsFactors = FALSE)

colnames(FINAL) <- c("Ticker", "Company", "Date")

FINAL

Basically, I want to split each element of text at fixed widths. I don't think I can use strsplit because there is no single character to split on: splitting on spaces won't work because some of my entries contain spaces, and the number of spaces between "Ticker" and "Company" differs from the number between "Company" and "Date".

Any help would be much appreciated!

Upvotes: 2

Views: 501

Answers (3)

Rich Scriven

Reputation: 99371

Since you mention fixed length delimiting, maybe give read.fwf a try.

read.fwf(textConnection(text), widths = c(3, 21, 13), 
         col.names = c("Ticker", "Company", "Date"))
#   Ticker               Company       Date
# 1    ABC  ABC BROWNIES COMPANY  1/31/2009
# 2    BCD  BCD BROWNIES COMPANY  1/31/2009

You may need to adjust the widths (the middle value of 21 in particular) so that they match the column positions in your full data.
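As a rough sketch of how the widths might be chosen: assuming the spacing is as displayed in the question (a 9-character ticker field, a 32-character company field, and a 9-character date), the padding can be included in each width and then trimmed with strip.white = TRUE. The widths here are an assumption read off the displayed sample, and the sample is rebuilt with sprintf so the block runs on its own.

```r
# Rebuild the sample with explicit padding so the field widths are known:
# 9 characters for the ticker, 32 for the company, 9 for the date (assumed).
text <- sprintf("%-9s%-32s%s",
                c("ABC", "BCD"),
                c("ABC BROWNIES COMPANY", "BCD BROWNIES COMPANY"),
                "1/31/2009")

fwf <- read.fwf(textConnection(text),
                widths = c(9, 32, 9),
                col.names = c("Ticker", "Company", "Date"),
                strip.white = TRUE,        # trim the padding around each field
                stringsAsFactors = FALSE)  # keep the columns as character
fwf
```

Because the padding is part of each field's width, this only works if every row uses the same column positions.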

Another possibility is to split on three or more spaces.

data.frame(do.call(rbind, strsplit(text, " {3,}")))
#    X1                   X2        X3
# 1 ABC ABC BROWNIES COMPANY 1/31/2009
# 2 BCD BCD BROWNIES COMPANY 1/31/2009
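To get closer to the FINAL data frame in the question, the split result can be given column names and kept as character. The following is a minimal sketch of that, assuming every row really does contain exactly two gaps of three or more spaces; the length check is an added safeguard, since rbind would otherwise recycle short rows.

```r
# Sample rebuilt with explicit padding so the block is self-contained.
text <- sprintf("%-9s%-32s%s",
                c("ABC", "BCD"),
                c("ABC BROWNIES COMPANY", "BCD BROWNIES COMPANY"),
                "1/31/2009")

parts <- strsplit(text, " {3,}")        # split on runs of 3 or more spaces
stopifnot(all(lengths(parts) == 3))     # fail early if a row split unexpectedly

res <- data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(res) <- c("Ticker", "Company", "Date")
res
```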

Upvotes: 4

nrussell

Reputation: 18612

Possibly str_split_fixed from stringr:

library(stringr)
Df <- data.frame(
  str_split_fixed(text, pattern = "\\s{2,}", n = 3),
  stringsAsFactors = FALSE)
names(Df) <- c("Ticker", "Company", "Date")
> Df
  Ticker              Company      Date
1    ABC ABC BROWNIES COMPANY 1/31/2009
2    BCD BCD BROWNIES COMPANY 1/31/2009
> str(Df)
'data.frame':   2 obs. of  3 variables:
 $ Ticker : chr  "ABC" "BCD"
 $ Company: chr  "ABC BROWNIES COMPANY" "BCD BROWNIES COMPANY"
 $ Date   : chr  "1/31/2009" "1/31/2009"

This assumes that two or more consecutive whitespace characters mark a column boundary; the pattern can be adjusted if necessary.
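To illustrate why the threshold might need adjusting, here is a small sketch with a hypothetical row (not from the question) that has an accidental double space inside the company name. It uses base strsplit with a literal-space pattern as a stand-in for the stringr call, so it only shows how the choice of minimum run length changes the number of pieces.

```r
# Hypothetical row: note the two spaces between "XYZ" and "BAKING".
row <- "XYZ      XYZ  BAKING COMPANY             1/31/2009"

two_plus   <- strsplit(row, " {2,}")[[1]]  # also splits inside the company name
three_plus <- strsplit(row, " {3,}")[[1]]  # splits only at the wider gaps

length(two_plus)    # 4 pieces
length(three_plus)  # 3 pieces
```

A higher threshold is safer against stray double spaces, but it fails if a real column gap is ever narrower than the threshold.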

Upvotes: 0

IRTFM

Reputation: 263451

Either use read.fwf or substitute long spans of spaces with a delimiter.

> read.table( text=gsub(" {3,10}", ",", text), sep="," )
   V1                   V2          V3
1 ABC ABC BROWNIES COMPANY   1/31/2009
2 BCD BCD BROWNIES COMPANY   1/31/2009
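One thing to watch with the pattern " {3,10}": a run longer than 10 spaces is only partly replaced, so the leftover spaces stay attached to the next field (the wider V3 column above appears to show this). A possible variation, sketched here under the assumption that "|" never occurs in the data, drops the upper bound and lets read.table trim any remaining padding.

```r
# Sample rebuilt with explicit padding so the block is self-contained.
text <- sprintf("%-9s%-32s%s",
                c("ABC", "BCD"),
                c("ABC BROWNIES COMPANY", "BCD BROWNIES COMPANY"),
                "1/31/2009")

# Replace every run of 3 or more spaces with a single delimiter.
delimited <- gsub(" {3,}", "|", text)

df <- read.table(text = delimited, sep = "|",
                 col.names = c("Ticker", "Company", "Date"),
                 strip.white = TRUE,        # trim stray padding in each field
                 quote = "", comment.char = "",  # treat ' " and # as plain text
                 stringsAsFactors = FALSE)
df
```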

Upvotes: 1
