Reputation: 25
I have a list of strings that I would like to turn into a dataframe. I would like to delimit each string according to some fixed length, for example, let's say my list looks like:
text = c("ABC ABC BROWNIES COMPANY 1/31/2009",
"BCD BCD BROWNIES COMPANY 1/31/2009")
and I want to turn it into the dataframe that would result if I used:
FINAL <- data.frame(rbind(c("ABC", "ABC BROWNIES COMPANY", "1/31/2009"),
                          c("BCD", "BCD BROWNIES COMPANY", "1/31/2009")),
                    stringsAsFactors = FALSE)
colnames(FINAL) <- c("Ticker", "Company", "Date")
FINAL
Basically, I want some sort of fixed-length delimiting to separate the items in each element of "text". I don't think I can use strsplit because there is no single character on which to split: a single space won't work because some of my entries contain spaces, and the number of spaces between "Ticker" and "Company" and between "Company" and "Date" is uneven.
Any help would be much appreciated!
Upvotes: 2
Views: 501
Reputation: 99371
Since you mention fixed-length delimiting, maybe give read.fwf a try.
read.fwf(textConnection(text), widths = c(3, 21, 13),
col.names = c("Ticker", "Company", "Date"))
#   Ticker              Company      Date
# 1    ABC ABC BROWNIES COMPANY 1/31/2009
# 2    BCD BCD BROWNIES COMPANY 1/31/2009
You can mess around with the middle value, 21, to get it to work on all the data.
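One way to pick the widths is to count characters on a sample line and check that they add up. This is only a sketch: the padding in the sample strings below (three spaces between fields) is an assumption, since the real data may be padded differently. strip.white = TRUE is passed through to read.table to trim the leftover spaces.

```r
# Hypothetical sample: three spaces between fields is an assumption.
text <- c("ABC   ABC BROWNIES COMPANY   1/31/2009",
          "BCD   BCD BROWNIES COMPANY   1/31/2009")

# Field widths counted from the sample line:
# 3 (ticker) + 23 (3 spaces + 20-char company) + 12 (3 spaces + 9-char date) = 38
widths <- c(3, 23, 12)
stopifnot(all(nchar(text) == sum(widths)))

fwf <- read.fwf(textConnection(text), widths = widths,
                col.names = c("Ticker", "Company", "Date"),
                strip.white = TRUE,        # trim padding around each field
                colClasses = "character")  # keep every column as plain strings
str(fwf)
```

If the nchar() check fails, the lines are not all the same length, and truly fixed widths are probably the wrong tool for that data.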
Another possibility is to split on three or more spaces.
data.frame(do.call(rbind, strsplit(text, " {3,}")))
#    X1                   X2        X3
# 1 ABC ABC BROWNIES COMPANY 1/31/2009
# 2 BCD BCD BROWNIES COMPANY 1/31/2009
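The split approach can be extended a little to name the columns and parse the dates. A rough sketch, assuming the fields are always separated by runs of three or more spaces (the uneven padding in the sample is made up for illustration):

```r
# Hypothetical sample: fields separated by uneven runs of 3+ spaces (assumed).
text <- c("ABC   ABC BROWNIES COMPANY     1/31/2009",
          "BCD     BCD BROWNIES COMPANY   1/31/2009")

parts <- strsplit(text, " {3,}")
# Guard: every line should break into exactly three fields.
stopifnot(all(lengths(parts) == 3))

df <- data.frame(do.call(rbind, parts), stringsAsFactors = FALSE)
names(df) <- c("Ticker", "Company", "Date")
# Optional: parse the month/day/year strings into Date objects.
df$Date <- as.Date(df$Date, format = "%m/%d/%Y")
str(df)
```

The lengths() check is there because do.call(rbind, ...) will silently recycle values if some line splits into a different number of pieces.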
Upvotes: 4
Reputation: 18612
Possibly str_split_fixed from stringr:
library(stringr)
##
Df <- data.frame(
  str_split_fixed(text, pattern = "\\s{2,}", n = 3),
  stringsAsFactors = FALSE)
##
names(Df) <- c("Ticker","Company","Date")
##
> Df
  Ticker              Company      Date
1    ABC ABC BROWNIES COMPANY 1/31/2009
2    BCD BCD BROWNIES COMPANY 1/31/2009
> str(Df)
'data.frame': 2 obs. of 3 variables:
$ Ticker : chr "ABC" "BCD"
$ Company: chr "ABC BROWNIES COMPANY" "BCD BROWNIES COMPANY"
$ Date : chr "1/31/2009" "1/31/2009"
This assumes that two or more consecutive spaces indicate a new column, although the pattern can be adjusted if necessary.
Upvotes: 0
Reputation: 263451
Either use read.fwf or substitute long spans of spaces with a delimiter.
> read.table( text=gsub(" {3,10}", ",", text), sep="," )
   V1                   V2        V3
1 ABC ABC BROWNIES COMPANY 1/31/2009
2 BCD BCD BROWNIES COMPANY 1/31/2009
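One caveat with a comma as the substitute delimiter: it breaks if a field already contains a comma. A possible variation, sketched here with a made-up company name that includes a comma, is to substitute a tab instead and turn off quote handling:

```r
# Hypothetical sample: the second company name contains a comma (assumed),
# which would break a comma-delimited read.
text <- c("ABC   ABC BROWNIES COMPANY   1/31/2009",
          "BCD   BCD BROWNIES, INC.     1/31/2009")

# Replace each run of 3+ spaces with a tab, then read as tab-delimited.
tabbed <- gsub(" {3,}", "\t", text)
df <- read.table(text = tabbed, sep = "\t", quote = "",
                 col.names = c("Ticker", "Company", "Date"),
                 stringsAsFactors = FALSE)
str(df)
```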
Upvotes: 1