Reputation: 105
My raw data is in a text file with no particular delimiters between the values, like so:
101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
Applying read.table in R creates a data frame with only one variable per row, whereas I would like a data frame with 10 variables per row (one for each of the 10 values). How can I achieve this if there is no delimiter in the text file?
Upvotes: 2
Views: 9907
Reputation: 269586
We assume that each field consist of non-spaces except for field 6 which may have embedded spaces.
Create test file
Lines <- "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
"
cat(Lines, file = "myfile.txt")
Run. Read in the file using readLines
producing L
. Then using gsubfn
in the gsubfn package insert the character defined by sep
between the fields producing g
.
Finally read the text in g
using read.table
to create a data frame:
library(gsubfn)
L <- readLines("myfile.txt")
sep <- ";" # choose any character not in the file
pat <- "(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S.*\\S) (\\S+) (\\S+) (\\S+) (\\S+)"
pat <- gsub(" ", "\\s+", pat) # can omit if there is only 1 space between fields
g <- gsubfn(pat, ... ~ paste(..., sep = sep), L)
read.table(text = g, sep = sep)
Output. The result of the last line is:
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 1010
2 101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 1010
Upvotes: 2
Reputation: 263342
The other possibility is that this is a fixed width format file. We would get a better understanding of this possibility if you posted several lines:
require(foreign)
txt2 <- "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930"
read.fwf(file=textConnection(txt2), c(4,6,3,4,9,20,6,8,8,8))
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10
1 101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
Upvotes: 1
Reputation: 263342
Are you sure about there only being ten columns?
> read.table(text="101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930")
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
1 101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
Upvotes: 1