user1533277
user1533277

Reputation: 105

Converting text file into data frame in R

My raw data is in a text file with no particular delimiters between the values, like so:

101  10.08  S   A  05OCT93 GOLDEN GATE BRIDGE  4110   6548   6404   55930

Applying read.table in R creates a data frame with only one variable per row, whereas I would like a data frame with 10 variables per row (one for each of the 10 values). How can I achieve this if there is no delimiter in the text file?

Upvotes: 2

Views: 9907

Answers (3)

G. Grothendieck
G. Grothendieck

Reputation: 269586

We assume that each field consist of non-spaces except for field 6 which may have embedded spaces.

Create test file

Lines <- "101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930
"
cat(Lines, file = "myfile.txt")

Run. Read in the file using readLines producing L. Then using gsubfn in the gsubfn package insert the character defined by sep between the fields producing g. Finally read the text in g using read.table to create a data frame:

library(gsubfn)
L <- readLines("myfile.txt")

sep <- ";"  # choose any character not in the file

pat <- "(\\S+) (\\S+) (\\S+) (\\S+) (\\S+) (\\S.*\\S) (\\S+) (\\S+) (\\S+) (\\S+)"
pat <- gsub(" ", "\\s+", pat) # can omit if there is only 1 space between fields
g <- gsubfn(pat, ... ~ paste(..., sep = sep), L)

read.table(text = g, sep = sep)

Output. The result of the last line is:

   V1    V2 V3 V4      V5                 V6   V7   V8   V9  V10
1 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 1010
2 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 1010

Upvotes: 2

IRTFM
IRTFM

Reputation: 263342

The other possibility is that this is a fixed width format file. We would get a better understanding of this possibility if you posted several lines:

require(foreign)
txt2 <- "101  10.08  S   A  05OCT93 GOLDEN GATE BRIDGE  4110   6548   6404   55930"
read.fwf(file=textConnection(txt2), c(4,6,3,4,9,20,6,8,8,8))
   V1    V2  V3   V4        V5                   V6   V7   V8   V9   V10
1 101 10.08   S    A   05OCT93  GOLDEN GATE BRIDGE  4110 6548 6404 55930

Upvotes: 1

IRTFM
IRTFM

Reputation: 263342

Are you sure about there only being ten columns?

> read.table(text="101 10.08 S A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930")
   V1    V2 V3 V4      V5     V6   V7     V8   V9  V10  V11   V12
1 101 10.08  S  A 05OCT93 GOLDEN GATE BRIDGE 4110 6548 6404 55930

Upvotes: 1

Related Questions