user2699186

Read only the n-th columns of a text file that has no header, using R and sqldf

I have a similar problem to this question: selecting every Nth column in using SQLDF or read.csv.sql

I want to read a few columns from large files: tables of 150 rows and more than 500,000 space-separated columns of numeric data, and only a 32-bit system is available. The files have no header, so the code in the thread above didn't work for me, which is why I decided to write a new post.
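A rough back-of-the-envelope estimate of why loading the whole table at once is a problem, assuming every value is held as an 8-byte double (R's default numeric type):

```r
n_rows <- 150
n_cols <- 500000
bytes  <- n_rows * n_cols * 8   # 8 bytes per double value
mib    <- bytes / 2^20          # convert bytes to MiB
mib
```

That is roughly 572 MiB for a single copy of the data, and parsing with read.table can need extra working memory on top of that, which is tight for a 32-bit R process.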

Do you have an idea to solve this problem?

I was thinking of something like the following, but solutions using fread or read.table are also fine:

library(sqldf)
MyConnection <- file("path/file.txt")
df <- sqldf("select V1, V100, V1000, V235612 from MyConnection",  # headerless columns are named V1, V2, ...
            file.format = list(header = FALSE, sep = " "))
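A minimal base-R sketch of the read.table route mentioned above: an entry of "NULL" in colClasses tells read.table to skip that column, so only the wanted columns are stored. The toy file and the column positions in keep are hypothetical stand-ins for the real data. (data.table::fread also accepts column numbers in its select argument.) This may matter here because SQLite, which sqldf uses, caps a table at 2000 columns in its default build, so a 500,000-column file may not load through sqldf as separate columns at all.

```r
# Hypothetical toy file: 3 rows, 6 space-separated numeric columns, no header
f <- tempfile()
writeLines(c("1 2 3 4 5 6", "7 8 9 10 11 12", "13 14 15 16 17 18"), f)

keep <- c(1, 4, 6)  # hypothetical column positions to keep

# Count the columns using only the first line
n_cols <- length(scan(f, what = "", nlines = 1, quiet = TRUE))

# "NULL" skips a column; the kept columns are read as numeric
classes <- rep("NULL", n_cols)
classes[keep] <- "numeric"

df <- read.table(f, header = FALSE, sep = " ", colClasses = classes)
df
```

read.table still scans every field of every line, so it is not fast on a file this wide, but the skipped columns are never kept in memory.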

Answers (1)

A5C1D2H2I1M1N2O1R2T1

If the columns are fixed width, you can use substr to pick out each column you want by its start position and length (SQLite's substr takes a length, not an end position):

library(sqldf)

x <- tempfile()
cat("12345", "67890", "09876", "54321", sep = "\n", file = x)

myfile <- file(x)

sqldf("select substr(V1, 1, 1) var1, substr(V1, 3, 5) var2 from myfile")
#   var1 var2
# 1    1  345
# 2    6  890
# 3    9   76
# 4    5  321
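For comparison, the same fixed-width extraction can be sketched in base R with read.fwf, where a negative width skips that many characters. Reading the fields as character keeps the leading zero of 09876, which the sqldf output above drops (row 3 shows 9 and 76, apparently because the line was stored as a number).

```r
# Recreate the example file from the answer
x <- tempfile()
writeLines(c("12345", "67890", "09876", "54321"), x)

# Keep 1 character, skip 1 character, keep 3 characters
df <- read.fwf(x, widths = c(1, -1, 3), colClasses = "character")
df
```

Here the third row comes out as "0" and "876".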

See this blog post for more examples. If you know each column's starting position and width, the "select" statement is easy to build with paste.
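A small sketch of building that statement with paste, assuming a hypothetical layout given as vectors of start positions and widths (the widths map directly onto substr's length argument):

```r
# Hypothetical fixed-width layout: start position and width of each wanted field
starts <- c(1, 3)
widths <- c(1, 3)

# One "substr(V1, start, width) varN" term per field
fields <- paste0("substr(V1, ", starts, ", ", widths, ") var", seq_along(starts))
query  <- paste("select", paste(fields, collapse = ", "), "from myfile")
query
```

This produces "select substr(V1, 1, 1) var1, substr(V1, 3, 3) var2 from myfile", which can be passed to sqldf() with the myfile connection from the example above.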
