Teddy Darwin
Reputation: 3

Import .txt file in R with no line ending

I need to import a txt file with the following pattern:

"A" "B" "C" "D" "E" "F" "G" "H" "I""1"  1   "7-201-2012-30" "201" "2011"    0   NA  0   14190   "01-01-1970""2" 2   "7-201-2012-101"    "201"   "2011"  0   NA  0   14190   "01-01-1970""3" 3   "7-203-2031-19" "203"   "2032"  0   NA  0   14190   "01-01-1970""4" 4   "7-203-2031-23" "203"   "2032"  0   NA  0   14190   "01-01-1970""5" 5   "7-203-2031-26" "203"   "2032"  0   NA  0   14190   "01-01-1970""6" 6   "7-201-2012-57" "201"   "2011"  0   NA  0   14190   "01-01-1970""7" 7   "7-201-2012-58" "201"   "2011"  0   NA  0   14190   "01-01-1970""8" 8   "7-201-2012-64" "201"   "2011"  0   NA  0   14190   "01-01-1970""9" 9   "7-201-2012-67" "201"   "2011"  0   NA  0   14190   "01-01-1970""10"    10  "7-201-2012-74" "201"   "2011"  0   NA  0   14190   "01-01-1970""11"    11  "7-201-2012-77" "201"   "2011"  0   NA  0   14190   "01-01-1970""12"    12  "7-201-2012-78" "201"   "2011"  0   NA  0   14190   "01-01-1970""13"    13  "7-201-2012-80" "201"   "2011"  0   NA  0   14190   "01-01-1970""14"    14  "7-201-2012-85" "201"   "2011"  0   NA  0   14190   "01-01-1970""15"

This is not the full file, but the rest continues in the same pattern. Every space or tab separates two columns, and a new row should start at each quoted row number ("1", "2", "3", ..., "15"). I'm rather inexperienced with R, but I found a similar problem on this forum where the values were separated by backslashes. This is how it was solved there:

data = paste(scan("Einzelteil_T35.txt",what="character"),collapse='') ## Read the file
dmat = matrix(strsplit(data,"\\\\")[[1]],ncol=15,byrow=T) ## Convert it to a matrix
dmat[,15] = gsub("\".*[0-9]\"","",dmat[,15]) ## Remove the next line number from the values of the last column
colnames(dmat)=dmat[1,] ## Take first line as names
dmat = dmat[-1,] ## Remove first line (as it contained the names)
df = as.data.frame(dmat)
df

Unfortunately, I could not adapt this code so that it works for my txt file. If you could help me adjust it, or show me a completely different approach that works, I would be very thankful.

Upvotes: 0

Views: 51

Answers (1)

MrFlick
Reputation: 206197

It looks like the data is just a repeating group of 10 values. You can reshape the scanned values directly into a matrix with 10 columns; there is no need to collapse them into a single string first.

data <- scan("Einzelteil_T35.txt",what="character")
mm <- matrix(data, ncol=10, byrow=TRUE)
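If you want to try this without the file, here is a minimal sketch on an inline sample. The names `sample_text`, `tokens`, `n_col` and `mm_demo` are made up for illustration, and the sample is only the header plus the first two records from the question. It also checks the token count first, since `matrix()` only warns and recycles values when the length does not fit.

```r
# Inline stand-in for the file: header plus two records, all on one line.
# scan(text = ...) splits on whitespace and strips the quotes, as it does for a file.
sample_text <- paste(
  '"A" "B" "C" "D" "E" "F" "G" "H" "I" "1"',
  '1 "7-201-2012-30" "201" "2011" 0 NA 0 14190 "01-01-1970" "2"',
  '2 "7-201-2012-101" "201" "2011" 0 NA 0 14190 "01-01-1970" "3"'
)
tokens <- scan(text = sample_text, what = "character", quiet = TRUE)

n_col <- 10
# Fail early if the values cannot be split evenly into rows of 10.
stopifnot(length(tokens) %% n_col == 0)

mm_demo <- matrix(tokens, ncol = n_col, byrow = TRUE)
dim(mm_demo)  # 3 rows (header + 2 records) by 10 columns
```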

And then if you want, you can turn that into a data.frame with a bit of futzing

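col_names <- mm[1, ]   # first row holds the column names
col_data <- mm[-1, ]   # remaining rows hold the values
# convert each column from character to a suitable type; as.is = TRUE keeps strings as character
dd <- data.frame(lapply(split(col_data, col(col_data)), type.convert, as.is = TRUE))
names(dd) <- col_names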

We use type.convert to go from strings into integers or doubles where appropriate. Then we set the names using the first row of the data.

    A              B   C    D E  F G     H          I X1
1   1  7-201-2012-30 201 2011 0 NA 0 14190 01-01-1970  2
2   2 7-201-2012-101 201 2011 0 NA 0 14190 01-01-1970  3
3   3  7-203-2031-19 203 2032 0 NA 0 14190 01-01-1970  4
4   4  7-203-2031-23 203 2032 0 NA 0 14190 01-01-1970  5
5   5  7-203-2031-26 203 2032 0 NA 0 14190 01-01-1970  6
6   6  7-201-2012-57 201 2011 0 NA 0 14190 01-01-1970  7
7   7  7-201-2012-58 201 2011 0 NA 0 14190 01-01-1970  8
8   8  7-201-2012-64 201 2011 0 NA 0 14190 01-01-1970  9
9   9  7-201-2012-67 201 2011 0 NA 0 14190 01-01-1970 10
10 10  7-201-2012-74 201 2011 0 NA 0 14190 01-01-1970 11
11 11  7-201-2012-77 201 2011 0 NA 0 14190 01-01-1970 12
12 12  7-201-2012-78 201 2011 0 NA 0 14190 01-01-1970 13
13 13  7-201-2012-80 201 2011 0 NA 0 14190 01-01-1970 14
14 14  7-201-2012-85 201 2011 0 NA 0 14190 01-01-1970 15
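In the output above, the last column holds the row label of the next record. That suggests the file has 9 header names followed by records of 10 fields each (row label plus 9 values), which is what `write.table()` output looks like once the line breaks are gone. This is an assumption about the file, not something I can confirm. Under that assumption, a rough sketch that realigns the data could look like this (all variable names are made up, and the inline sample stands in for the file):

```r
# Inline stand-in for the file: 9 header names, two full records,
# and a dangling row label at the end, as in the question's excerpt.
sample_text <- paste(
  '"A" "B" "C" "D" "E" "F" "G" "H" "I"',
  '"1" 1 "7-201-2012-30" "201" "2011" 0 NA 0 14190 "01-01-1970"',
  '"2" 2 "7-201-2012-101" "201" "2011" 0 NA 0 14190 "01-01-1970"',
  '"3"'
)
tokens <- scan(text = sample_text, what = "character", quiet = TRUE)

n_names <- 9             # assumed number of header fields (A..I)
rec_len <- n_names + 1   # row label + 9 values per record

header <- tokens[seq_len(n_names)]
fields <- tokens[-seq_len(n_names)]

# Keep only complete records; a trailing partial record is dropped.
n_rec <- length(fields) %/% rec_len
fields <- fields[seq_len(n_rec * rec_len)]

body <- matrix(fields, ncol = rec_len, byrow = TRUE)
values <- body[, -1, drop = FALSE]   # everything except the row label

# Convert each column separately, then assemble the data.frame.
cols <- lapply(seq_len(n_names),
               function(j) type.convert(values[, j], as.is = TRUE))
names(cols) <- header
dd_aligned <- data.frame(cols, row.names = body[, 1], stringsAsFactors = FALSE)
```

With the real file you would replace `text = sample_text` with the file name in the `scan()` call. The result has the 9 columns A to I, with the quoted numbers used as row names instead of appearing as an extra column.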

Upvotes: 1
