Reputation: 3
I need to import a txt file with the following pattern:
"A" "B" "C" "D" "E" "F" "G" "H" "I""1" 1 "7-201-2012-30" "201" "2011" 0 NA 0 14190 "01-01-1970""2" 2 "7-201-2012-101" "201" "2011" 0 NA 0 14190 "01-01-1970""3" 3 "7-203-2031-19" "203" "2032" 0 NA 0 14190 "01-01-1970""4" 4 "7-203-2031-23" "203" "2032" 0 NA 0 14190 "01-01-1970""5" 5 "7-203-2031-26" "203" "2032" 0 NA 0 14190 "01-01-1970""6" 6 "7-201-2012-57" "201" "2011" 0 NA 0 14190 "01-01-1970""7" 7 "7-201-2012-58" "201" "2011" 0 NA 0 14190 "01-01-1970""8" 8 "7-201-2012-64" "201" "2011" 0 NA 0 14190 "01-01-1970""9" 9 "7-201-2012-67" "201" "2011" 0 NA 0 14190 "01-01-1970""10" 10 "7-201-2012-74" "201" "2011" 0 NA 0 14190 "01-01-1970""11" 11 "7-201-2012-77" "201" "2011" 0 NA 0 14190 "01-01-1970""12" 12 "7-201-2012-78" "201" "2011" 0 NA 0 14190 "01-01-1970""13" 13 "7-201-2012-80" "201" "2011" 0 NA 0 14190 "01-01-1970""14" 14 "7-201-2012-85" "201" "2011" 0 NA 0 14190 "01-01-1970""15"
This is not the full file, but the rest continues in this pattern. Every space or tab indicates a new column until the string "1" ("2", "3", ..., "15"). After that, a new row should start. I'm rather inexperienced in programming with R, but I found a similar problem in this forum where the strings were separated by backslashes. This is how they solved it:
data = paste(scan("Einzelteil_T35.txt",what="character"),collapse='') ## Read the file
dmat = matrix(strsplit(data,"\\\\")[[1]],ncol=15,byrow=T) ## Convert it to a matrix
dmat[,15] = gsub("\".*[0-9]\"","",dmat[,15]) ## Remove the next line number from the values of the last column
colnames(dmat)=dmat[1,] ## Take first line as names
dmat = dmat[-1,] ## Remove first line (as it contained the names)
df = as.data.frame(dmat)
df
Unfortunately, I did not manage to adjust the code so that it works for my txt file. If you could help me adjust the code or show me a completely different approach that works, I would be very thankful.
Upvotes: 0
Views: 51
Reputation: 206197
It just seems like you have 10 columns here that are repeated. You can just form these into a matrix. No need to bother with the collapsing.
data <- scan("Einzelteil_T35.txt",what="character")
mm <- matrix(data, ncol=10, byrow=TRUE)
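To see why 10 is the right column count, here is a minimal self-contained sketch that passes a shortened copy of the sample to scan() through its text argument instead of reading Einzelteil_T35.txt. The names sample_txt and tokens are made up for illustration, and I am assuming that each row label is separated from the preceding value by whitespace or a line break.

```r
# Shortened copy of the sample from the question (header plus three records).
# Assumption: tokens are separated by whitespace or line breaks.
sample_txt <- '"A" "B" "C" "D" "E" "F" "G" "H" "I"
"1" 1 "7-201-2012-30" "201" "2011" 0 NA 0 14190 "01-01-1970"
"2" 2 "7-201-2012-101" "201" "2011" 0 NA 0 14190 "01-01-1970"
"3" 3 "7-203-2031-19" "203" "2032" 0 NA 0 14190 "01-01-1970"
"4"'

# scan() splits on any whitespace and strips the surrounding quotes
tokens <- scan(text = sample_txt, what = "character", quiet = TRUE)

# 9 header names + 1 row label, then 9 values + the next row label per record
length(tokens)  # 40 here, i.e. 4 rows of 10 tokens

mm <- matrix(tokens, ncol = 10, byrow = TRUE)
mm[1, ]  # header row: the 9 names followed by the first row label
mm[2, ]  # first record: 9 values followed by the next row label
```

Because every record contributes exactly 10 tokens, filling the matrix by row lines the values up without any string splitting.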
And then, if you want, you can turn that into a data.frame with a bit of futzing:
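header <- mm[1,] ## First row holds the column names (avoids masking the colnames() function)
coldata <- mm[-1,,drop=FALSE] ## Remaining rows hold the values
dd <- data.frame(lapply(split(coldata, col(coldata)), type.convert, as.is=TRUE)) ## Convert each column to its natural type
names(dd) <- header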
We use type.convert to go from strings into integers or doubles where appropriate. Then we set the names using the first row of the data.
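As a rough illustration of what type.convert does per column, here is a small sketch on a hypothetical character matrix (the values are borrowed from the sample, the matrix itself is made up). I pass as.is = TRUE so that text columns stay character instead of becoming factors, and stringsAsFactors = FALSE so older R versions behave the same way.

```r
# Hypothetical character matrix standing in for the data rows
coldata <- matrix(c("1", "7-201-2012-30",  "0", NA, "14190",
                    "2", "7-201-2012-101", "0", NA, "14190"),
                  ncol = 5, byrow = TRUE)

# Split the matrix into one character vector per column,
# then let type.convert pick a type for each of them
dd <- data.frame(lapply(split(coldata, col(coldata)), type.convert, as.is = TRUE),
                 stringsAsFactors = FALSE)

# Digit-only columns become integer, mixed text stays character,
# and a column that is entirely NA becomes logical
sapply(dd, class)
```

This is why a column that contains only NA in the sample ends up as a logical column rather than a numeric one.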
For the sample in the question, the resulting data.frame prints as:
    A              B   C    D E  F G     H          I X1
1   1  7-201-2012-30 201 2011 0 NA 0 14190 01-01-1970  2
2   2 7-201-2012-101 201 2011 0 NA 0 14190 01-01-1970  3
3   3  7-203-2031-19 203 2032 0 NA 0 14190 01-01-1970  4
4   4  7-203-2031-23 203 2032 0 NA 0 14190 01-01-1970  5
5   5  7-203-2031-26 203 2032 0 NA 0 14190 01-01-1970  6
6   6  7-201-2012-57 201 2011 0 NA 0 14190 01-01-1970  7
7   7  7-201-2012-58 201 2011 0 NA 0 14190 01-01-1970  8
8   8  7-201-2012-64 201 2011 0 NA 0 14190 01-01-1970  9
9   9  7-201-2012-67 201 2011 0 NA 0 14190 01-01-1970 10
10 10  7-201-2012-74 201 2011 0 NA 0 14190 01-01-1970 11
11 11  7-201-2012-77 201 2011 0 NA 0 14190 01-01-1970 12
12 12  7-201-2012-78 201 2011 0 NA 0 14190 01-01-1970 13
13 13  7-201-2012-80 201 2011 0 NA 0 14190 01-01-1970 14
14 14  7-201-2012-85 201 2011 0 NA 0 14190 01-01-1970 15
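The last column only holds the label of the following row, so you may not want it. Assuming that is the case for your file, a sketch like the following drops it before building the data.frame. The tiny matrix and the names keep, header and body are hypothetical stand-ins for the real data, not something taken from your file.

```r
# Tiny stand-in for the matrix built from scan(); values are hypothetical.
# The third column plays the role of the trailing row-label column.
mm <- matrix(c("A", "B", "1",
               "1", "x", "2",
               "2", "y", "3"), ncol = 3, byrow = TRUE)

keep   <- seq_len(ncol(mm) - 1)        # every column except the last
header <- mm[1, keep]                  # column names from the first row
body   <- mm[-1, keep, drop = FALSE]   # data rows, kept as a matrix

dd <- data.frame(lapply(split(body, col(body)), type.convert, as.is = TRUE),
                 stringsAsFactors = FALSE)
names(dd) <- header
dd
```

With the real file, the same idea would leave the nine columns A to I and discard the shifted row labels.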
Upvotes: 1