Reputation: 323
I've got some data output generated by windows software. It's a tab delimited rtf file, but upon export it is trasferred into one long column of data. I'd like to use R to get it back into a dataframe to export as tab delimited so I can easily analyse the data:
Basically there are no delimiters but I know there should be, say, 7 columns in the finished file, so I'd like the dataframe to start a new row after 7 entries have gone in.
To illustrate, the output looks like this:
>Sample
>Location
>Test
>Mean
>Stdev
>Status
>Points
>A
>A1
>XX1
>0.034541
>0.04596
>UN
>0
>B
>A1
>XX2
>0
>0
>TA
>1
I'd like to parse it into this:
>Sample Location Test Mean Stdev Status Points >A A1 XX1 0.03 0.04 UN 0 >B A1 XX2 0 0 TA 1
I wasn't entirely sure how to search for this but the closest example of what I want to do is in this example:
writing contents of list to one file in R
But I couldn't work out how to apply the answers to my data, particularly the set width options (which I didnt fully follow, but could be about column number?), capture.output, and the two list elements which I don't think applies here.
Also open to BASH or awk solutions.
Upvotes: 1
Views: 265
Reputation: 121568
Similar to the other solution, I keep it because there is a further treatment :
matrix(gsub('>','',dat$V1),ncol=7,byrow=T)
colnames(dat) <- dat[1,]
> dat <- dat[-1,]
> dat
Sample Location Test Mean Stdev Status Points
[1,] "A" "A1" "XX1" "0.034541" "0.04596" "UN" "0"
[2,] "B" "A1" "XX2" "0" "0" "TA" "1"
Upvotes: 1
Reputation: 60462
Here's a complete set of steps to take you through.
Read the file in as normal
d = read.table("/tmp/tmp.txt", header=FALSE, stringsAsFactors=FALSE)
This gives a data frame with one column.
Next we alter the number of rows and columns:
d = unlist(d)
dim(d) = c(7, nrow(d)/7)
d = t(d)
##Or
d = matrix(as.matrix(d), ncol=7, byrow=TRUE)
Then we remove the ">" symbol:
d = gsub(">", "", d)
here we replace ">" with nothing ""
Then sort out the column headings:
colnames(d) = d[1,]
d = d[-1,]
Upvotes: 4
Reputation: 18323
I think you want something like this:
# Read data
x<-read.table(textConnection('>Sample
>Location
>Test
>Mean
>Stdev
>Status
>Points
>A
>A1
>XX1
>0.034541
>0.04596
>UN
>0
>B
>A1
>XX2
>0
>0
>TA
>1'),stringsAsFactors=FALSE)
# Convert to matrix
mat<-matrix(as.matrix(x),ncol=7,byrow=TRUE)
# Dump as tab delimited
write.table(mat,'file.tab',sep='\t')
Upvotes: 2