jksl
jksl

Reputation: 323

Single column of data to multiple columned dataframe with no delimiter

I've got some data output generated by windows software. It's a tab delimited rtf file, but upon export it is trasferred into one long column of data. I'd like to use R to get it back into a dataframe to export as tab delimited so I can easily analyse the data:

Basically there are no delimiters but I know there should be, say, 7 columns in the finished file, so I'd like the dataframe to start a new row after 7 entries have gone in.

To illustrate, the output looks like this:

>Sample  
>Location  
>Test  
>Mean  
>Stdev  
>Status  
>Points  
>A  
>A1  
>XX1  
>0.034541  
>0.04596  
>UN  
>0  
>B  
>A1  
>XX2  
>0  
>0  
>TA  
>1

I'd like to parse it into this:


    >Sample   Location    Test      Mean     Stdev     Status   Points  
    >A        A1          XX1       0.03     0.04      UN       0  
    >B        A1          XX2       0        0         TA       1  

I wasn't entirely sure how to search for this but the closest example of what I want to do is in this example:
writing contents of list to one file in R

But I couldn't work out how to apply the answers to my data, particularly the set width options (which I didnt fully follow, but could be about column number?), capture.output, and the two list elements which I don't think applies here.

Also open to BASH or awk solutions.

Upvotes: 1

Views: 265

Answers (3)

agstudy
agstudy

Reputation: 121568

Similar to the other solution, I keep it because there is a further treatment :

matrix(gsub('>','',dat$V1),ncol=7,byrow=T)
colnames(dat) <- dat[1,]
> dat <- dat[-1,]
> dat
    Sample Location Test  Mean       Stdev     Status Points
[1,] "A"    "A1"     "XX1" "0.034541" "0.04596" "UN"   "0"   
[2,] "B"    "A1"     "XX2" "0"        "0"       "TA"   "1"   

Upvotes: 1

csgillespie
csgillespie

Reputation: 60462

Here's a complete set of steps to take you through.

  1. Read the file in as normal

    d = read.table("/tmp/tmp.txt", header=FALSE, stringsAsFactors=FALSE)
    

    This gives a data frame with one column.

  2. Next we alter the number of rows and columns:

    d = unlist(d)
    dim(d) = c(7, nrow(d)/7)
    d = t(d)
    ##Or 
    d = matrix(as.matrix(d), ncol=7, byrow=TRUE)
    
  3. Then we remove the ">" symbol:

    d = gsub(">", "", d)
    

    here we replace ">" with nothing ""

  4. Then sort out the column headings:

    colnames(d) = d[1,]
    d = d[-1,]
    

Upvotes: 4

nograpes
nograpes

Reputation: 18323

I think you want something like this:

# Read data
x<-read.table(textConnection('>Sample  
>Location  
>Test  
>Mean  
>Stdev  
>Status  
>Points  
>A  
>A1  
>XX1  
>0.034541  
>0.04596  
>UN  
>0  
>B  
>A1  
>XX2  
>0  
>0  
>TA  
>1'),stringsAsFactors=FALSE)
# Convert to matrix
mat<-matrix(as.matrix(x),ncol=7,byrow=TRUE)
# Dump as tab delimited
write.table(mat,'file.tab',sep='\t')

Upvotes: 2

Related Questions