baz
baz

Reputation: 7117

Reading and constructing a table from a data frame with multiple columns using R

I would like to read this output file (gibbs_samples) into R and creat a table of between the number of iterations (1- 44) and variances. The final table should be have 10 columns, with each column:

"iter","va1","cova12","va2","vc1","covc12","vc2","ev1","cove12","ev2"

to be separated by an empty space:

1 0.2319E-01 0.1166E-02 0.9017E-02 0.3545E-01 0.1543E-01 0.1860 0.5173E-01 0.1542E-01 0.2297
2
.
.
44 

Note:

i) The first 10 lines should be disregarded

ii) The number 9 in the second column to be removed and every 3 lines should fall in the same row,

 example:

          1       9
 0.2319E-01 0.1166E-02 0.9017E-02 0.3545E-01 0.1543E-01 0.1860     0.5173E-01
 0.1542E-01 0.2297    

should form the first row:

1 0.2319E-01 0.1166E-02 0.9017E-02 0.3545E-01 0.1543E-01 0.1860 0.5173E-01 0.1542E-01 0.2297

iii) There is an empty space between the left margin and the numbers

iv) Some columns (mostly cols 1 and 2) have no empty space between them

example:

      21       9
 0.2331E-01-0.1479E-03 0.7441E-02 0.2520E-01 0.1537E-01 0.5753E-01 0.7325E-01
 0.2136E-01 0.1439    
      22       9
 0.2377E-01-0.2597E-03 0.7385E-02 0.2614E-01 0.1565E-01 0.6142E-01 0.7073E-01
 0.1946E-01 0.1424  

Below is the output file "gibbs_samples":

          -1       9       4
        1    6    6    1    1
        2    6    6    1    2
        3    6    6    2    2
        4    7    7    1    1
        5    7    7    1    2
        6    7    7    2    2
        7    0    0    1    1
        8    0    0    1    2
        9    0    0    2    2
       1       9
 0.2319E-01 0.1166E-02 0.9017E-02 0.3545E-01 0.1543E-01 0.1860     0.5173E-01
 0.1542E-01 0.2297    
       2       9
 0.2315E-01 0.1457E-02 0.8491E-02 0.3538E-01 0.1838E-01 0.9397E-01 0.6191E-01
 0.1684E-01 0.1538    
       3       9
 0.2311E-01 0.1363E-02 0.8228E-02 0.3032E-01 0.1593E-01 0.7850E-01 0.6831E-01
 0.1998E-01 0.1450    
       4       9
 0.2282E-01 0.1120E-02 0.7982E-02 0.2935E-01 0.1425E-01 0.7046E-01 0.6993E-01
 0.1987E-01 0.1411    
       5       9
 0.2263E-01 0.1138E-02 0.7893E-02 0.2935E-01 0.1524E-01 0.6388E-01 0.7037E-01
 0.1909E-01 0.1439    
       6       9
 0.2202E-01 0.1260E-02 0.7649E-02 0.3002E-01 0.1776E-01 0.6507E-01 0.7196E-01
 0.2067E-01 0.1429    
       7       9
 0.2229E-01 0.9052E-03 0.7424E-02 0.3015E-01 0.1945E-01 0.6771E-01 0.7075E-01
 0.2017E-01 0.1438    
       8       9
 0.2163E-01 0.7266E-03 0.7327E-02 0.3211E-01 0.2199E-01 0.6600E-01 0.7120E-01
 0.1876E-01 0.1458    
       9       9
 0.2134E-01 0.6320E-03 0.7375E-02 0.3316E-01 0.1930E-01 0.6214E-01 0.7083E-01
 0.2038E-01 0.1465    
      10       9
 0.2188E-01 0.8527E-03 0.7504E-02 0.2835E-01 0.1504E-01 0.6133E-01 0.7096E-01
 0.1839E-01 0.1458    
      11       9
 0.2111E-01 0.9058E-03 0.7598E-02 0.2629E-01 0.1543E-01 0.6452E-01 0.7105E-01
 0.2034E-01 0.1450    
      12       9
 0.2129E-01 0.7306E-03 0.7787E-02 0.2844E-01 0.1472E-01 0.6338E-01 0.7284E-01
 0.1831E-01 0.1452    
      13       9
 0.2163E-01 0.7417E-03 0.7484E-02 0.2881E-01 0.1532E-01 0.6110E-01 0.7104E-01
 0.1979E-01 0.1454    
      14       9
 0.2200E-01 0.4375E-03 0.7549E-02 0.3004E-01 0.1674E-01 0.6364E-01 0.7125E-01
 0.1999E-01 0.1432    
      15       9
 0.2189E-01 0.2382E-03 0.7774E-02 0.2962E-01 0.1681E-01 0.6627E-01 0.7203E-01
 0.1894E-01 0.1419    
      16       9
 0.2155E-01 0.2874E-03 0.7529E-02 0.3123E-01 0.1612E-01 0.6113E-01 0.7190E-01
 0.1877E-01 0.1485    
      17       9
 0.2167E-01 0.2198E-03 0.7418E-02 0.3035E-01 0.1574E-01 0.6015E-01 0.7060E-01
 0.1976E-01 0.1484    
      18       9
 0.2257E-01 0.4865E-04 0.7392E-02 0.3082E-01 0.1619E-01 0.5738E-01 0.7020E-01
 0.1838E-01 0.1443    
          19       9
 0.2184E-01 0.1929E-03 0.7315E-02 0.3363E-01 0.1854E-01 0.6592E-01 0.7136E-01
 0.1890E-01 0.1466    
      20       9
 0.2214E-01 0.7481E-04 0.7492E-02 0.2906E-01 0.1682E-01 0.6001E-01 0.7087E-01
 0.2037E-01 0.1469    
      21       9
 0.2331E-01-0.1479E-03 0.7441E-02 0.2520E-01 0.1537E-01 0.5753E-01 0.7325E-01
 0.2136E-01 0.1439    
      22       9
 0.2377E-01-0.2597E-03 0.7385E-02 0.2614E-01 0.1565E-01 0.6142E-01 0.7073E-01
 0.1946E-01 0.1424    
      23       9
 0.2366E-01-0.1304E-03 0.7536E-02 0.2996E-01 0.1942E-01 0.5751E-01 0.7112E-01
 0.2063E-01 0.1442    
      24       9
 0.2353E-01-0.1806E-03 0.7412E-02 0.3136E-01 0.2238E-01 0.6733E-01 0.7275E-01
 0.1907E-01 0.1425    
      25       9
 0.2278E-01-0.3747E-03 0.7351E-02 0.3003E-01 0.1832E-01 0.6088E-01 0.7126E-01
 0.2140E-01 0.1469    
      26       9
 0.2259E-01-0.3012E-03 0.7219E-02 0.2732E-01 0.1631E-01 0.5692E-01 0.6851E-01
 0.1875E-01 0.1447    
      27       9
 0.2231E-01-0.2277E-03 0.7038E-02 0.2828E-01 0.1892E-01 0.5589E-01 0.6876E-01
 0.1963E-01 0.1413    
      28       9
 0.2229E-01-0.2523E-03 0.6896E-02 0.3183E-01 0.1796E-01 0.5565E-01 0.6928E-01
 0.1921E-01 0.1425    
      29       9
 0.2152E-01-0.3977E-03 0.7060E-02 0.3026E-01 0.1822E-01 0.5779E-01 0.7268E-01
 0.1956E-01 0.1445    
      30       9
 0.2109E-01-0.2360E-03 0.6998E-02 0.3183E-01 0.1643E-01 0.5578E-01 0.7322E-01
 0.2124E-01 0.1492    
      31       9
 0.2135E-01-0.2020E-04 0.6787E-02 0.2956E-01 0.1830E-01 0.5778E-01 0.7114E-01
 0.2004E-01 0.1447    
      32       9
 0.2154E-01 0.1254E-03 0.6829E-02 0.3317E-01 0.2162E-01 0.6202E-01 0.7061E-01
 0.1795E-01 0.1412    
      33       9
 0.2199E-01-0.6386E-04 0.6709E-02 0.2947E-01 0.1854E-01 0.5679E-01 0.7039E-01
 0.1803E-01 0.1458    
      34       9
 0.2184E-01 0.8950E-04 0.6689E-02 0.3075E-01 0.1846E-01 0.5174E-01 0.7162E-01
 0.1975E-01 0.1443    
      35       9
 0.2168E-01 0.4526E-04 0.6550E-02 0.3146E-01 0.1977E-01 0.5322E-01 0.7290E-01
 0.1834E-01 0.1470    
      36       9
 0.2153E-01 0.2621E-03 0.6705E-02 0.2954E-01 0.1959E-01 0.5310E-01 0.7244E-01
 0.1869E-01 0.1441    
      37       9
 0.2234E-01 0.2712E-03 0.6856E-02 0.3043E-01 0.2318E-01 0.6494E-01 0.7143E-01
 0.1889E-01 0.1426    
      38       9
 0.2217E-01 0.2001E-03 0.7079E-02 0.2945E-01 0.2101E-01 0.6468E-01 0.7113E-01
 0.1815E-01 0.1429    
      39       9
 0.2235E-01 0.2269E-03 0.7234E-02 0.2940E-01 0.1930E-01 0.6252E-01 0.7006E-01
 0.2083E-01 0.1450    
      40       9
 0.2280E-01 0.2957E-04 0.7126E-02 0.2971E-01 0.1971E-01 0.5912E-01 0.7027E-01
 0.1819E-01 0.1493    
      41       9
 0.2296E-01 0.1358E-03 0.6955E-02 0.2877E-01 0.1901E-01 0.5768E-01 0.6981E-01
 0.1956E-01 0.1449    
      42       9
 0.2302E-01-0.2015E-03 0.7196E-02 0.3145E-01 0.1959E-01 0.5772E-01 0.7115E-01
 0.2065E-01 0.1445    
      43       9
 0.2277E-01-0.2184E-03 0.6900E-02 0.3119E-01 0.2150E-01 0.5739E-01 0.6990E-01
 0.2023E-01 0.1435    
      44       9
 0.2300E-01-0.1301E-03 0.6936E-02 0.2865E-01 0.1844E-01 0.5872E-01 0.6876E-01
 0.1887E-01 0.1478    

Your help would be highly appreciated!

Upvotes: 1

Views: 238

Answers (2)

Andrie
Andrie

Reputation: 179418

In base R there is the function read.fwf that reads fixed width files. It has the capability to read multi-line file formats, by passing a list to the argument widths.

Here is a starting point for your data, (assuming your data file is called "raw"):

dat <- read.fwf(raw, skip=11, 
    widths=list(
        c(8, 8), 
        c(11, rep(11, 6)),
        c(11, 11)
        ),
    stringsAsFactors=FALSE,
    colClasses="character"
)

Convert columns to numeric. (There seems to be something odd with the column alignment of row 19 of your data. Usually the following step would not be necessary, because numeric conversion happens automatically.)

dat <- as.data.frame(lapply(dat, as.numeric))

The results:

str(dat)

'data.frame':   44 obs. of  11 variables:
 $ V1 : num  1 2 3 4 5 6 7 8 9 10 ...
 $ V2 : num  9 9 9 9 9 9 9 9 9 9 ...
 $ V3 : num  0.0232 0.0232 0.0231 0.0228 0.0226 ...
 $ V4 : num  0.00117 0.00146 0.00136 0.00112 0.00114 ...
 $ V5 : num  0.00902 0.00849 0.00823 0.00798 0.00789 ...
 $ V6 : num  0.0355 0.0354 0.0303 0.0294 0.0294 ...
 $ V7 : num  0.0154 0.0184 0.0159 0.0142 0.0152 ...
 $ V8 : num  0.186 0.094 0.0785 0.0705 0.0639 ...
 $ V9 : num  0.0517 0.0619 0.0683 0.0699 0.0704 ...
 $ V10: num  0.0154 0.0168 0.02 0.0199 0.0191 ...
 $ V11: num  0.23 0.154 0.145 0.141 0.144 ...

The only remaining task is to remove the second column, which is trivial.

Upvotes: 2

Alex
Alex

Reputation: 2011

R is not very handy for performing data manipulation. You should do the manipulation of this kind in another way prior to loading it into R for statistical analysis. You could use an ETL-tool (Extraction, Transformation and Loading) for this job. I'm currently only familiar with one ETL tool (Integration Services which comes with SQL Server), but I'm sure others are available.

Alternatively you can write a script that does the manipulation for you.

This of cause depends on your technical options and skills... :)

Good luck.

Upvotes: -1

Related Questions