Reputation: 7117
I would like to read this output file (gibbs_samples) into R and creat a table of between the number of iterations (1- 44) and variances. The final table should be have 10 columns, with each column:
"iter","va1","cova12","va2","vc1","covc12","vc2","ev1","cove12","ev2"
to be separated by an empty space:
1 0.2319E-01 0.1166E-02 0.9017E-02 0.3545E-01 0.1543E-01 0.1860 0.5173E-01 0.1542E-01 0.2297
2
.
.
44
Note:
i) The first 10 lines should be disregarded
ii) The number 9 in the second column to be removed and every 3 lines should fall in the same row,
example:
1 9
0.2319E-01 0.1166E-02 0.9017E-02 0.3545E-01 0.1543E-01 0.1860 0.5173E-01
0.1542E-01 0.2297
should form the first row:
1 0.2319E-01 0.1166E-02 0.9017E-02 0.3545E-01 0.1543E-01 0.1860 0.5173E-01 0.1542E-01 0.2297
iii) There is an empty space between the left margin and the numbers
iv) Some columns (mostly cols 1 and 2) have no empty space between them
example:
21 9
0.2331E-01-0.1479E-03 0.7441E-02 0.2520E-01 0.1537E-01 0.5753E-01 0.7325E-01
0.2136E-01 0.1439
22 9
0.2377E-01-0.2597E-03 0.7385E-02 0.2614E-01 0.1565E-01 0.6142E-01 0.7073E-01
0.1946E-01 0.1424
Below is the output file "gibbs_samples":
-1 9 4
1 6 6 1 1
2 6 6 1 2
3 6 6 2 2
4 7 7 1 1
5 7 7 1 2
6 7 7 2 2
7 0 0 1 1
8 0 0 1 2
9 0 0 2 2
1 9
0.2319E-01 0.1166E-02 0.9017E-02 0.3545E-01 0.1543E-01 0.1860 0.5173E-01
0.1542E-01 0.2297
2 9
0.2315E-01 0.1457E-02 0.8491E-02 0.3538E-01 0.1838E-01 0.9397E-01 0.6191E-01
0.1684E-01 0.1538
3 9
0.2311E-01 0.1363E-02 0.8228E-02 0.3032E-01 0.1593E-01 0.7850E-01 0.6831E-01
0.1998E-01 0.1450
4 9
0.2282E-01 0.1120E-02 0.7982E-02 0.2935E-01 0.1425E-01 0.7046E-01 0.6993E-01
0.1987E-01 0.1411
5 9
0.2263E-01 0.1138E-02 0.7893E-02 0.2935E-01 0.1524E-01 0.6388E-01 0.7037E-01
0.1909E-01 0.1439
6 9
0.2202E-01 0.1260E-02 0.7649E-02 0.3002E-01 0.1776E-01 0.6507E-01 0.7196E-01
0.2067E-01 0.1429
7 9
0.2229E-01 0.9052E-03 0.7424E-02 0.3015E-01 0.1945E-01 0.6771E-01 0.7075E-01
0.2017E-01 0.1438
8 9
0.2163E-01 0.7266E-03 0.7327E-02 0.3211E-01 0.2199E-01 0.6600E-01 0.7120E-01
0.1876E-01 0.1458
9 9
0.2134E-01 0.6320E-03 0.7375E-02 0.3316E-01 0.1930E-01 0.6214E-01 0.7083E-01
0.2038E-01 0.1465
10 9
0.2188E-01 0.8527E-03 0.7504E-02 0.2835E-01 0.1504E-01 0.6133E-01 0.7096E-01
0.1839E-01 0.1458
11 9
0.2111E-01 0.9058E-03 0.7598E-02 0.2629E-01 0.1543E-01 0.6452E-01 0.7105E-01
0.2034E-01 0.1450
12 9
0.2129E-01 0.7306E-03 0.7787E-02 0.2844E-01 0.1472E-01 0.6338E-01 0.7284E-01
0.1831E-01 0.1452
13 9
0.2163E-01 0.7417E-03 0.7484E-02 0.2881E-01 0.1532E-01 0.6110E-01 0.7104E-01
0.1979E-01 0.1454
14 9
0.2200E-01 0.4375E-03 0.7549E-02 0.3004E-01 0.1674E-01 0.6364E-01 0.7125E-01
0.1999E-01 0.1432
15 9
0.2189E-01 0.2382E-03 0.7774E-02 0.2962E-01 0.1681E-01 0.6627E-01 0.7203E-01
0.1894E-01 0.1419
16 9
0.2155E-01 0.2874E-03 0.7529E-02 0.3123E-01 0.1612E-01 0.6113E-01 0.7190E-01
0.1877E-01 0.1485
17 9
0.2167E-01 0.2198E-03 0.7418E-02 0.3035E-01 0.1574E-01 0.6015E-01 0.7060E-01
0.1976E-01 0.1484
18 9
0.2257E-01 0.4865E-04 0.7392E-02 0.3082E-01 0.1619E-01 0.5738E-01 0.7020E-01
0.1838E-01 0.1443
19 9
0.2184E-01 0.1929E-03 0.7315E-02 0.3363E-01 0.1854E-01 0.6592E-01 0.7136E-01
0.1890E-01 0.1466
20 9
0.2214E-01 0.7481E-04 0.7492E-02 0.2906E-01 0.1682E-01 0.6001E-01 0.7087E-01
0.2037E-01 0.1469
21 9
0.2331E-01-0.1479E-03 0.7441E-02 0.2520E-01 0.1537E-01 0.5753E-01 0.7325E-01
0.2136E-01 0.1439
22 9
0.2377E-01-0.2597E-03 0.7385E-02 0.2614E-01 0.1565E-01 0.6142E-01 0.7073E-01
0.1946E-01 0.1424
23 9
0.2366E-01-0.1304E-03 0.7536E-02 0.2996E-01 0.1942E-01 0.5751E-01 0.7112E-01
0.2063E-01 0.1442
24 9
0.2353E-01-0.1806E-03 0.7412E-02 0.3136E-01 0.2238E-01 0.6733E-01 0.7275E-01
0.1907E-01 0.1425
25 9
0.2278E-01-0.3747E-03 0.7351E-02 0.3003E-01 0.1832E-01 0.6088E-01 0.7126E-01
0.2140E-01 0.1469
26 9
0.2259E-01-0.3012E-03 0.7219E-02 0.2732E-01 0.1631E-01 0.5692E-01 0.6851E-01
0.1875E-01 0.1447
27 9
0.2231E-01-0.2277E-03 0.7038E-02 0.2828E-01 0.1892E-01 0.5589E-01 0.6876E-01
0.1963E-01 0.1413
28 9
0.2229E-01-0.2523E-03 0.6896E-02 0.3183E-01 0.1796E-01 0.5565E-01 0.6928E-01
0.1921E-01 0.1425
29 9
0.2152E-01-0.3977E-03 0.7060E-02 0.3026E-01 0.1822E-01 0.5779E-01 0.7268E-01
0.1956E-01 0.1445
30 9
0.2109E-01-0.2360E-03 0.6998E-02 0.3183E-01 0.1643E-01 0.5578E-01 0.7322E-01
0.2124E-01 0.1492
31 9
0.2135E-01-0.2020E-04 0.6787E-02 0.2956E-01 0.1830E-01 0.5778E-01 0.7114E-01
0.2004E-01 0.1447
32 9
0.2154E-01 0.1254E-03 0.6829E-02 0.3317E-01 0.2162E-01 0.6202E-01 0.7061E-01
0.1795E-01 0.1412
33 9
0.2199E-01-0.6386E-04 0.6709E-02 0.2947E-01 0.1854E-01 0.5679E-01 0.7039E-01
0.1803E-01 0.1458
34 9
0.2184E-01 0.8950E-04 0.6689E-02 0.3075E-01 0.1846E-01 0.5174E-01 0.7162E-01
0.1975E-01 0.1443
35 9
0.2168E-01 0.4526E-04 0.6550E-02 0.3146E-01 0.1977E-01 0.5322E-01 0.7290E-01
0.1834E-01 0.1470
36 9
0.2153E-01 0.2621E-03 0.6705E-02 0.2954E-01 0.1959E-01 0.5310E-01 0.7244E-01
0.1869E-01 0.1441
37 9
0.2234E-01 0.2712E-03 0.6856E-02 0.3043E-01 0.2318E-01 0.6494E-01 0.7143E-01
0.1889E-01 0.1426
38 9
0.2217E-01 0.2001E-03 0.7079E-02 0.2945E-01 0.2101E-01 0.6468E-01 0.7113E-01
0.1815E-01 0.1429
39 9
0.2235E-01 0.2269E-03 0.7234E-02 0.2940E-01 0.1930E-01 0.6252E-01 0.7006E-01
0.2083E-01 0.1450
40 9
0.2280E-01 0.2957E-04 0.7126E-02 0.2971E-01 0.1971E-01 0.5912E-01 0.7027E-01
0.1819E-01 0.1493
41 9
0.2296E-01 0.1358E-03 0.6955E-02 0.2877E-01 0.1901E-01 0.5768E-01 0.6981E-01
0.1956E-01 0.1449
42 9
0.2302E-01-0.2015E-03 0.7196E-02 0.3145E-01 0.1959E-01 0.5772E-01 0.7115E-01
0.2065E-01 0.1445
43 9
0.2277E-01-0.2184E-03 0.6900E-02 0.3119E-01 0.2150E-01 0.5739E-01 0.6990E-01
0.2023E-01 0.1435
44 9
0.2300E-01-0.1301E-03 0.6936E-02 0.2865E-01 0.1844E-01 0.5872E-01 0.6876E-01
0.1887E-01 0.1478
Your help would be highly appreciated!
Upvotes: 1
Views: 238
Reputation: 179418
In base R there is the function read.fwf
that reads fixed width files. It has the capability to read multi-line file formats, by passing a list
to the argument widths
.
Here is a starting point for your data, (assuming your data file is called "raw"):
dat <- read.fwf(raw, skip=11,
widths=list(
c(8, 8),
c(11, rep(11, 6)),
c(11, 11)
),
stringsAsFactors=FALSE,
colClasses="character"
)
Convert columns to numeric. (There seems to be something odd with the column alignment of row 19 of your data. Usually the following step would not be necessary, because numeric conversion happens automatically.)
dat <- as.data.frame(lapply(dat, as.numeric))
The results:
str(dat)
'data.frame': 44 obs. of 11 variables:
$ V1 : num 1 2 3 4 5 6 7 8 9 10 ...
$ V2 : num 9 9 9 9 9 9 9 9 9 9 ...
$ V3 : num 0.0232 0.0232 0.0231 0.0228 0.0226 ...
$ V4 : num 0.00117 0.00146 0.00136 0.00112 0.00114 ...
$ V5 : num 0.00902 0.00849 0.00823 0.00798 0.00789 ...
$ V6 : num 0.0355 0.0354 0.0303 0.0294 0.0294 ...
$ V7 : num 0.0154 0.0184 0.0159 0.0142 0.0152 ...
$ V8 : num 0.186 0.094 0.0785 0.0705 0.0639 ...
$ V9 : num 0.0517 0.0619 0.0683 0.0699 0.0704 ...
$ V10: num 0.0154 0.0168 0.02 0.0199 0.0191 ...
$ V11: num 0.23 0.154 0.145 0.141 0.144 ...
The only remaining task is to remove the second column, which is trivial.
Upvotes: 2
Reputation: 2011
R is not very handy for performing data manipulation. You should do the manipulation of this kind in another way prior to loading it into R for statistical analysis. You could use an ETL-tool (Extraction, Transformation and Loading) for this job. I'm currently only familiar with one ETL tool (Integration Services which comes with SQL Server), but I'm sure others are available.
Alternatively you can write a script that does the manipulation for you.
This of cause depends on your technical options and skills... :)
Good luck.
Upvotes: -1