Reputation: 6555
I have this dataset:
LP3I22- M5
01174c-qbFD.raw
L2P2 + p LPI Full ms [150.00-1500.00]
Scan #: 1
RT: 6.11
m/z Intensity Relative Resolution Charge Baseline
150.0119 67.3 0.00 152545.44 0.00 26.27
150.0153 59.3 0.00 269991.72 0.00 26.28
150.0156 66.1 0.00 288504.16 0.00 26.28
150.0161 67.2 0.00 172425.14 0.00 26.28
150.0330 78.9 0.00 167957.34 0.00 26.32
150.0485 75.0 0.00 208783.14 0.00 26.35
150.0603 166.2 0.00 220081.53 0.00 26.37
150.0624 75.8 0.00 189976.39 0.00 26.38
150.0866 70.1 0.00 233127.77 0.00 26.42
150.0991 54.8 0.00 193755.25 0.00 26.45
150.1136 62.9 0.00 184047.91 0.00 26.48
150.1348 85.4 0.00 206299.06 0.00 26.52
150.1410 68.7 0.00 225439.47 0.00 26.53
150.1428 73.1 0.00 205324.42 0.00 26.54
150.1498 61.2 0.00 199792.59 0.00 26.55
150.1572 56.8 0.00 160342.95 0.00 26.57
150.1583 71.4 0.00 187849.53 0.00 26.57
150.1746 84.7 0.00 211934.81 0.00 26.60
150.1777 81.2 0.00 251123.45 0.00 26.61
150.2106 65.7 0.00 198830.13 0.00 26.67
150.2144 53.7 0.00 190111.53 0.00 26.68
150.2781 74.0 0.00 187803.52 0.00 26.81
150.2807 90.7 0.00 174743.38 0.00 26.82
How can I extract the data results using regex? I'm not interested in the first 7 lines.
Upvotes: 1
Views: 322
Reputation: 45057
lines = IO.readlines('inputfile.txt')
data = lines[7..-1].collect{|x| x.scan(/[\d.]+/)} # each run of digits and dots is one value
For a simpler solution that doesn't involve a regex, replace the last line with:
data = lines[7..-1].collect{|x| x.split}
This all assumes that the data set matches the one you listed and does not contain any unexpected or improperly-formatted values.
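If you would rather not rely on the header being exactly 7 lines, one option is to keep only the lines that consist entirely of decimal numbers. This is a minimal sketch, not a tested solution for every export: `ROW_RE`, `numeric_rows` and the `columns` default of 6 are names and values I made up for illustration, and it assumes every data value is written with a decimal point, as in the sample.

```ruby
# Hypothetical helper: keep only lines made up entirely of decimal numbers.
# Assumption: every value contains a decimal point, as in the sample data.
ROW_RE = /\A\s*\d+\.\d+(?:\s+\d+\.\d+)*\s*\z/

def numeric_rows(lines, columns = 6)
  lines.select { |line| line =~ ROW_RE }          # drops header lines
       .map { |line| line.split }                 # split on whitespace
       .select { |fields| fields.size == columns } # drops short or long rows
end

sample = [
  "Scan #: 1",
  "RT: 6.11",
  "m/z Intensity Relative Resolution Charge Baseline",
  "150.0119 67.3 0.00 152545.44 0.00 26.27",
  "150.0153 59.3 0.00 269991.72 0.00 26.28"
]
rows = numeric_rows(sample)
```

With a real file you would pass `IO.readlines('inputfile.txt')` instead of `sample`; the trailing newlines are absorbed by the `\s*` at the end of the pattern.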
Upvotes: 3
Reputation: 80075
7.times{DATA.readline} # discard first 7 lines
res = DATA.map{ |line| line.lstrip.squeeze(' ').split(' ').map{|el| el.to_f } } # squeeze(' ') collapses repeated spaces only
__END__
LP3I22- M5
01174c-qbFD.raw
L2P2 + p LPI Full ms [150.00-1500.00]
Scan #: 1
RT: 6.11
m/z Intensity Relative Resolution Charge Baseline
150.0119 67.3 0.00 152545.44 0.00 26.27
150.0153 59.3 0.00 269991.72 0.00 26.28
150.0156 66.1 0.00 288504.16 0.00 26.28
150.0161 67.2 0.00 172425.14 0.00 26.28
150.0330 78.9 0.00 167957.34 0.00 26.32
150.0485 75.0 0.00 208783.14 0.00 26.35
150.0603 166.2 0.00 220081.53 0.00 26.37
The values in res are now floats:
[[150.0119, 67.3, 0.0, 152545.44, 0.0, 26.27], [150.0153, 59.3, 0.0, 269991.72, 0.0, 26.28],
[150.0156, 66.1, 0.0, 288504.16, 0.0, 26.28], [150.0161, 67.2, 0.0, 172425.14, 0.0, 26.28],
[150.033, 78.9, 0.0, 167957.34, 0.0, 26.32], [150.0485, 75.0, 0.0, 208783.14, 0.0, 26.35],
[150.0603, 166.2, 0.0, 220081.53, 0.0, 26.37]]
Upvotes: 1
Reputation: 41189
Assuming it's in a String called data
number_re = /\s*(\d+\.\d+)\s*/
data.scan(/^#{number_re.source * 6}$/)
That will result in the following array
[["150.0119", "67.3", "0.00", "152545.44", "0.00", "26.27"],
["150.0153", "59.3", "0.00", "269991.72", "0.00", "26.28"],
["150.0156", "66.1", "0.00", "288504.16", "0.00", "26.28"],
["150.0161", "67.2", "0.00", "172425.14", "0.00", "26.28"],
["150.0330", "78.9", "0.00", "167957.34", "0.00", "26.32"],
["150.0485", "75.0", "0.00", "208783.14", "0.00", "26.35"],
["150.0603", "166.2", "0.00", "220081.53", "0.00", "26.37"],
["150.0624", "75.8", "0.00", "189976.39", "0.00", "26.38"],
["150.0866", "70.1", "0.00", "233127.77", "0.00", "26.42"],
["150.0991", "54.8", "0.00", "193755.25", "0.00", "26.45"],
["150.1136", "62.9", "0.00", "184047.91", "0.00", "26.48"],
["150.1348", "85.4", "0.00", "206299.06", "0.00", "26.52"],
["150.1410", "68.7", "0.00", "225439.47", "0.00", "26.53"],
["150.1428", "73.1", "0.00", "205324.42", "0.00", "26.54"],
["150.1498", "61.2", "0.00", "199792.59", "0.00", "26.55"],
["150.1572", "56.8", "0.00", "160342.95", "0.00", "26.57"],
["150.1583", "71.4", "0.00", "187849.53", "0.00", "26.57"],
["150.1746", "84.7", "0.00", "211934.81", "0.00", "26.60"],
["150.1777", "81.2", "0.00", "251123.45", "0.00", "26.61"],
["150.2106", "65.7", "0.00", "198830.13", "0.00", "26.67"],
["150.2144", "53.7", "0.00", "190111.53", "0.00", "26.68"],
["150.2781", "74.0", "0.00", "187803.52", "0.00", "26.81"],
["150.2807", "90.7", "0.00", "174743.38", "0.00", "26.82"]]
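Since `scan` hands back strings, you may want to convert them and label the columns. Here is a rough sketch of one way to do that. The `HEADERS` constant and the `records` variable are my own names, with the column labels copied from the header line in the question, and the `data` string is a shortened stand-in for the real input.

```ruby
# Column names taken from the header line of the sample data set.
HEADERS = %w[m/z Intensity Relative Resolution Charge Baseline]

number_re = /\s*(\d+\.\d+)\s*/

# Shortened stand-in for the real input (assumption: same layout).
data = <<~TEXT
  RT: 6.11
  m/z Intensity Relative Resolution Charge Baseline
  150.0119 67.3 0.00 152545.44 0.00 26.27
  150.0153 59.3 0.00 269991.72 0.00 26.28
TEXT

# Each match is an array of six strings; convert to floats and
# pair every value with its column name.
records = data.scan(/^#{number_re.source * 6}$/).map do |fields|
  HEADERS.zip(fields.map(&:to_f)).to_h
end
```

After this, `records[0]["Intensity"]` gives the intensity of the first row as a Float rather than a String.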
Upvotes: 6
Reputation: 40609
Use pattern:
^\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*$
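in multiline mode, so that ^ and $ match at the start and end of each line. (In Ruby, ^ and $ always behave this way, so no extra flag is needed.)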
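In Ruby, the pattern could also be applied one line at a time, which avoids loading a large file into a single String. A minimal sketch, assuming the input responds to `each_line` (a String, File or other IO); `ROW` and `extract_rows` are hypothetical names of mine.

```ruby
ROW = /^\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*(\d+\.\d+)\s*$/

# Hypothetical helper: returns the six captured strings for every line
# that matches ROW and silently skips everything else (header lines etc.).
def extract_rows(io)
  io.each_line.map { |line| ROW.match(line)&.captures }.compact
end

text = "Scan #: 1\nRT: 6.11\n150.0119 67.3 0.00 152545.44 0.00 26.27\n"
rows = extract_rows(text)
```

For a file on disk the call would look like `File.open('inputfile.txt') { |f| extract_rows(f) }`.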
Upvotes: 1