Reputation: 141
I have a big text file that contains - among others- lines like these:
"X" : "452345230"
I want to find all lines that contain "X" , and take just the number (without the quotation marks), and then output the numbers in another file, in this fashion:
452349532
234523452
213412411
219456433
etc.
What I did so far is this:
myfile = File.open("myfile.txt")
x = []
myfile.grep(/"X"/) {|line|
x << line.match( /"(\d{9})/ ).values_at( 1 )[0]
puts x
File.open("output.txt", 'w') {|f| f.write(x) }
}
it works, but the list it produces is of this form:
["23419230", "2349345234" , ... ]
How do I output it like I showed before, just numbers and each number in a line?
Thanks.
Upvotes: 3
Views: 141
Reputation: 54984
Here's a solution that doesn't leave files open:
File.open("output.txt", 'w') do |output|
File.open("myfile.txt").each do |line|
output.puts line[/\d{9}/] if line[/"X"/]
end
end
Upvotes: 5
Reputation: 15129
Solution:
myfile = File.open("myfile.txt")
File.open("output.txt", 'w') do |output|
content = myfile.lines.map { |line| line.scan(/^"X".*(\d{9})/) }.flatten.join("\n")
output.write(content)
end
Edited: I updated the code reducing it a bit. If the example above seems complicated, you can also grab the data you want with the following statement (could be a little bit clear of what's happening):
content = myfile.lines.select { |line| line =~ /"X"/ }.map { |line| line.scan(/\d{9}/) }.join("\n")
Upvotes: 2
Reputation: 104050
I couldn't reproduce what you saw:
$ cat myfile.txt
"X" : "452345230"
"X" : "452345231"
"X" : "452345232"
"X" : "452345233"
$ ./scanner.rb
452345230
452345230
452345231
452345230
452345231
452345232
452345230
452345231
452345232
452345233
$ cat output.txt
452345230452345231452345232452345233$
However, I did notice that your application is incredibly wasteful and probably not doing what you expect: You open output.txt
, write some content to it, then close it again. The next time it is opened in the loop, it is overwritten. If your file is 1000 lines long, this won't be so bad, you're only making 1000 files. If your file is 1,000,000 lines long, this is going to represent a pretty horrible performance penalty as you create a file, write into it, and then delete it again, one million times. Oops.
I re-wrote your tool a little bit:
$ cat scanner.rb
#!/usr/bin/ruby -w
myfile = File.open("myfile.txt")
output = File.open("output.txt", 'w')
myfile.grep(/"X"/) {|line|
x = line.match( /"(\d{9})/ ).values_at( 1 )[0]
puts x
output.write(x + "\n")
}
This opens each file exactly onces, writes each new line one at a time, and then lets them both be closed when the application quits. Depending upon if this is a small portion of your application or the entire thing, this might be alright. (If this is a small portion of the program, then definitely close the files when you're done with them.)
This might still be wasteful for one million matched lines -- those writes are almost certainly handed straight to the system call write(2)
, which will involve some overhead.
How many of these will you be running? Millions? Billions? If this needs more refinement feel free to ask...
Upvotes: 2