KuanYin

Reputation: 141

Get numbers from a list in a file, output to another file in Ruby?

I have a big text file that contains, among others, lines like these:

"X" : "452345230"

I want to find all lines that contain "X", take just the number (without the quotation marks), and then output the numbers to another file, in this fashion:

452349532

234523452

213412411

219456433

etc.

What I did so far is this:

myfile = File.open("myfile.txt")
x = [] 
myfile.grep(/"X"/) {|line|
   x << line.match( /"(\d{9})/ ).values_at( 1 )[0]
   puts x
   File.open("output.txt", 'w') {|f| f.write(x) }
}

It works, but the list it produces is of this form:

["23419230", "2349345234" , ... ]

How do I output it like I showed before, just numbers and each number in a line?

Thanks.

Upvotes: 3

Views: 141

Answers (3)

pguardiario

Reputation: 54984

Here's a solution that doesn't leave files open:

File.open("output.txt", 'w') do |output|
    File.foreach("myfile.txt") do |line|
        output.puts line[/\d{9}/] if line[/"X"/]
    end
end
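
If you would rather tie the match to the "X" key than to any nine-digit run on the line, a capture group can do both steps at once. This is only a minimal sketch, assuming the lines look exactly like the example in the question; extract_x_numbers is a hypothetical helper name, and the file contents are made up for the demonstration:

```ruby
require 'tmpdir'

# Hypothetical helper: copies the number from every `"X" : "<digits>"` line
# in in_path to out_path, one number per line.
def extract_x_numbers(in_path, out_path)
  File.open(out_path, 'w') do |output|
    # File.foreach opens the input, yields each line, and closes it afterwards.
    File.foreach(in_path) do |line|
      # str[regexp, 1] returns the first capture group, or nil if no match.
      number = line[/"X"\s*:\s*"(\d+)"/, 1]
      output.puts number if number
    end
  end
end

# Small demonstration with throwaway files.
Dir.mktmpdir do |dir|
  input  = File.join(dir, "myfile.txt")
  output = File.join(dir, "output.txt")
  File.write(input, %("X" : "452345230"\n"Y" : "111111111"\n"X" : "234523452"\n))
  extract_x_numbers(input, output)
  print File.read(output)
end
```

Both files are closed as soon as their blocks finish, and lines for other keys are skipped even if they contain digits.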

Upvotes: 5

oldhomemovie

Reputation: 15129

Solution:

myfile = File.open("myfile.txt")

File.open("output.txt", 'w') do |output|
  content = myfile.each_line.map { |line| line.scan(/^"X".*(\d{9})/) }.flatten.join("\n")

  output.write(content)
end

Edited: I updated the code, reducing it a bit. If the example above seems complicated, you can also grab the data you want with the following statement (it may be a little clearer about what's happening):

content = myfile.each_line.select { |line| line =~ /"X"/ }.map { |line| line.scan(/\d{9}/) }.join("\n")
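
As for why the original output looked like an array literal: write converts its argument with to_s, and on Ruby 1.9 and later Array#to_s gives the inspect form. puts, by contrast, prints each element of an array on its own line, which is the same as joining with "\n" yourself. A small sketch using StringIO in place of real files (the numbers are just the ones from the question):

```ruby
require 'stringio'

numbers = ["452349532", "234523452", "213412411"]

# write calls to_s on the array, which on Ruby 1.9+ is the inspect form.
as_written = StringIO.new
as_written.write(numbers)

# puts given an array writes each element followed by a newline.
one_per_line = StringIO.new
one_per_line.puts(numbers)

puts as_written.string
puts one_per_line.string
```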

Upvotes: 2

sarnold

Reputation: 104050

I couldn't reproduce what you saw:

$ cat myfile.txt 
"X" : "452345230"
"X" : "452345231"
"X" : "452345232"
"X" : "452345233"
$ ./scanner.rb 
452345230
452345230
452345231
452345230
452345231
452345232
452345230
452345231
452345232
452345233
$ cat output.txt 
452345230452345231452345232452345233$ 

However, I did notice that your application is incredibly wasteful and probably not doing what you expect: on every matching line you open output.txt, write the whole array to it, then close it again. Because the file is opened in 'w' mode, each open truncates it and the previous contents are overwritten. If your file is 1000 lines long, this won't be so bad; you're only rewriting the file 1000 times. If your file is 1,000,000 lines long, this is going to be a pretty horrible performance penalty, as you truncate and rewrite an ever-growing file one million times. Oops.
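
The truncation is easy to see in isolation. In the question's code the array keeps growing, so the final rewrite happens to contain everything; if each pass wrote only its own value, only the last one would survive. A minimal sketch with made-up numbers and a temporary directory:

```ruby
require 'tmpdir'
require 'fileutils'

dir     = Dir.mktmpdir
path    = File.join(dir, "output.txt")
numbers = %w[111111111 222222222 333333333]

# Reopening with 'w' inside the loop truncates the file on every pass.
numbers.each do |n|
  File.open(path, 'w') { |f| f.puts n }
end
reopened_each_time = File.read(path)

# Opening once outside the loop keeps every line.
File.open(path, 'w') do |f|
  numbers.each { |n| f.puts n }
end
opened_once = File.read(path)

puts reopened_each_time
puts opened_once

FileUtils.remove_entry(dir)
```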

I re-wrote your tool a little bit:

$ cat scanner.rb 
#!/usr/bin/ruby -w

myfile = File.open("myfile.txt")
output = File.open("output.txt", 'w')
myfile.grep(/"X"/) {|line|
   x = line.match( /"(\d{9})/ ).values_at( 1 )[0]
   puts x
   output.write(x + "\n")
}

This opens each file exactly once, writes each new line one at a time, and then lets them both be closed when the application quits. Depending on whether this is a small portion of your application or the entire thing, this might be all right. (If this is a small portion of the program, then definitely close the files when you're done with them.)
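
If the files do need to be closed promptly, the block form of File.open takes care of it: the file is closed when the block exits, even if an exception is raised inside it. A small sketch of the difference, using a temporary directory and a made-up value:

```ruby
require 'tmpdir'
require 'fileutils'

dir  = Dir.mktmpdir
path = File.join(dir, "output.txt")

# Block form: the file is closed automatically when the block returns.
handle = nil
File.open(path, 'w') do |output|
  handle = output
  output.puts "452345230"
end
closed_after_block = handle.closed?

# Non-block form: the file stays open until close is called explicitly.
manual = File.open(path, 'a')
open_before_close = !manual.closed?
manual.close
closed_after_close = manual.closed?

puts closed_after_block
puts open_before_close
puts closed_after_close

FileUtils.remove_entry(dir)
```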

This might still be wasteful for one million matched lines. Ruby buffers writes to a File internally by default (unless sync is set), so these small writes are normally batched before they reach the write(2) system call, but each call still involves some overhead.

How many of these will you be running? Millions? Billions? If this needs more refinement feel free to ask...

Upvotes: 2
