15 Volts

Reputation: 2077

How do I read the nth line of a file efficiently in Ruby?

I have a 2 GiB file and I only want to read its first line. I can call File#readlines, which returns an array, and then use [0], at(0), slice(0), or first.

The problem is that readlines loads the whole file into memory: my PC has 3.7 GiB of RAM, and usage climbs from 1.1 GiB all the way up to 3.7 GiB, even though all I want is the first line. Is there an efficient way to do that?

Upvotes: 0

Views: 632

Answers (5)

TorvaldsDB

Reputation: 972

Taken from https://www.rosettacode.org/wiki/Read_a_specific_line_from_a_file#Ruby:

 seventh_line = open("/etc/passwd").each_line.take(7).last
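
A variant of the same idea, as a minimal sketch rather than anything taken from Rosetta Code: nth_line is a hypothetical helper name. It assumes File.foreach with a lazy enumerator, so no array of the skipped lines is built and reading stops once the requested line has been found.

```ruby
# Hypothetical helper (the name is an assumption): returns the nth line
# (1-based) of the file at `path`, or nil if the file has fewer lines.
def nth_line(path, n)
  raise ArgumentError, 'n must be >= 1' if n < 1

  # File.foreach without a block returns an Enumerator. `lazy` keeps
  # `drop` from collecting the skipped lines into an array, and `first`
  # stops the iteration as soon as one line has been yielded.
  File.foreach(path).lazy.drop(n - 1).first
end
```

With this helper, the one-liner above would become seventh_line = nth_line("/etc/passwd", 7).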

Upvotes: 1

15 Volts

Reputation: 2077

I came up with some code that does the job quite efficiently.

First, we can use the IO#each_line method. Say we need line 3,000,000:

#!/usr/bin/ruby -w

file = File.open(File.join(__dir__, 'hello.txt'))
final = nil
read_upto = 3_000_000 - 1

file.each_line.with_index do |l, i|
    if i == read_upto
        final = l
        break
    end
end

file.close
p final

Running with the time shell builtin:

(hello.txt is a large file in which line N reads "#!/usr/bin/ruby -w #N".)

$ time ruby p.rb
"#!/usr/bin/ruby -w #3000000\n"

real    0m1.298s
user    0m1.240s
sys     0m0.043s
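
The same loop can also be written with the block form of File.open, which closes the file even if an exception is raised, and with IO#lineno instead of with_index. This is only a sketch; line_at is a hypothetical helper name, not part of the script timed above.

```ruby
# Hypothetical helper (the name is an assumption): returns line `n`
# (1-based) of the file at `path`, or nil if the file is shorter.
def line_at(path, n)
  File.open(path) do |file|
    file.each_line do |line|
      # IO#lineno counts the lines read so far on this handle, so
      # inside the block it is the number of the current line.
      return line if file.lineno == n
    end
  end
  nil # reached end of file before line n
end
```

For example, line_at(File.join(__dir__, 'hello.txt'), 3_000_000) would correspond to the lookup shown above.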

The same loop gets the first line just as easily: set read_upto to 0.

Secondly, extending anothermh's answer:

#!/usr/bin/ruby -w

enum = IO.foreach(File.join(__dir__, 'hello.txt'))

# Getting the first line
p enum.first

# Getting the 100th line
# This can still use a lot of memory for large line numbers,
# because take builds an array of the first 100 lines
p enum.take(100)[-1]

# Slower but memory-efficient: step through the enumerator
# with next until the 3,000,000th line is reached
# A plain while loop keeps the iteration overhead low

index, i = 3_000_000 - 1, 0
enum.next && i += 1 while i < index
p enum.next    # reading the 3,000,000th line

Running with time:

$ time ruby p.rb
"#!/usr/bin/ruby -w #1\n"
"#!/usr/bin/ruby -w #100\n"
"#!/usr/bin/ruby -w #3000000\n"

real    0m2.341s
user    0m2.274s
sys     0m0.050s

There are other options, such as IO#readpartial and IO#sysread, but IO.foreach and IO#each_line are the easiest to work with and are quite fast.
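
For completeness, here is a rough sketch of what a chunk-based approach with IO#readpartial might look like. The helper name nth_line_chunked and the 64 KiB default chunk size are assumptions made for illustration; it has not been benchmarked against the line-based versions.

```ruby
# Hypothetical helper (name and default chunk size are assumptions):
# finds line `n` (1-based) by reading fixed-size chunks and counting
# newline bytes. Returns nil if the file has fewer than n lines.
# The file is opened in binary mode, so the result is a binary
# (ASCII-8BIT) string; call force_encoding on it if you need text.
def nth_line_chunked(path, n, chunk_size = 64 * 1024)
  raise ArgumentError, 'n must be >= 1' if n < 1

  to_skip = n - 1      # newlines that still precede the target line
  target = String.new  # pieces of the target line collected so far

  File.open(path, 'rb') do |f|
    loop do
      chunk = begin
        f.readpartial(chunk_size)
      rescue EOFError
        break # readpartial raises EOFError at end of file
      end

      pos = 0
      # Skip whole lines until the target line starts.
      while to_skip > 0 && (nl = chunk.index("\n", pos))
        pos = nl + 1
        to_skip -= 1
      end
      next if to_skip > 0 # target line not reached in this chunk

      # The target line ends in this chunk: return it with its newline.
      if (nl = chunk.index("\n", pos))
        return target << chunk[pos..nl]
      end

      # Otherwise keep the partial line and read the next chunk.
      target << chunk[pos..-1]
    end
  end

  # End of file: a non-empty remainder is a last line without "\n".
  target.empty? ? nil : target
end
```

This trades simplicity for control over the read size; for most uses the each_line and foreach versions are easier to get right.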

Hope this helps!

Upvotes: 0

Sandra

Reputation: 406

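I would use the command line. Note that exec replaces the running Ruby process, so backticks are the better fit if you want the line back in Ruby:

require 'shellwords'
line = `head -n #{Integer(nth_line)} #{filename.shellescape} | tail -n 1`

I hope this is useful.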

Upvotes: 0

anothermh

Reputation: 10536

What about IO.foreach?

IO.foreach('filename') { |line| p line; break }

That should read the first line, print it, and then stop. It does not read the entire file; it reads one line at a time.
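
If the line should be returned rather than printed, break can carry a value out of the block. A minimal sketch, where first_line is a hypothetical helper name and File.foreach is used in place of IO.foreach (File inherits it from IO):

```ruby
# Hypothetical helper (the name is an assumption): returns the first
# line of the file at `path` instead of printing it.
def first_line(path)
  # `break line` stops the iteration and becomes the return value of
  # File.foreach. For an empty file the block never runs and
  # File.foreach returns nil.
  File.foreach(path) { |line| break line }
end
```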

Upvotes: 0

Yoav Epstein

Reputation: 859

Have you tried readline instead of readlines?

File.open('file-name') { |f| f.readline }
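
One caveat: IO#readline raises EOFError when the file is empty, whereas IO#gets returns nil in that case. A small sketch of the gets variant, with first_line_or_nil as a hypothetical helper name:

```ruby
# Hypothetical helper (the name is an assumption): returns the first
# line of the file at `path`, or nil if the file is empty.
def first_line_or_nil(path)
  # IO#gets returns nil at end of file instead of raising EOFError.
  File.open(path) { |f| f.gets }
end
```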

Upvotes: 0
