Matt Bettinson

Reputation: 531

File appears empty until the size method is called

I have a very strange bug in Ruby/Rails. In byebug, reading the file returns an empty string. I then ask for the file's size, and when I read the file again it has content. I am on macOS. Here is the output:

(byebug) File.read(@file)
""
(byebug) @file.size
23
(byebug) File.read(@file)
"HTML + CRT goes here..."

Anyone have any ideas?

I've reproduced it with the following Ruby code:

file = File.new('output.html', 'w')
file.write("Hey")
new = file
puts File.read(new)
puts new.size
puts File.read(new)

Upvotes: 0

Views: 66

Answers (2)

thesecretmaster

Reputation: 1984

It's because you've only opened the file for writing, not reading. When you use

file = File.new('output.html', 'w')

you open "output.html" for writing, and then, as expected, you write "Hey" to it using

file.write("Hey")

If you were to open the file now, it would contain "Hey".

Here is where it gets weird. Setting new = file is not needed, because both names refer to the same object. When you call File.read(new), you are reading from something that is not open for reading. You would see this if you ran new.read, because it raises an IOError. When you run new.size, it appears to open the file for reading, possibly because it has to read the file to get the size.

Calling dup or clone on it has the same effect. Once it has been opened for reading, you can read from it correctly.

What I do not know is why dup or clone opens it for reading, or why File.read on a non-readable file returns "" whereas calling read on the file object itself raises an error.

Here is an even stranger snippet that reproduces the behavior you described:

output = File.new('output.html', 'w')
output.write("Hey")
puts File.read(output) #=> ""
new = output.dup
puts File.read(output) #=> "Hey"

Upvotes: 1

the Tin Man

Reputation: 160571

I think you're running afoul of how files are written and flushed to disk.

Writing to a file doesn't guarantee the data is immediately written to disk unless the sync flag is set.

When sync mode is true, all output is immediately flushed to the underlying operating system and is not buffered internally.

See sync and sync= for more information.
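To make the flag concrete, here is a minimal sketch of what sync and sync= do, for illustration only. The temporary directory and the variable names (dir, path, default_sync, seen_while_open) are my own and not from the question.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical scratch location; any writable path would do.
dir  = Dir.mktmpdir
path = File.join(dir, 'output.html')

file = File.new(path, 'w')
default_sync = file.sync        # regular files start with sync mode off

file.sync = true                # writes now bypass Ruby's internal buffer
file.write('Hey')

# A separate read of the same path sees the data while the file is still open.
seen_while_open = File.read(path)

file.close
FileUtils.remove_entry(dir)

puts default_sync.inspect
puts seen_while_open.inspect
```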

If sync is false, the file will be written when the intermediate buffer is full, or if something triggers the flush, such as closing the file. It's been a while since I checked, but it might also occur when a newline is written.
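A small sketch of that buffering, assuming a plain file opened for writing as in the question. The scratch path and the names before_flush and after_flush are hypothetical. An explicit flush is one of the triggers that pushes the buffer out.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical scratch location; any writable path would do.
dir  = Dir.mktmpdir
path = File.join(dir, 'output.html')

file = File.new(path, 'w')
file.write('Hey')                 # a small write normally stays in Ruby's buffer

# File.read opens the path separately, so it only sees what has reached the OS.
before_flush = File.read(path)    # typically "" while the data is still buffered

file.flush                        # hand the buffered bytes to the operating system
after_flush = File.read(path)     # the written data is now visible

file.close
FileUtils.remove_entry(dir)

puts before_flush.inspect
puts after_flush.inspect
```

This is the same pattern as the question's repro, with flush standing in for whatever triggered the write there.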

Don't think that setting sync = true is the proper fix. There's a really important reason we use an intermediate buffer: speed. It's much faster to write/read from the buffer and then let the OS determine when the proper time to write to the disk will be. Forcing flushing will slow down your code and can slow down the system if you're writing to the disk a lot. So understand how it works but don't play games with it unless you understand why you need to sync.

When we open a file using Ruby we're strongly encouraged to use the block form:

File.open('path/to/file', 'w') do |fo|
  ...
end

The block form automatically closes the file, which would flush it, causing the file to be written to disk.
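Applied to the code in the question, that could look like the sketch below. The temporary directory is an assumption of mine so the example can run anywhere, and contents is a hypothetical name.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical scratch location; any writable path would do.
dir  = Dir.mktmpdir
path = File.join(dir, 'output.html')

File.open(path, 'w') do |fo|
  fo.write('Hey')
end                               # the file is closed, and so flushed, here

# Reading after the block sees everything that was written.
contents = File.read(path)
puts contents

FileUtils.remove_entry(dir)
```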

This isn't limited to Ruby; it's just how operating systems work, and the languages sitting above them inherit this behavior.

Also, we use the block form because it helps conserve file handles. Operating systems limit the number of files that can be open at one time. If you use the non-block form in a loop that opens files and are not diligent about closing each file as soon as you're finished with it, you can eventually crash your code, at which point the interpreter will die and all the open files will be closed automatically. But don't rely on that automatic behavior; it's good programming practice to close explicitly right away, or to use the block form, which does it for us.
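If the non-block form is really needed, one way to close explicitly is an ensure clause. This is only a sketch; the scratch path and the names closed and contents are hypothetical.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical scratch location; any writable path would do.
dir  = Dir.mktmpdir
path = File.join(dir, 'output.html')

file = File.new(path, 'w')
begin
  file.write('Hey')
ensure
  file.close                      # runs even if the write raises, releasing the handle
end

closed   = file.closed?
contents = File.read(path)

FileUtils.remove_entry(dir)

puts closed
puts contents
```

The block form does essentially this for us, which is why it's the usual recommendation.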

Upvotes: 1
