Reputation: 9110
I am basically trying to read a large file (around 10G) into a list of lines. The file contains a sequence of integers, something like this:
0x123456
0x123123
0x123123
.....
I used the method below to read files by default in my codebase, but it turns out to be quite slow (~12 minutes) in this scenario:
let lines_from_file (filename : string) : string list =
  let lines = ref [] in
  let chan = open_in filename in
  try
    while true do
      lines := input_line chan :: !lines
    done; []
  with End_of_file ->
    close_in chan;
    List.rev !lines;;
I guess I need to read the whole file into memory and then split it into lines (I am using a server with 128G of RAM, so memory space should not be a problem). But after searching the documentation I still don't understand whether OCaml provides such a facility.
So here is my question:
Given my situation, how can I read a file into a string list quickly?
How about using a stream? But then I would need to adjust the related application code, which would take some time.
Upvotes: 7
Views: 2327
Reputation: 2927
I often use the two following functions to read the lines of a file. Note that the function lines_from_files is tail-recursive.
let read_line i = try Some (input_line i) with End_of_file -> None

let lines_from_files filename =
  let rec lines_from_files_aux i acc = match read_line i with
    | None -> List.rev acc
    | Some s -> lines_from_files_aux i (s :: acc)
  in
  lines_from_files_aux (open_in filename) []

let () =
  lines_from_files "foo"
  |> List.iter (Printf.printf "lines = %s\n")
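Note that the channel returned by open_in is never closed here. If that matters for your use case, a minimal variant along these lines (a sketch, not part of the code above) closes it once the end of the file is reached:

let lines_from_files filename =
  let ic = open_in filename in
  (* same tail-recursive accumulation, but close the channel before returning *)
  let rec aux acc = match read_line ic with
    | None -> close_in ic; List.rev acc
    | Some s -> aux (s :: acc)
  in
  aux []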
Upvotes: 4
Reputation: 977
This should work:
let rec ints_from_file fdesc =
  try
    let l = input_line fdesc in
    let l' = int_of_string l in
    l' :: ints_from_file fdesc
  with _ -> []
This solution converts the strings to integers as they are read in (which should be a bit more memory-efficient, and I assume this was going to be done to them eventually).
Also, because it is recursive, the file must be opened outside of the function call.
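For instance, a typical call site (the filename "foo" is just a placeholder) would look like this:

let () =
  (* the caller opens the file, reads all the integers, then closes the channel *)
  let ic = open_in "foo" in
  let ints = ints_from_file ic in
  close_in ic;
  Printf.printf "read %d integers\n" (List.length ints)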
Upvotes: 0
Reputation: 35210
First of all, you should consider whether you really need to have all of the information in memory at once. Maybe it is better to process the file line by line?
If you really want to have it all at once in memory, then you can use Bigarray's map_file function to map the file as an array of characters, and then do something with it.
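A minimal sketch of the mapping step might look like the following. It assumes a recent OCaml where Unix.map_file is the entry point for memory-mapping (older releases exposed the same functionality as Bigarray.Array1.map_file):

let map_file_as_chars filename =
  (* map the whole file as a read-only bigarray of chars *)
  let fd = Unix.openfile filename [Unix.O_RDONLY] 0 in
  let len = (Unix.fstat fd).Unix.st_size in
  let arr =
    Bigarray.array1_of_genarray
      (Unix.map_file fd Bigarray.char Bigarray.c_layout false [| len |])
  in
  Unix.close fd;
  arr

let () =
  let a = map_file_as_chars "foo" in
  (* e.g. count the lines without allocating any intermediate strings *)
  let lines = ref 0 in
  for i = 0 to Bigarray.Array1.dim a - 1 do
    if Bigarray.Array1.get a i = '\n' then incr lines
  done;
  Printf.printf "lines = %d\n" !lines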
Also, as I see it, this file contains numbers. Maybe it is better to allocate an array (or, even better, a bigarray), then process each line in order and store the integers in the (big)array.
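A rough sketch of that idea, assuming the number of integers is known (or over-estimated) in advance:

let ints_to_bigarray filename ~count =
  (* parse the file once, storing each integer into a preallocated bigarray
     instead of building a huge list of strings; `count` is an upper bound *)
  let a = Bigarray.Array1.create Bigarray.int Bigarray.c_layout count in
  let ic = open_in filename in
  let n = ref 0 in
  (try
     while true do
       (* int_of_string understands the 0x prefix used in the file *)
       Bigarray.Array1.set a !n (int_of_string (input_line ic));
       incr n
     done
   with End_of_file -> close_in ic);
  Bigarray.Array1.sub a 0 !n  (* shrink to what was actually read *)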
Upvotes: 8