Reputation: 195
I am teaching myself Elixir for my research, and oftentimes my research requires opening several dozen or hundred text files, combining the data in these files, and manipulating the data. I am trying to figure out how I can open all the files I have in a directory and access the data in all these files. I would like to avoid using a for loop because to iterate through 100 files in a loop would be very slow. I think that the Stream module is ideal for my purposes, but I don't know quite how to use it.
Below, I have some test code. All it is supposed to do is open a bunch of files containing random numbers, convert the strings of numbers in the files to integers, and sort them. Everything works except the opening files part. You can see I tried to use the Path module, and this does succeed in finding all the files, but I don't know how to then pass that to the sort_num function in a usable way. Thanks for your help everyone!
defmodule OpenFiles do
def file_open do
Path.wildcard("numfiles/*.txt")
end
def sort_num do
file_open
|> File.stream!
|> Stream.map(&String.strip/1)
|> Stream.map(&String.to_integer/1)
|> Enum.sort
end
end
IO.inspect OpenFiles.sort_num
Upvotes: 1
Views: 711
Reputation: 10041
The File.stream!/3 function only works on one file at a time. If you hand it the whole list of files collected by the wildcard, it does not work the way you expect.
If you look at the return value of Path.wildcard/2, you get a list of all the files matched, something along the lines of
["foo.txt", "bar.txt", "baz.txt"]
If you pass this list into File.stream!/3, it tries to append all of these values together into a single path.
File.stream! ["foo.txt", "bar.txt", "baz.txt"]
%File.Stream{line_or_bytes: :line, modes: [:raw, :read_ahead, :binary],
path: "foo.txtbar.txtbaz.txt", raw: true}
As you can see, it thinks the path you are trying to access is "foo.txtbar.txtbaz.txt", which is incorrect: it is all of the "paths" concatenated together.
In order to access all of these files, you have to stream each one on its own and then combine the results.
defmodule OpenFiles do
def file_open do
Path.wildcard("numfiles/*.txt")
end
def sort_num do
file_open()
|> Enum.map(fn file ->
file
|> File.stream!()
|> Stream.map(&String.trim/1) # String.strip/1 is deprecated; String.trim/1 replaces it
|> Stream.map(&String.to_integer/1)
|> Enum.take(1) # This only takes the first line. This may or may not be what you want.
end)
|> List.flatten()
|> Enum.sort()
end
end
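If you want every number from every file rather than just the first line of each, one possible variation is to let Stream.flat_map/2 call File.stream!/1 once per path and join the lines lazily. This is only a sketch: the module name NumFiles and the function sorted_numbers/1 are made up for illustration, and it assumes each non-blank line holds exactly one integer. It uses String.trim/1, the current replacement for the deprecated String.strip/1.

```elixir
defmodule NumFiles do
  # Hypothetical helper (not part of the original code): collects every
  # integer from every file matching `glob` and returns them sorted.
  def sorted_numbers(glob \\ "numfiles/*.txt") do
    glob
    |> Path.wildcard()
    # File.stream!/1 is called once per path; flat_map concatenates the
    # resulting line streams lazily into a single stream of lines.
    |> Stream.flat_map(&File.stream!/1)
    |> Stream.map(&String.trim/1)
    # Skip blank lines so String.to_integer/1 does not raise on them.
    |> Stream.reject(&(&1 == ""))
    |> Stream.map(&String.to_integer/1)
    |> Enum.sort()
  end
end
```

Calling NumFiles.sorted_numbers() with no argument uses the same "numfiles/*.txt" pattern as the question. Nothing is read from disk until Enum.sort/1 forces the stream.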
As you mentioned, if you have a lot of files (or large files), this could take a long time. However, you can mitigate this by using a parallel map implementation instead of the sequential Enum.map/2.
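One way to get a parallel map without an extra library is Task.async_stream/3 from the standard library (available since Elixir 1.4). The following is a rough sketch, not a tuned implementation: the module name ParallelNumFiles and its functions are hypothetical, it assumes one integer per non-blank line, and it relies on the default per-task timeout of 5 seconds, which may be too short for very large files.

```elixir
defmodule ParallelNumFiles do
  # Hypothetical helper: reads each matching file in its own task,
  # then merges and sorts the numbers in the calling process.
  def sorted_numbers(glob \\ "numfiles/*.txt") do
    glob
    |> Path.wildcard()
    |> Task.async_stream(&read_integers/1,
      # Run at most one task per scheduler (usually one per CPU core).
      max_concurrency: System.schedulers_online(),
      # Result order does not matter because everything is sorted afterwards.
      ordered: false
    )
    # Each successful task yields {:ok, list_of_integers}.
    |> Enum.flat_map(fn {:ok, numbers} -> numbers end)
    |> Enum.sort()
  end

  # Reads one file and returns all of its integers as a list.
  defp read_integers(path) do
    path
    |> File.stream!()
    |> Stream.map(&String.trim/1)
    # Skip blank lines so String.to_integer/1 does not raise on them.
    |> Stream.reject(&(&1 == ""))
    |> Enum.map(&String.to_integer/1)
  end
end
```

Whether this is actually faster depends on the workload: for a few hundred small files the sequential version may already be quick enough, so it is worth measuring both before committing to the parallel one.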
Upvotes: 3