How to get a specific number of lines from a file using Elixir streams


We all have to deal with files sometimes, whether to manipulate them or to gather data from them. Whether you need to process an entire file or just part of it, Elixir has us covered. The latter just takes a little tweaking and a combination of a few different modules.

Let's see how!

The File module 📂

Elixir has a built-in File module that helps us manipulate, well... files.

It makes reading, writing, and other file operations pretty simple. Want to read the contents of a file? Just use File.read(<file_path>) and voilà! Processed some data and want to store it in a file? You can use File.write(<file_path>, content) and you're good to go!

These standard functions return a status you can pattern match on. File.read returns a tuple containing a status atom and either the contents or the error reason, while File.write returns :ok on success or an {:error, reason} tuple on failure.

File.read("file.txt")
#=> {:ok, "File contents"}

File.read("invalid_file.txt")
#=> {:error, :enoent}
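
Writing follows the same convention. The snippet below is a small sketch that uses a hypothetical file in the system temp directory (the path and contents are made up for illustration) to show what File.write returns on success and on failure:

```elixir
# Hypothetical path inside the system temp directory.
path = Path.join(System.tmp_dir!(), "write_example.txt")

File.write(path, "some content")
#=> :ok

File.read(path)
#=> {:ok, "some content"}

# Writing into a directory that does not exist fails with :enoent.
missing = Path.join([System.tmp_dir!(), "no_such_dir_for_example", "out.txt"])

File.write(missing, "x")
#=> {:error, :enoent}
```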

But if you are sure about your file, or errors don't concern you much, you can use the ! variant of those functions to go straight to the point. These variants skip the tuple and raise an exception on failure.

File.read!("file.txt")
#=> "File contents"

File.read!("invalid_file.txt")
#=> raises File.Error
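
When errors do matter, the tuple-returning form pairs well with pattern matching. Here is a minimal sketch of that idea; the ReadExample module, the read_or_default function, and the file names are hypothetical and only serve the illustration:

```elixir
defmodule ReadExample do
  # Hypothetical helper: returns the file contents,
  # or the given default when the file cannot be read.
  def read_or_default(path, default \\ "") do
    case File.read(path) do
      {:ok, contents} -> contents
      {:error, _reason} -> default
    end
  end
end

# Create a sample file in the system temp directory.
path = Path.join(System.tmp_dir!(), "read_example.txt")
File.write!(path, "File contents")

ReadExample.read_or_default(path)
#=> "File contents"

ReadExample.read_or_default(path <> ".missing", "nothing here")
#=> "nothing here"
```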

Now to the problem at hand. As we can see, those functions read the entire file into memory. What if you are working with enormous files? Or you just want to work with a specific portion of the file? That's where Streams come in!

Using Streams

As the Stream page of the Elixir docs puts it:

Streams are composable, lazy enumerables. Any enumerable that generates elements one by one during enumeration is called a stream.

So, how can we use them to work with just a part of a file? Well, we must first turn that file into a stream using the File.stream! function, which by default yields the file line by line, and then grab the part of the file that we want with the Stream.take function.

The Stream.take function allows you to... take the first n elements of a stream, which here means the first n lines of the file. If you want to work with, let's say, the header of a file that sits in its first 5 lines, you can use this method to avoid loading the entire file just to work with those 5 lines.

"/home/file_with_header.txt"
|> File.stream!()
|> Stream.take(5)

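This code returns a stream containing the first 5 lines of the file. Since streams are lazy, nothing is actually read until you enumerate it, for example with a function from the Enum module, and from there you can do whatever you need with those lines.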
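
To see the result, the stream has to be enumerated. The sketch below assumes a small sample file created in the system temp directory (the path and the "line N" contents are made up for illustration) and turns the first 5 lines into a list with Enum.to_list:

```elixir
# Build a small sample file (hypothetical path and contents).
path = Path.join(System.tmp_dir!(), "file_with_header.txt")
File.write!(path, Enum.map_join(1..8, "", fn n -> "line #{n}\n" end))

header =
  path
  |> File.stream!()
  |> Stream.take(5)
  |> Enum.to_list()

# Each element keeps its trailing newline.
header
#=> ["line 1\n", "line 2\n", "line 3\n", "line 4\n", "line 5\n"]
```

If the trailing newlines get in the way, adding Stream.map(&String.trim_trailing/1) to the pipeline before Enum.to_list is one way to strip them.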

Pretty simple, right? This is the solution I found to this problem. If you know any others that are better or more efficient, please let us know!

Now it's up to you 💪

Besides taking the first n lines of a file, you can work with chunks of the file, drop specific lines all at once or continuously, etc.
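
As one possible reading of those ideas, here is a short sketch using Stream.drop, Stream.chunk_every, and Stream.drop_every. The sample file, its path, and its contents are assumptions made for the example:

```elixir
# Build a small sample file (hypothetical path and contents).
path = Path.join(System.tmp_dir!(), "stream_examples.txt")
File.write!(path, Enum.map_join(1..6, "", fn n -> "line #{n}\n" end))

# Lazily strip the trailing newline from each line.
lines = path |> File.stream!() |> Stream.map(&String.trim_trailing/1)

# Skip the first 2 lines, then take the next 3.
middle = lines |> Stream.drop(2) |> Enum.take(3)
#=> ["line 3", "line 4", "line 5"]

# Group lines into chunks of 4 (the last chunk may be shorter).
chunks = lines |> Stream.chunk_every(4) |> Enum.to_list()
#=> [["line 1", "line 2", "line 3", "line 4"], ["line 5", "line 6"]]

# Drop every 2nd line, starting with the first one.
kept = lines |> Stream.drop_every(2) |> Enum.to_list()
#=> ["line 2", "line 4", "line 6"]
```

Combining Stream.drop with a take is a handy way to read an arbitrary range of lines without loading the whole file into memory at once.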

Elixir provides a lot of built-in functions and modules that help us manipulate files in various ways. If you run into other problems, check the Stream module page to see what can help you.

The possibilities are huge, and you should take advantage of them. The documentation is awesome and filled with examples. You just have to put the pieces together and make it work the way you want it to!
