JamieS

Reputation: 307

How to use F# to read a file into an in-memory collection?

I'm looking to try out F# to read a comma-delimited file into memory, de-duplicate it by one field, and write the results out to a pipe-separated file.

I've written an example of exactly what I want the program to do in C#:

        var input = new StreamReader(@"D:\input.txt");
        var addresses = new Dictionary<string, AddressModel>();

        while (!input.EndOfStream)
        {
            var address = new AddressModel(input);
            if (!addresses.ContainsKey(address.Id))
                addresses.Add(address.Id, address);
        }

        var output = new StreamWriter(@"D:\CSharp.txt");
        foreach (var address in addresses.Values)
        {
            output.WriteLine(address.ToString());
        }

        output.Flush();

With the AddressModel defined as:

    class AddressModel
    {
        public string Id { get; set; }
        public string StreetName { get; set; }
        public int ZipCode { get; set; }

        public AddressModel(StreamReader inputStream)
        {
            if (inputStream == null) return;

            var input = inputStream.ReadLine();
            if (input == null) return;
            var split = input.Split(new char[] { ',' }, StringSplitOptions.None);

            Id = split[0];
            ZipCode = int.Parse(split[1]);
            StreetName = BuildStreet(split);
        }

        private string BuildStreet(string[] items)
        {
            var street = "";
            if (!string.IsNullOrWhiteSpace(items[5]))
                street += items[5];
            if (!string.IsNullOrWhiteSpace(items[6]))
                street += string.IsNullOrWhiteSpace(street) ? items[6] : " " + items[6];
            if (!string.IsNullOrWhiteSpace(items[7]))
                street += string.IsNullOrWhiteSpace(street) ? items[7] : " " + items[7];
            if (!string.IsNullOrWhiteSpace(items[8]))
                street += string.IsNullOrWhiteSpace(street) ? items[8] : " " + items[8];
            return street;
        }

        public override string ToString()
        {
            return string.Format("{0}|{1}|{2}", Id, StreetName, ZipCode);
        }
    }

So what I'd like the program to do is read the file, line by line, use each line to construct a new AddressModel object, see if this item already exists in a dictionary, adding it if it doesn't, then write the contents of this dictionary to a second text file.

Of course, if I'm thinking "too object-oriented" and could be doing this in a more functional manner, I'd be grateful if someone could point me in the right direction.

Upvotes: 3

Views: 3944

Answers (2)

Gus

Reputation: 26184

You can write the main program like this:

open System
open System.Collections.Generic

let lines = IO.File.ReadLines @"D:\input.txt"
let addresses = new Dictionary<string, AddressModel>()
lines |> Seq.iter (fun line -> 
    // Assumes AddressModel has a constructor that takes a single line (string)
    // rather than the StreamReader used in the question.
    let address = AddressModel line
    if not (addresses.ContainsKey address.Id) then
        addresses.Add (address.Id, address))
IO.File.WriteAllLines(@"D:\CSharp.txt", Seq.map string addresses.Values)

As you can see, the structure is not very different from what you had in C#. The difference is that instead of loops you use higher-order functions like map and iter.

As for your Address class, you can either re-use your C# class or write an F# function that parses each line into a tuple:

let parseLine (input:string) =
    let split = input.Split [|','|]
    let id, zipCode = split.[0], Int32.Parse split.[1]
    let street = 
        split.[5..8] 
        |> Array.filter (String.IsNullOrWhiteSpace >> not)
        |> String.concat " "
    (id, zipCode, street)

// Same field order as the C# ToString: Id|StreetName|ZipCode
let printLine (id, zipCode, street) = sprintf "%s|%s|%i" id street zipCode

Then you can update the main program like this:

open System
open System.Collections.Generic

let lines = IO.File.ReadLines @"D:\input.txt"
let addresses = new Dictionary<string, (string*int*string)>()
lines |> Seq.map parseLine |> Seq.iter (fun ((id,_,_) as line) -> 
    if not (addresses.ContainsKey id) then
        addresses.Add (id, line))

IO.File.WriteAllLines(@"D:\CSharp.txt", Seq.map printLine addresses.Values)

You don't need the Dictionary step at all if its sole purpose is to get distinct ids. You can use Seq.distinctBy, as suggested in the other answer. It keeps the first element seen for each key and discards later ones, which matches the ContainsKey check. So your code is further reduced to:

let lines = 
    IO.File.ReadLines @"D:\input.txt"
    |> Seq.map parseLine 
    |> Seq.distinctBy (fun (id,_,_) -> id)

IO.File.WriteAllLines(@"D:\CSharp.txt", Seq.map printLine lines)
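To check that Seq.distinctBy keeps the first row for each id, here is a minimal sketch that runs on in-memory lines instead of a file. The sample rows are made up for illustration, and parseLine and printLine are repeated so the snippet runs on its own; printLine here follows the field order of the C# ToString.

```fsharp
open System

// Same parsing logic as above, repeated so the snippet is self-contained.
let parseLine (input: string) =
    let split = input.Split [|','|]
    let id, zipCode = split.[0], Int32.Parse split.[1]
    let street =
        split.[5..8]
        |> Array.filter (String.IsNullOrWhiteSpace >> not)
        |> String.concat " "
    (id, zipCode, street)

// Field order follows the C# ToString: Id|StreetName|ZipCode.
let printLine (id, zipCode, street) = sprintf "%s|%s|%i" id street zipCode

// Made-up sample rows with 9 comma-separated fields each; id "A1" appears twice.
let sampleLines =
    [ "A1,12345,x,y,z,10,Main,,St"
      "B2,54321,x,y,z,,Oak,Tree,Rd"
      "A1,99999,x,y,z,20,Other,,Ave" ]

let result =
    sampleLines
    |> Seq.map parseLine
    |> Seq.distinctBy (fun (id, _, _) -> id)
    |> Seq.map printLine
    |> List.ofSeq

// Prints one line per distinct id; the second "A1" row is dropped.
result |> List.iter (printfn "%s")
```

Only the first "A1" row survives, so the output has two lines, one for "A1" and one for "B2".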

UPDATE

Here's the complete suggested code:

open System 

let parseLine (input:string) =
    let split = input.Split [|','|]
    let id, zipCode = split.[0], Int32.Parse split.[1]
    let street = 
        split.[5..8] 
        |> Array.filter (String.IsNullOrWhiteSpace >> not)
        |> String.concat " "
    (id, zipCode, street)

// Same field order as the C# ToString: Id|StreetName|ZipCode
let printLine (id, zipCode, street) = sprintf "%s|%s|%i" id street zipCode

let lines = 
    IO.File.ReadLines @"D:\input.txt"
    |> Seq.map parseLine 
    |> Seq.distinctBy (fun (id,_,_) -> id)

IO.File.WriteAllLines(@"D:\CSharp.txt", Seq.map printLine lines)

Upvotes: 3

polkduran

Reputation: 2551

You can use Seq.distinctBy, which keeps track of the keys it has already seen internally, so you don't need to manage a Dictionary yourself.

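open System.IO

type Contact = {Id:string; Name:string}
// toContact and contactToStr (shown further down) must be defined before this point.
let lines = File.ReadLines(@"D:\input.txt")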

let output = 
        lines 
        |> Seq.map toContact
        |> Seq.distinctBy (fun c -> c.Id)
        |> Seq.map contactToStr

File.WriteAllLines(@"D:\CSharp.txt", output)

This assumes you have a Contact type, a function that builds a contact from a string (toContact), and a function that builds a string from a contact (contactToStr), for instance:

let toContact (str:string) = 
        let values = str.Split(',')
        {Id = values.[0]; Name = values.[1]}
let contactToStr contact = sprintf "%s|%s" contact.Id contact.Name
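The same pattern could be applied to the address data from the question. The following is only a sketch: the Address record and the toAddress, addressToStr and dedupe names are hypothetical, chosen to mirror the fields and the pipe-separated output of the C# AddressModel.

```fsharp
open System

// Hypothetical record mirroring the C# AddressModel from the question.
type Address = { Id: string; StreetName: string; ZipCode: int }

// Builds an Address from one comma-delimited line (fields 5..8 form the street).
let toAddress (line: string) =
    let values = line.Split(',')
    let street =
        values.[5..8]
        |> Array.filter (fun s -> not (String.IsNullOrWhiteSpace s))
        |> String.concat " "
    { Id = values.[0]; StreetName = street; ZipCode = int values.[1] }

// Same layout as the C# ToString: Id|StreetName|ZipCode.
let addressToStr a = sprintf "%s|%s|%i" a.Id a.StreetName a.ZipCode

// Parses, de-duplicates by Id (first occurrence wins) and formats each line.
let dedupe (lines: seq<string>) =
    lines
    |> Seq.map toAddress
    |> Seq.distinctBy (fun a -> a.Id)
    |> Seq.map addressToStr

// Usage with the paths from the question:
// IO.File.WriteAllLines(@"D:\CSharp.txt", dedupe (IO.File.ReadLines @"D:\input.txt"))
```

Keeping dedupe independent of the file system means it can be tried on a plain list of strings before it is pointed at the real input.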

Upvotes: 1
