Reputation: 307
I'm looking to try out F# to read a comma-delimited file into memory, de-duplicate it by one field, and write the results out to a pipe-separated file.
Here is a C# example of exactly what I want the program to do:
using System.Collections.Generic;
using System.IO;

var addresses = new Dictionary<string, AddressModel>();
using (var input = new StreamReader(@"D:\input.txt"))
{
    while (!input.EndOfStream)
    {
        var address = new AddressModel(input);
        // Keep only the first address seen for each Id
        if (!addresses.ContainsKey(address.Id))
            addresses.Add(address.Id, address);
    }
}
// Disposing the writer flushes it and closes the file
using (var output = new StreamWriter(@"D:\CSharp.txt"))
{
    foreach (var address in addresses.Values)
    {
        output.WriteLine(address.ToString());
    }
}
With the AddressModel defined as:
using System;
using System.IO;

class AddressModel
{
    public string Id { get; set; }
    public string StreetName { get; set; }
    public int ZipCode { get; set; }

    public AddressModel(StreamReader inputStream)
    {
        if (inputStream == null) return;
        var input = inputStream.ReadLine();
        if (input == null) return;
        var split = input.Split(new char[] { ',' }, StringSplitOptions.None);
        Id = split[0];
        ZipCode = int.Parse(split[1]);
        StreetName = BuildStreet(split);
    }

    // Joins the non-blank street parts (fields 5 to 8) with single spaces
    private string BuildStreet(string[] items)
    {
        var street = "";
        if (!string.IsNullOrWhiteSpace(items[5]))
            street += items[5];
        if (!string.IsNullOrWhiteSpace(items[6]))
            street += string.IsNullOrWhiteSpace(street) ? items[6] : " " + items[6];
        if (!string.IsNullOrWhiteSpace(items[7]))
            street += string.IsNullOrWhiteSpace(street) ? items[7] : " " + items[7];
        if (!string.IsNullOrWhiteSpace(items[8]))
            street += string.IsNullOrWhiteSpace(street) ? items[8] : " " + items[8];
        return street;
    }

    public override string ToString()
    {
        return string.Format("{0}|{1}|{2}", Id, StreetName, ZipCode);
    }
}
In short, I'd like the program to read the file line by line, build a new AddressModel from each line, add it to a dictionary unless an entry with the same Id is already there, and then write the contents of the dictionary to a second text file.
Of course, if I'm thinking "too object-oriented" and there is a more functional way to do this, I'd be grateful if someone could point me in the right direction.
Upvotes: 3
Views: 3944
Reputation: 26184
You can write the main program like this:
open System
open System.Collections.Generic

let lines = IO.File.ReadLines @"D:\input.txt"
let addresses = new Dictionary<string, AddressModel>()
lines |> Seq.iter (fun line ->
    // Assumes an AddressModel constructor that takes the line as a string
    // (the C# class above takes a StreamReader instead)
    let address = AddressModel line
    if not (addresses.ContainsKey address.Id) then
        addresses.Add (address.Id, address))
IO.File.WriteAllLines(@"D:\CSharp.txt", Seq.map string addresses.Values)
As you can see, the structure is not very different from your C# code; the main difference is that instead of loops you use higher-order functions such as Seq.map and Seq.iter.
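If you'd rather avoid the mutable Dictionary entirely, one more functional option is to thread the set of ids seen so far through Seq.fold. The following is a minimal sketch: dedupeBy and the sample rows are hypothetical, and like the C# code it keeps the first row for each id. Note that it builds the whole result list in memory before returning.

```fsharp
// Hypothetical helper: de-duplicates items by a key using Seq.fold and an
// immutable Set instead of a mutable Dictionary. The first item for each
// key is kept and the input order is preserved.
let dedupeBy<'T, 'K when 'K : comparison> (key: 'T -> 'K) (items: seq<'T>) : 'T list =
    let step (seen: Set<'K>, kept: 'T list) (item: 'T) =
        let k = key item
        if Set.contains k seen then (seen, kept)
        else (Set.add k seen, item :: kept)
    items
    |> Seq.fold step (Set.empty, [])
    |> snd
    // Items were prepended, so reverse to restore the input order
    |> List.rev

// Hypothetical (id, street) rows; the second "1" is a duplicate
let rows = [ ("1", "Main St"); ("2", "Oak Ave"); ("1", "Elm St") ]
let unique = dedupeBy fst rows
```

Here unique contains only the first "1" row and the "2" row, in their original order.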
As for your AddressModel class, you can either reuse the C# class or write an F# function that parses each line:
open System

let parseLine (input:string) =
    let split = input.Split [|','|]
    let id, zipCode = split.[0], Int32.Parse split.[1]
    // Non-blank street parts (fields 5 to 8) joined with single spaces
    let street =
        split.[5..8]
        |> Array.filter (String.IsNullOrWhiteSpace >> not)
        |> String.concat " "
    (id, zipCode, street)

// Same field order as the C# ToString: Id|StreetName|ZipCode
let printLine (id, zipCode, street) = sprintf "%s|%s|%i" id street zipCode
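parseLine throws on blank lines, header rows, or lines with fewer than nine fields. If your input may contain such lines, one option is to return an option and drop the failures with Seq.choose (or List.choose). This is a sketch; tryParseLine and the sample data are hypothetical.

```fsharp
open System

// Hypothetical variant of parseLine that returns None for lines it cannot
// parse (too few fields or a non-numeric zip code) instead of throwing.
let tryParseLine (input: string) =
    let split = input.Split [| ',' |]
    if split.Length < 9 then None
    else
        match Int32.TryParse(split.[1]) with
        | true, zipCode ->
            let street =
                split.[5..8]
                |> Array.filter (String.IsNullOrWhiteSpace >> not)
                |> String.concat " "
            Some (split.[0], zipCode, street)
        | _ -> None

// Same output order as the C# ToString: Id|StreetName|ZipCode
let printLine (id, zipCode, street) = sprintf "%s|%s|%i" id street zipCode

// Hypothetical input: one good line, one blank line, one bad zip code
let sample = [ "A17,12345,x,y,z,100,,Main,St"; ""; "B2,notazip,x,y,z,1,Elm,,Rd" ]
let output = sample |> List.choose tryParseLine |> List.map printLine
```

Only the first line survives, formatted as "A17|100 Main St|12345"; the blank line and the bad zip code are skipped silently, so log them if you need to know about them.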
Then you can update the main program like this:
open System
open System.Collections.Generic

let lines = IO.File.ReadLines @"D:\input.txt"
let addresses = new Dictionary<string, (string*int*string)>()
lines |> Seq.map parseLine |> Seq.iter (fun ((id,_,_) as line) ->
    if not (addresses.ContainsKey id) then
        addresses.Add (id, line))
IO.File.WriteAllLines(@"D:\CSharp.txt", Seq.map printLine addresses.Values)
In fact, you don't need the Dictionary at all if its only purpose is to keep distinct ids. You can use Seq.distinctBy, as suggested in the other answer, which reduces the code further to:
let lines =
    IO.File.ReadLines @"D:\input.txt"
    |> Seq.map parseLine
    |> Seq.distinctBy (fun (id,_,_) -> id)
IO.File.WriteAllLines(@"D:\CSharp.txt", Seq.map printLine lines)
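Seq.distinctBy yields an element only the first time its key appears and keeps the input order, so it matches the ContainsKey check in the C# code. (The .NET documentation does not guarantee an enumeration order for Dictionary.Values, so the distinctBy version is also more predictable.) A small in-memory check with hypothetical rows:

```fsharp
// Hypothetical parsed rows in the (id, zipCode, street) shape used above
let rows =
    [ ("A1", 12345, "Main St")
      ("B2", 23456, "Oak Ave")
      ("A1", 99999, "Elm St") ]

// The later "A1" row is dropped because its key has already been seen
let distinctRows =
    rows
    |> Seq.distinctBy (fun (id, _, _) -> id)
    |> List.ofSeq
```

Because both File.ReadLines and Seq.distinctBy are lazy, the real pipeline streams the input file and reads it once, when WriteAllLines enumerates the sequence.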
UPDATE
Here's the final code suggested:
open System

let parseLine (input:string) =
    let split = input.Split [|','|]
    let id, zipCode = split.[0], Int32.Parse split.[1]
    // Non-blank street parts (fields 5 to 8) joined with single spaces
    let street =
        split.[5..8]
        |> Array.filter (String.IsNullOrWhiteSpace >> not)
        |> String.concat " "
    (id, zipCode, street)

// Same field order as the C# ToString: Id|StreetName|ZipCode
let printLine (id, zipCode, street) = sprintf "%s|%s|%i" id street zipCode

let lines =
    IO.File.ReadLines @"D:\input.txt"
    |> Seq.map parseLine
    |> Seq.distinctBy (fun (id,_,_) -> id)
IO.File.WriteAllLines(@"D:\CSharp.txt", Seq.map printLine lines)
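To try the final pipeline without touching D:\, you can wrap it in a function that takes the two paths and run it against temporary files. This is a sketch: dedupeFile and the sample lines are hypothetical, printLine writes Id|StreetName|ZipCode to match the C# ToString, and every input line needs at least nine comma-separated fields for parseLine to work.

```fsharp
open System
open System.IO

let parseLine (input: string) =
    let split = input.Split [| ',' |]
    let id, zipCode = split.[0], Int32.Parse split.[1]
    let street =
        split.[5..8]
        |> Array.filter (String.IsNullOrWhiteSpace >> not)
        |> String.concat " "
    (id, zipCode, street)

let printLine (id, zipCode, street) = sprintf "%s|%s|%i" id street zipCode

// Hypothetical wrapper: the final pipeline with the paths as parameters
let dedupeFile (inputPath: string) (outputPath: string) =
    let lines =
        File.ReadLines inputPath
        |> Seq.map parseLine
        |> Seq.distinctBy (fun (id, _, _) -> id)
    File.WriteAllLines(outputPath, Seq.map printLine lines)

// Round trip through temporary files with hypothetical sample data
let inputPath = Path.GetTempFileName()
let outputPath = Path.GetTempFileName()
File.WriteAllLines(inputPath,
    [| "A1,12345,x,y,z,100,,Main,St"
       "B2,23456,x,y,z,7,Oak,,Ave"
       "A1,99999,x,y,z,1,Elm,,St" |])
dedupeFile inputPath outputPath
let result = File.ReadAllLines outputPath
File.Delete inputPath
File.Delete outputPath
```

result should hold the two lines "A1|100 Main St|12345" and "B2|7 Oak Ave|23456"; the second A1 line is dropped.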
Upvotes: 3
Reputation: 2551
You can use Seq.distinctBy, which internally keeps track of the keys it has already seen in a hash-based collection.
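The bookkeeping that Seq.distinctBy does for you fits in a few lines. The following is an illustrative reimplementation under a hypothetical name, myDistinctBy, not the FSharp.Core source; it remembers every key in a HashSet and yields an element only when its key is new.

```fsharp
open System.Collections.Generic

// Illustrative reimplementation (hypothetical name, not the FSharp.Core
// source) of the idea behind Seq.distinctBy.
let myDistinctBy (key: 'T -> 'K) (source: seq<'T>) : seq<'T> =
    seq {
        // A fresh set per enumeration, so the sequence can be re-enumerated
        let seen = HashSet<'K>()
        for item in source do
            // HashSet.Add returns false when the key is already present
            if seen.Add(key item) then
                yield item
    }

// Keeps the first string of each length
let firstOfEachLength =
    myDistinctBy String.length [ "a"; "bb"; "c"; "dd"; "eee" ] |> List.ofSeq
```

firstOfEachLength is ["a"; "bb"; "eee"]. Because the work happens inside a sequence expression, nothing is read until the result is enumerated.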
open System.IO

type Contact = {Id:string; Name:string}

let lines = File.ReadLines(@"D:\input.txt")
let output =
    lines
    |> Seq.map toContact
    |> Seq.distinctBy (fun c -> c.Id)
    |> Seq.map contactToStr
File.WriteAllLines(@"D:\CSharp.txt", output)
This assumes you have a Contact type, a function that builds a contact from a string (toContact), and a function that builds a string from a contact (contactToStr), for instance the following (in an F# file or script these must be defined before the code that uses them):
let toContact (str:string) =
    let values = str.Split(',')
    {Id = values.[0]; Name = values.[1]}

let contactToStr contact = sprintf "%s|%s" contact.Id contact.Name
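Putting the pieces of this answer together in the order F# requires (the type and functions before the pipeline), and using a hypothetical in-memory list in place of the input file, gives a small runnable sketch:

```fsharp
type Contact = { Id: string; Name: string }

// Builds a contact from a "id,name" line
let toContact (str: string) =
    let values = str.Split(',')
    { Id = values.[0]; Name = values.[1] }

// Formats a contact as "id|name"
let contactToStr contact = sprintf "%s|%s" contact.Id contact.Name

// Hypothetical input standing in for File.ReadLines
let lines = [ "1,Alice"; "2,Bob"; "1,Alice (duplicate)" ]

let output =
    lines
    |> Seq.map toContact
    |> Seq.distinctBy (fun c -> c.Id)
    |> Seq.map contactToStr
    |> List.ofSeq
```

output is ["1|Alice"; "2|Bob"]: the third line shares Id "1" with the first, so Seq.distinctBy drops it.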
Upvotes: 1