Fusi123

Reputation: 87

Joining two lists of records and calculating a result

I have two lists of records with the following types:

type AverageTempType = {Date: System.DateTime; Year: int64; Month: int64; AverageTemp: float}
type DailyTempType = {Date: System.DateTime; Year: int64; Month: int64; Day: int64; DailyTemp: float}

I want a new list that joins each DailyTempType record with the AverageTempType record for the same month. Ultimately, for each daily record, I want the daily temperature minus the average temperature for the matching month.

I think I can do this with loops as per below and massage this into a reasonable output:

let MatchLoop = 
    for i in DailyData do
        for j in AverageData do
            if i.Year = j.Year && i.Month = j.Month
            then printfn "%A %A %A %A %A" i.Year i.Month i.Day i.DailyTemp j.AverageTemp
            else printfn "NOMATCH"

I have also tried to do this with pattern matching, but I can't quite get there. I'm not sure how to define the list correctly in the input type and then iterate to get a result, and I'm not sure this approach even makes sense:

let MatchPattern (x:DailyTempType) (y:AverageTempType) = 
    match (x,y) with
    | (x,y) when (x.Year = y.Year && x.Month = y.Month) -> 
        printfn "match"
    | (_,_) -> printfn "nomatch"

I have looked into Deedle, which I think can do this relatively easily, but I am keen to understand how to do it at a lower level.

Upvotes: 1

Views: 178

Answers (2)

Mark Seemann

Reputation: 233150

What you can do is to create a map of the monthly average data. You can think of a map as a read-only dictionary:

let averageDataMap =
    averageData
    |> Seq.map (fun x -> ((x.Year, x.Month), x))
    |> Map.ofSeq

This particular map is a Map<(int64 * int64), AverageTempType>, which, in plainer words, means that the keys into the map are tuples of year and month, and the value associated with each key is an AverageTempType record.
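To make the lookup behaviour concrete, here is a minimal sketch of a map keyed by (year, month) tuples. The values are made up, and plain floats stand in for the AverageTempType records to keep it short:

```fsharp
// Made-up values; plain floats stand in for the AverageTempType records.
let averages =
    [ ((2015L, 1L), 2.0)
      ((2015L, 2L), 4.0) ]
    |> Map.ofList

// Map.tryFind returns Some value when the key exists, and None otherwise.
let january = averages |> Map.tryFind (2015L, 1L)
let march = averages |> Map.tryFind (2015L, 3L)

printfn "%A %A" january march
```

Since tuples compare structurally, a freshly built (year, month) tuple finds the entry stored under an equal tuple.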

This enables you to find all the matching month data, based on the daily data:

let matches = 
    dailyData
    |> Seq.map (fun x -> (x, averageDataMap |> Map.tryFind (x.Year, x.Month)))

Here, matches has the data type seq<DailyTempType * AverageTempType option>. Again, in plainer words, this is a sequence of tuples, where the first element of each tuple is the original daily observation, and the second element is the corresponding monthly average, if a match was found, or None if no matching monthly average was found.

If you want to print the values as in the OP, you can do this:

matches
|> Seq.map snd
|> Seq.map (function | Some _ -> "Match" | None -> "No match")
|> Seq.iter (printfn "%s")

This expression starts with the matches; then pulls out the second element of each tuple; then again maps a Some value to the string "Match", and a None value to the string "No match"; and finally prints each string.
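If what you ultimately want is the daily temperature minus the monthly average, as the question asks, something along these lines should work. This is a sketch rather than the definitive way to do it: the sample data is made up, the name differences is my own, and I have assumed that days without a matching monthly average should simply be dropped:

```fsharp
open System

type AverageTempType = {Date: DateTime; Year: int64; Month: int64; AverageTemp: float}
type DailyTempType = {Date: DateTime; Year: int64; Month: int64; Day: int64; DailyTemp: float}

// Made-up sample data, for illustration only.
let averageData : AverageTempType list =
    [ {Date = DateTime(2015, 1, 1); Year = 2015L; Month = 1L; AverageTemp = 2.0}
      {Date = DateTime(2015, 2, 1); Year = 2015L; Month = 2L; AverageTemp = 4.0} ]

let dailyData : DailyTempType list =
    [ {Date = DateTime(2015, 1, 10); Year = 2015L; Month = 1L; Day = 10L; DailyTemp = 3.5}
      {Date = DateTime(2015, 2, 3); Year = 2015L; Month = 2L; Day = 3L; DailyTemp = 1.0}
      {Date = DateTime(2015, 3, 1); Year = 2015L; Month = 3L; Day = 1L; DailyTemp = 7.0} ]

// Same map as above: keyed by (year, month).
let averageDataMap =
    averageData
    |> Seq.map (fun x -> ((x.Year, x.Month), x))
    |> Map.ofSeq

// Seq.choose keeps only the Some results, so days with no
// matching monthly average (March here) are dropped.
let differences =
    dailyData
    |> Seq.choose (fun d ->
        averageDataMap
        |> Map.tryFind (d.Year, d.Month)
        |> Option.map (fun a -> (d.Date, d.DailyTemp - a.AverageTemp)))
    |> Seq.toList

differences |> List.iter (fun (date, diff) -> printfn "%s %.1f" (date.ToString "yyyy-MM-dd") diff)
```

If you would rather keep the unmatched days, replace Seq.choose with Seq.map and carry the option through to the caller.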

Upvotes: 2

jruizaranguren

Reputation: 13605

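I would first convert the AverageTempType seq to a Map, which reduces the cost of the join. The key should be the (Year, Month) tuple rather than the sum Year + Month, because sums collide (for example, 2015 + 3 and 2016 + 2 are both 2018):

let toMap (avg:AverageTempType seq) = avg |> Seq.groupBy (fun a -> a.Year, a.Month) |> Map.ofSeq

Then you can join and return an option, so consuming code can do whatever it wants (print, store, error, etc.). Because of Seq.groupBy, the option holds a seq of the averages for that month:

let join (avg:AverageTempType seq) (dly:DailyTempType seq) = 
    let avgMap = toMap avg
    dly |> Seq.map (fun d -> d.Year, d.Month, d.Day, d.DailyTemp, Map.tryFind (d.Year, d.Month) avgMap)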

Upvotes: 0
