Rob Frohwein
Rob Frohwein

Reputation: 509

FSharp sequence processing with state

I need to filter data from a long sequence with text lines. The text lines form records like:

{  
    BEGINTYPE1    
    VAL1: xxx
    VAL2: yyy
    ENDTYPE1

    // mix of record types including TYPE1
}

I need to keep state during processing:

  1. find the record type, thus skipping other text
  2. filter the relevant values until the record end is found
  3. Continue with 1

I was only able to do this with a List, because a sequence seems to read to the end in one expression. It "seems" you can't process part of a sequence and continue in another expression with the sequence "pointer" at the location where it left off? So I used a list. My question, can this processing be done with a sequence using the standard functions like Skip, filter ... etc?

My List solution:

let patLst =  [    
    "VAL1:"         ; 
    "VAL2:"         ; 
    // ..
    ]

let BeginRecord1 = "BEGINTYPE1"
let EndRecord1   = "ENDTYPE1"

let filter (lines:seq<string>) = 
  let llines = Seq.toList lines

  let matchLine inp =  
     let rec loop pat = 
        match pat with 
        | [] -> None
        | h::t -> 
            let m = Regex.Match(inp, h)
            match m.Success with
            | true -> Some (h)
            | _ -> loop t

     loop patLst

  let rec findItem i l = 
    match l with 
    | []    -> []
    | h::t  -> if h=i then  t
               else findItem i t 

  let findItemsUntil u a l =
    let rec loop a l = 
        match l with 
        | []    ->  ([],a)
        | h::t when h=u -> (t , ""::a)
        | h::t -> match matchLine h with
                    | Some(m)  -> loop (m::a) t
                    | None -> loop a t
    loop a l

  let rec loop a l = 
    match findItem  BeginRecord1 l with
    | [] -> List. rev a
    | l2 -> let (l3,a) = findItemsUntil EndRecord1 a l2 
            loop a l3

  llines |> loop  [""] |> List.fold (fun a x -> a + "\n" + x) ""  

}

Upvotes: 2

Views: 650

Answers (2)

Devon Burriss
Devon Burriss

Reputation: 2532

Aim
Based on the example code this is maybe not exactly what you are looking for but I thought it would be interesting to do a single iteration through the sequence and map the records to concrete types.

Description
This solution uses a state machine that can be in Start or Collecting. In Start it will expect a "BEGINTYPEx". When finding that it will move into a Collecting state that collects properties into a Map. When the collecting state hits an "ENDTYPEx" it uses the mapping function to create an instance and adds it to the Aggregate list, going back to Start state.

Implementation
Define some types for the records including a Discriminated Union of those and a state type for the fold:

type Type1 = {
    val1:string
    val2:string
}

type Type2 = {
    val1:string
    val2:string
}

type Aggregate =
| T1 of Type1
| T2 of Type2

type State =
| Start of Aggregate list
| Collecting of Aggregate list * string * (Map<string,string> -> Aggregate) * Map<string,string>

Define some mapping functions to map a Map to the record type:

let mapType1 (dic:Map<string,string>) = 
    Aggregate.T1 
        {
            val1 = dic.["VAL1"]
            val2 = dic.["VAL2"]
        }

let mapType2 (dic:Map<string,string>) = 
    Aggregate.T2
        {
            val1 = dic.["VAL1"]
            val2 = dic.["VAL2"]
        }

Next we have some Active Patterns to easily decide on a match:

let (|Begin|_|) input =        
    match input with
        | "BEGINTYPE1" -> Some ("TYPE1", mapType1)
        | "BEGINTYPE2" -> Some ("TYPE2", mapType2)
        | _ -> None

let (|Prop|_|) input =        
    if(String.IsNullOrEmpty(input)) then None
    else 
        if(input.Contains(":")) then
            let split = input.Split(":")
            let pName = split.[0].Trim()
            let pValue = split.[1].Trim()
            Some (pName,pValue)
        else None

let (|End|_|) (l,label,f,m) input =        
    match input with
        | "ENDTYPE1" -> Some (List.append l ([f m]), label)
        | "ENDTYPE2" -> Some (List.append l ([f m]), label)
        | _ -> None

The actual folder function that moves from one state to the next:

let folder state line =
    match state with
    | Start xs -> 
        match line with
        | Begin (label, f) -> Collecting (xs, label, f, Map.empty<string,string>)
        | _ -> failwithf "Should start with a BEGINTYPEx, intead was %s" line
    | Collecting (xs, label, f, m) -> 
        match line with
        | Prop (k,v) -> Collecting (xs, label, f, Map.add k v m)
        | End(xs, label, f, m) (ys, s) -> Start ys
        | _ -> failwithf "Expecting property or ENDTYPEx, instead was %s" line

A simple helper method to help easily extract the list:

let extractTypeList state =
    match state with
    | Start xs -> xs
    | Collecting (xs, _,_,_) -> xs

Finally, the usage:

let lines = seq {
        yield "BEGINTYPE1"
        yield "VAL1: xxx"
        yield "VAL2: yyy"
        yield "ENDTYPE1"
        yield "BEGINTYPE2"
        yield "VAL1: xxx"
        yield "VAL2: yyy"
        yield "ENDTYPE2"
    }

let extractTypes lines = 
    lines 
    |> Seq.fold folder (Start [])
    |> extractTypeList
    |> List.iter (fun a -> printfn "%A" a)

extractTypes lines |> ignore

Some helpful links:

Learning about Active Patterns.
Learning about fold.

Upvotes: 1

Aaron M. Eshbach
Aaron M. Eshbach

Reputation: 6510

You can work with sequences in nearly the same way as lists, you just have to use the Seq.head and Seq.tail functions instead of the convenient pattern-matching syntax that's available for lists. Using the built-in functions, your solution would look like this:

open System.Text.RegularExpressions

let patLst =  [    
    "VAL1:"         ; 
    "VAL2:"         ; 
    // ..
    ]

let BeginRecord1 = "BEGINTYPE1"
let EndRecord1   = "ENDTYPE1"

let filter (lines:seq<string>) = 
  let matchLine inp =  
     let rec loop pat = 
        match pat with 
        | [] -> None
        | h::t -> 
            match Regex.Match(inp, h) with
            | m when m.Success -> Some (h)
            | _ -> loop t

     loop patLst

  let rec findItem i l = 
    if l |> Seq.isEmpty
    then Seq.empty
    else let h = l |> Seq.head  
         let t = l |> Seq.tail
         if h=i 
         then t
         else findItem i t 

  let findItemsUntil u a l =
    let rec loop a l = 
        if l |> Seq.isEmpty
        then (Seq.empty,a)
        else let h = l |> Seq.head
             let t = l |> Seq.tail
             if h=u 
             then (t , ""::a)
             else match matchLine h with
                  | Some(m)  -> loop (m::a) t
                  | None -> loop a t
    loop a l

  let rec loop a l = 
    match findItem  BeginRecord1 l with
    | s when s |> Seq.isEmpty -> List.rev a
    | l2 -> let (l3,a) = findItemsUntil EndRecord1 a l2 
            loop a l3

  lines |> loop  [""] |> List.fold (fun a x -> a + "\n" + x) ""  

Now, if you wanted to simplify the logic, you could write your own Active Pattern to do the same thing as the head :: tail pattern for lists. The active-pattern itself would look something like this:

let (|HT|Empty|) s =
    match s |> Seq.tryHead with
    | Some head -> HT (head, s |> Seq.tail)
    | None -> Empty

Then your implementation could stay almost identical to your list-based version, just swapping in this active-pattern and replacing the empty lists with Seq.empty:

let filter (lines:seq<string>) = 
  let matchLine inp =  
     let rec loop pat = 
        match pat with 
        | Empty -> None
        | HT (h,t) -> 
            let m = Regex.Match(inp, h)
            match m.Success with
            | true -> Some (h)
            | _ -> loop t

     loop patLst

  let rec findItem i l = 
    match l with 
    | Empty -> Seq.empty
    | HT (h,t) -> if h=i then  t
                  else findItem i t 

  let findItemsUntil u a l =
    let rec loop a l = 
        match l with 
        | Empty -> (Seq.empty,a)
        | HT (h,t) when h=u -> (t , ""::a)
        | HT (h,t) -> match matchLine h with
                      | Some(m) -> loop (m::a) t
                      | None -> loop a t
    loop a l

  let rec loop a l = 
    match findItem  BeginRecord1 l with
    | Empty -> List.rev a
    | l2 -> let (l3,a) = findItemsUntil EndRecord1 a l2 
            loop a l3

  lines |> loop  [""] |> List.fold (fun a x -> a + "\n" + x) ""  

Upvotes: 0

Related Questions