Reputation: 509
I need to filter data from a long sequence with text lines. The text lines form records like:
{
BEGINTYPE1
VAL1: xxx
VAL2: yyy
ENDTYPE1
// mix of record types including TYPE1
}
I need to keep state during processing:
I was only able to do this with a List, because a sequence seems to read to the end in one expression. It "seems" you can't process part of a sequence and continue in another expression with the sequence "pointer" at the location where it left off? So I used a list. My question, can this processing be done with a sequence using the standard functions like Skip, filter ... etc?
My List solution:
let patLst = [
"VAL1:" ;
"VAL2:" ;
// ..
]
let BeginRecord1 = "BEGINTYPE1"
let EndRecord1 = "ENDTYPE1"
let filter (lines:seq<string>) =
let llines = Seq.toList lines
let matchLine inp =
let rec loop pat =
match pat with
| [] -> None
| h::t ->
let m = Regex.Match(inp, h)
match m.Success with
| true -> Some (h)
| _ -> loop t
loop patLst
let rec findItem i l =
match l with
| [] -> []
| h::t -> if h=i then t
else findItem i t
let findItemsUntil u a l =
let rec loop a l =
match l with
| [] -> ([],a)
| h::t when h=u -> (t , ""::a)
| h::t -> match matchLine h with
| Some(m) -> loop (m::a) t
| None -> loop a t
loop a l
let rec loop a l =
match findItem BeginRecord1 l with
| [] -> List. rev a
| l2 -> let (l3,a) = findItemsUntil EndRecord1 a l2
loop a l3
llines |> loop [""] |> List.fold (fun a x -> a + "\n" + x) ""
}
Upvotes: 2
Views: 650
Reputation: 2532
Aim
Based on the example code this is maybe not exactly what you are looking for but I thought it would be interesting to do a single iteration through the sequence and map the records to concrete types.
Description
This solution uses a state machine that can be in Start
or Collecting
. In Start
it will expect a "BEGINTYPEx". When finding that it will move into a Collecting
state that collects properties into a Map
. When the collecting state hits an "ENDTYPEx" it uses the mapping function to create an instance and adds it to the Aggregate list
, going back to Start
state.
Implementation
Define some types for the records including a Discriminated Union of those and a state type for the fold:
type Type1 = {
val1:string
val2:string
}
type Type2 = {
val1:string
val2:string
}
type Aggregate =
| T1 of Type1
| T2 of Type2
type State =
| Start of Aggregate list
| Collecting of Aggregate list * string * (Map<string,string> -> Aggregate) * Map<string,string>
Define some mapping functions to map a Map
to the record type:
let mapType1 (dic:Map<string,string>) =
Aggregate.T1
{
val1 = dic.["VAL1"]
val2 = dic.["VAL2"]
}
let mapType2 (dic:Map<string,string>) =
Aggregate.T2
{
val1 = dic.["VAL1"]
val2 = dic.["VAL2"]
}
Next we have some Active Patterns to easily decide on a match:
let (|Begin|_|) input =
match input with
| "BEGINTYPE1" -> Some ("TYPE1", mapType1)
| "BEGINTYPE2" -> Some ("TYPE2", mapType2)
| _ -> None
let (|Prop|_|) input =
if(String.IsNullOrEmpty(input)) then None
else
if(input.Contains(":")) then
let split = input.Split(":")
let pName = split.[0].Trim()
let pValue = split.[1].Trim()
Some (pName,pValue)
else None
let (|End|_|) (l,label,f,m) input =
match input with
| "ENDTYPE1" -> Some (List.append l ([f m]), label)
| "ENDTYPE2" -> Some (List.append l ([f m]), label)
| _ -> None
The actual folder function that moves from one state to the next:
let folder state line =
match state with
| Start xs ->
match line with
| Begin (label, f) -> Collecting (xs, label, f, Map.empty<string,string>)
| _ -> failwithf "Should start with a BEGINTYPEx, intead was %s" line
| Collecting (xs, label, f, m) ->
match line with
| Prop (k,v) -> Collecting (xs, label, f, Map.add k v m)
| End(xs, label, f, m) (ys, s) -> Start ys
| _ -> failwithf "Expecting property or ENDTYPEx, instead was %s" line
A simple helper method to help easily extract the list:
let extractTypeList state =
match state with
| Start xs -> xs
| Collecting (xs, _,_,_) -> xs
Finally, the usage:
let lines = seq {
yield "BEGINTYPE1"
yield "VAL1: xxx"
yield "VAL2: yyy"
yield "ENDTYPE1"
yield "BEGINTYPE2"
yield "VAL1: xxx"
yield "VAL2: yyy"
yield "ENDTYPE2"
}
let extractTypes lines =
lines
|> Seq.fold folder (Start [])
|> extractTypeList
|> List.iter (fun a -> printfn "%A" a)
extractTypes lines |> ignore
Some helpful links:
Learning about Active Patterns.
Learning about fold.
Upvotes: 1
Reputation: 6510
You can work with sequences in nearly the same way as lists, you just have to use the Seq.head
and Seq.tail
functions instead of the convenient pattern-matching syntax that's available for lists. Using the built-in functions, your solution would look like this:
open System.Text.RegularExpressions
let patLst = [
"VAL1:" ;
"VAL2:" ;
// ..
]
let BeginRecord1 = "BEGINTYPE1"
let EndRecord1 = "ENDTYPE1"
let filter (lines:seq<string>) =
let matchLine inp =
let rec loop pat =
match pat with
| [] -> None
| h::t ->
match Regex.Match(inp, h) with
| m when m.Success -> Some (h)
| _ -> loop t
loop patLst
let rec findItem i l =
if l |> Seq.isEmpty
then Seq.empty
else let h = l |> Seq.head
let t = l |> Seq.tail
if h=i
then t
else findItem i t
let findItemsUntil u a l =
let rec loop a l =
if l |> Seq.isEmpty
then (Seq.empty,a)
else let h = l |> Seq.head
let t = l |> Seq.tail
if h=u
then (t , ""::a)
else match matchLine h with
| Some(m) -> loop (m::a) t
| None -> loop a t
loop a l
let rec loop a l =
match findItem BeginRecord1 l with
| s when s |> Seq.isEmpty -> List.rev a
| l2 -> let (l3,a) = findItemsUntil EndRecord1 a l2
loop a l3
lines |> loop [""] |> List.fold (fun a x -> a + "\n" + x) ""
Now, if you wanted to simplify the logic, you could write your own Active Pattern to do the same thing as the head :: tail
pattern for lists. The active-pattern itself would look something like this:
let (|HT|Empty|) s =
match s |> Seq.tryHead with
| Some head -> HT (head, s |> Seq.tail)
| None -> Empty
Then your implementation could stay almost identical to your list-based version, just swapping in this active-pattern and replacing the empty lists with Seq.empty
:
let filter (lines:seq<string>) =
let matchLine inp =
let rec loop pat =
match pat with
| Empty -> None
| HT (h,t) ->
let m = Regex.Match(inp, h)
match m.Success with
| true -> Some (h)
| _ -> loop t
loop patLst
let rec findItem i l =
match l with
| Empty -> Seq.empty
| HT (h,t) -> if h=i then t
else findItem i t
let findItemsUntil u a l =
let rec loop a l =
match l with
| Empty -> (Seq.empty,a)
| HT (h,t) when h=u -> (t , ""::a)
| HT (h,t) -> match matchLine h with
| Some(m) -> loop (m::a) t
| None -> loop a t
loop a l
let rec loop a l =
match findItem BeginRecord1 l with
| Empty -> List.rev a
| l2 -> let (l3,a) = findItemsUntil EndRecord1 a l2
loop a l3
lines |> loop [""] |> List.fold (fun a x -> a + "\n" + x) ""
Upvotes: 0