Reputation: 4772
I'm a C# developer and this is my first attempt at writing F#.
I'm trying to read a Dashlane database exported in CSV format. These files have no headers, and the number of columns varies with the type of entry. The following file is dummy data that I use to test my software. It contains only password entries, and yet they have between 5 and 7 columns (I'll decide how to handle other types of data later).
The first line of the exported file (in this case, but not always) is the email address that was used to create the Dashlane account, which makes this line only one column wide.
"[email protected]"
"Nom0","siteweb0","Identifiant0","",""
"Nom1","siteweb1","identifiant1","[email protected]","",""
"Nom2","siteweb2","[email protected]","",""
"Nom3","siteweb3","Identifiant3","password3",""
"Nom4","siteweb4","Identifiant4","[email protected]","password4",""
"Nom5","siteweb5","Identifiant5","[email protected]","SecondIdentifiant5","password5",""
"Nom6","siteweb6","Identifiant6","[email protected]","SecondIdentifiant6","password6","this is a single-line note"
"Nom7","siteweb7","Identifiant7","[email protected]","SecondIdentifiant7","password7","this is a
multi
line note"
"Nom8","siteweb8","Identifiant8","[email protected]","SecondIdentifiant8","password8","single line note"
As a start, I'm trying to print the first column of each row to the console:
let rawCsv = CsvFile.Load(@"path\to\file.csv", ",", '"', false)
for row in rawCsv.Rows do
    printfn "value %s" row.[0]
This code gives me the following error on the for line:
Couldn't parse row 2 according to schema: Expected 1 columns, got 5
I haven't given CsvFile any schema, and I couldn't find out online how to specify one.
I could remove the first line dynamically, but it wouldn't change anything since the other lines have different column counts too.
Is there any way to parse this awkward CSV file in F#?
Note: for each password row, only the column right before the last one (the password column) matters to me.
Upvotes: 1
Views: 1316
Reputation: 10350
I do not think a CSV file with a structure as irregular as yours is a good candidate for processing with the CSV Type Provider or CSV Parser.
At the same time, it does not seem difficult to parse this file to your liking with a few lines of custom logic. The following snippet:
open System
open System.IO
File.ReadAllLines("Sample.csv") // Get data
|> Array.filter(fun x -> x.StartsWith("\"Nom")) // Only lines starting with "Nom may contain password
|> Array.map (fun x -> x.Split(',') |> Array.map (fun x -> x.[1..(x.Length-2)])) // Split each line into "cells"
|> Array.filter(fun x -> x.[x.Length-2] |> String.IsNullOrEmpty |> not) // Take only those having non-empty cell before the last one
|> Array.map (fun x -> x.[0],x.[x.Length-2]) // show the line key and the password
after parsing your sample file produces
>
val it : (string * string) [] =
[|("Nom3", "password3"); ("Nom4", "password4"); ("Nom5", "password5");
("Nom6", "password6"); ("Nom7", "password7"); ("Nom8", "password8")|]
>
It may be a good starting point for refining the parsing logic further.
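One limitation worth noting: Split(',') works on the sample, but it would cut a cell in two if a password or note contained a comma, and multi-line notes are only skipped by the "Nom" filter. As a possible refinement, here is a minimal sketch of a quote-aware splitter that uses no external library. The names parseCsv and passwords are hypothetical helpers introduced for illustration, and the sketch assumes the export always wraps cells in double quotes and writes a literal quote as two quotes, which the sample does not confirm.

```fsharp
open System
open System.Text

/// Split CSV text into records of cells, honouring double quotes so that
/// commas and line breaks inside a quoted cell stay part of that cell.
/// Assumption: a literal quote inside a cell is written as two quotes ("").
let parseCsv (text: string) : string[] list =
    let records = ResizeArray<string[]>()
    let cells = ResizeArray<string>()
    let cell = StringBuilder()
    let endCell () =
        cells.Add(cell.ToString())
        cell.Clear() |> ignore
    let endRecord () =
        endCell ()
        records.Add(cells.ToArray())
        cells.Clear()
    let mutable inQuotes = false
    let mutable pending = false   // true while the current record is unfinished
    let mutable i = 0
    while i < text.Length do
        let c = text.[i]
        pending <- true
        if inQuotes then
            if c = '"' && i + 1 < text.Length && text.[i + 1] = '"' then
                cell.Append('"') |> ignore   // doubled quote: keep one quote
                i <- i + 1
            elif c = '"' then inQuotes <- false
            else cell.Append(c) |> ignore
        else
            match c with
            | '"' -> inQuotes <- true
            | ',' -> endCell ()
            | '\n' ->
                endRecord ()
                pending <- false
            | '\r' -> ()                     // drop the CR of a CRLF line ending
            | _ -> cell.Append(c) |> ignore
        i <- i + 1
    // Flush a final record that is not terminated by a newline
    if pending then endRecord ()
    List.ofSeq records

/// Keep records with at least five cells (the password entries in the sample)
/// whose next-to-last cell is non-empty; return (first cell, password).
let passwords (records: string[] list) : (string * string) list =
    records
    |> List.filter (fun r -> r.Length >= 5 && not (String.IsNullOrEmpty r.[r.Length - 2]))
    |> List.map (fun r -> r.[0], r.[r.Length - 2])
```

It could be used as System.IO.File.ReadAllText "Sample.csv" |> parseCsv |> passwords. The length check replaces the "Nom" prefix filter, so it also drops the one-column email line; other entry types with five or more columns would need their own rule.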
Upvotes: 3
Reputation: 937
I propose reading the CSV file as a text file: read it line by line into a list, then parse each line with CsvFile.Parse. The catch is that the elements then end up in Headers, which is of type string [] option, rather than in Rows, because CsvFile.Parse treats the first line of its input as a header row by default.
open FSharp.Data
open System.IO
let readLines (filePath:string) = seq {
    use sr = new StreamReader(filePath)
    while not sr.EndOfStream do
        yield sr.ReadLine ()
}
[<EntryPoint>]
let main argv =
    let lines = readLines "c:\path_to_file\example.csv"
    let rows = List.map (fun str -> CsvFile.Parse(str)) (Seq.toList lines)
    for row in List.toArray(rows) do
        printfn "New Line"
        if row.Headers.IsSome then
            for r in row.Headers.Value do
                printfn "value %s" (r)
    printfn "%A" argv
    0 // return an integer exit code
Upvotes: 2