Mathieu VIALES

Reputation: 4772

Parsing awkward CSV file with a dynamic number of columns gives error

I'm a C# developer and this is my first attempt at writing F#.

I'm trying to read a Dashlane exported database in CSV format. These files have no headers and a dynamic number of columns for each possible type of entry. The following file is an example of dummy data that I use to test my software. It only contains password entries, and yet they have between 5 and 7 columns (I'll decide how to handle other types of data later). The first line of the exported file (in this case, but not always) is the email address that was used to create the Dashlane account, which makes this line only one column wide.

"[email protected]"
"Nom0","siteweb0","Identifiant0","",""
"Nom1","siteweb1","identifiant1","[email protected]","",""
"Nom2","siteweb2","[email protected]","",""
"Nom3","siteweb3","Identifiant3","password3",""
"Nom4","siteweb4","Identifiant4","[email protected]","password4",""
"Nom5","siteweb5","Identifiant5","[email protected]","SecondIdentifiant5","password5",""
"Nom6","siteweb6","Identifiant6","[email protected]","SecondIdentifiant6","password6","this is a single-line note"
"Nom7","siteweb7","Identifiant7","[email protected]","SecondIdentifiant7","password7","this is a 
multi
line note"
"Nom8","siteweb8","Identifiant8","[email protected]","SecondIdentifiant8","password8","single line note"

I'm trying to print the first column of each row to the console as a start:

let rawCsv = CsvFile.Load(@"path\to\file.csv", ",", '"', false) // verbatim string, so \t and \f are not read as escapes
for row in rawCsv.Rows do
    printfn "value %s" row.[0]

This code gives me the following error on the for line:

Couldn't parse row 2 according to schema: Expected 1 columns, got 5

I haven't given the CsvFile any schema, and I couldn't find anything online about how to specify one.

I could remove the first line dynamically if I wanted to, but it wouldn't change anything, since the other lines have different column counts too.

Is there any way to parse this awkward CSV file in F#?

Note: For each password row, only the column right before the last one matters to me (the password column)

Upvotes: 1

Views: 1316

Answers (2)

Gene Belitski

Reputation: 10350

I do not think that a CSV file with a structure as irregular as yours is a good candidate for processing with the CSV Type Provider or CSV Parser.

At the same time, it does not seem difficult to parse this file to your liking with a few lines of custom logic. The following snippet:

open System
open System.IO

File.ReadAllLines("Sample.csv") // Get data
|> Array.filter(fun x -> x.StartsWith("\"Nom")) // Only lines starting with "Nom may contain password
|> Array.map (fun x -> x.Split(',') |> Array.map (fun x -> x.[1..(x.Length-2)])) // Split each line into "cells"
|> Array.filter(fun x -> x.[x.Length-2] |> String.IsNullOrEmpty |> not) // Take only those having non-empty cell before the last one
|> Array.map (fun x -> x.[0],x.[x.Length-2]) // show the line key and the password

after parsing your sample file produces

>
val it : (string * string) [] =
[|("Nom3", "password3"); ("Nom4", "password4"); ("Nom5", "password5");
("Nom6", "password6"); ("Nom7", "password7"); ("Nom8", "password8")|]
>

It may be a good starting point for further refining the parsing logic.
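One limitation of the snippet above is that it splits on every comma and works line by line, so a password containing a comma, or a row whose first line is only the tail of a multi-line note, would be misread. As one possible refinement, here is a minimal sketch of a quote-aware reader using only the standard library. The names `parseCsv`, `passwords` and `sample` are hypothetical, and it assumes that fields are double-quoted and that a doubled quote inside a field stands for a literal quote; whether Dashlane exports escape quotes that way is an assumption, not something verified here.

```fsharp
open System.Text

/// Splits CSV text into records, honouring double-quoted fields so that
/// commas and line breaks inside quotes stay part of the field.
let parseCsv (text: string) : string list list =
    let records = ResizeArray<string list>()
    let fields = ResizeArray<string>()
    let current = StringBuilder()
    let mutable inQuotes = false
    let mutable i = 0
    // Close the field being built and start a new one.
    let endField () =
        fields.Add(current.ToString())
        current.Clear() |> ignore
    // Close the current field, then the whole record.
    let endRecord () =
        endField ()
        records.Add(List.ofSeq fields)
        fields.Clear()
    while i < text.Length do
        let c = text.[i]
        if inQuotes then
            if c = '"' && i + 1 < text.Length && text.[i + 1] = '"' then
                current.Append('"') |> ignore // doubled quote = literal quote (assumed)
                i <- i + 1
            elif c = '"' then
                inQuotes <- false
            else
                current.Append(c) |> ignore
        else
            match c with
            | '"' -> inQuotes <- true
            | ',' -> endField ()
            | '\r' -> ()
            | '\n' -> endRecord ()
            | _ -> current.Append(c) |> ignore
        i <- i + 1
    // Flush a final record that is not terminated by a newline.
    if current.Length > 0 || fields.Count > 0 then endRecord ()
    List.ofSeq records

/// Pairs each entry name with the cell just before the last one,
/// skipping rows that are too short or whose password cell is empty.
let passwords (records: string list list) =
    records
    |> List.choose (fun r ->
        match List.rev r with
        | _ :: pwd :: _ when r.Length >= 3 && pwd <> "" -> Some (List.head r, pwd)
        | _ -> None)

// Dummy data in the shape of the question's sample (values are made up).
let sample =
    "\"account\"\n" +
    "\"Nom0\",\"siteweb0\",\"Identifiant0\",\"\",\"\"\n" +
    "\"Nom3\",\"siteweb3\",\"Identifiant3\",\"pass,word3\",\"\"\n" +
    "\"Nom7\",\"siteweb7\",\"Identifiant7\",\"password7\",\"a\nmulti\nline note\"\n"

printfn "%A" (passwords (parseCsv sample))
```

For a real file, `parseCsv (System.IO.File.ReadAllText "Sample.csv")` would replace the inline sample. Reading the whole text at once, rather than line by line, is what lets a quoted multi-line note stay inside a single record.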

Upvotes: 3

Jean-Claude Colette

Reputation: 937

I propose reading the CSV file as a text file: read it line by line into a list, then parse each line on its own with CsvFile.Parse. One caveat: because each line is parsed as a separate CSV document and the first line is treated as headers by default, the values end up in Headers (which is of type string [] option) and not in Rows.

open FSharp.Data
open System.IO

// Lazily yield the lines of the file one at a time
let readLines (filePath: string) = seq {
    use sr = new StreamReader(filePath)
    while not sr.EndOfStream do
        yield sr.ReadLine ()
}

[<EntryPoint>]
let main argv =
    let lines = readLines @"c:\path_to_file\example.csv"
    // Parse every line as its own single-row CSV document
    let rows = List.map (fun (str: string) -> CsvFile.Parse(str)) (Seq.toList lines)
    for row in rows do
        printfn "New Line"
        // The cells of the line are exposed as headers, not as data rows
        if row.Headers.IsSome then
            for r in row.Headers.Value do
                printfn "value %s" r
    printfn "%A" argv
    0 // return an integer exit code

Upvotes: 2
