Shabir jan

Reputation: 2427

CSV Parsing - Swift 4

I am trying to parse a CSV file, but I am running into some issues. Below is the code I used to parse it:

let fileURL = Bundle.main.url(forResource: "test_application_data - Sheet 1", withExtension: "csv")
let content = try String(contentsOf: fileURL!, encoding: String.Encoding.utf8)
let parsedCSV: [[String]] = content.components(separatedBy: "\n").map{ $0.components(separatedBy: ",")}

And this is the data in the CSV I am parsing:

Item 9,Description 9,image url
"Item 10
Extra line 1
Extra line 2
Extra line 3",Description 10,image url

With the code above I get the correct result for the first row (Item 9), but a malformed result for Item 10.

How can I correctly parse both rows?


Upvotes: 2

Views: 6291

Answers (3)

Mikhail Yaskou

Reputation: 89

I think the best option is to use TabularData: https://developer.apple.com/documentation/tabulardata

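import Foundation
import TabularData  // available on iOS 15+ / macOS 12+

// Placeholder model (an assumption; replace with your own type).
// The fields are optional because the row subscript returns nil
// when a cell is missing or has a different type.
struct Model {
    let value1: String?
    let value2: String?
}

if let url = Bundle.main.url(forResource: "Table", withExtension: "csv"),
   let data = try? DataFrame(contentsOfCSVFile: url) {
    print(data.rows)

    // "ColumnKey1" / "ColumnKey2" stand for the header names in your CSV.
    let array: [Model] = data.rows.map { row in
        let value1 = row["ColumnKey1", String.self]
        let value2 = row["ColumnKey2", String.self]
        return Model(value1: value1, value2: value2)
    }
    print(array)
} else {
    print("Error")
}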

Upvotes: 0

OOPer

Reputation: 47876

The RFC for CSV: Common Format and MIME Type for Comma-Separated Values (CSV) Files (RFC-4180)

Not all CSV data or CSV processors conform to every rule in this RFC, but generally, fields enclosed in double quotes can contain:

  • newlines
  • commas
  • escaped double-quotes ("" represents a single double-quote)

This code is somewhat simpler than what RFC-4180 fully requires, but it handles all three cases above:

UPDATE: This old code does not handle CRLF well (which is a valid newline in RFC-4180). I have added new code at the bottom; please check it. Thanks to Jay.

import Foundation

let csvText = """
Item 9,Description 9,image url
"Item 10
Extra line 1
Extra line 2
Extra line 3",Description 10,image url
"Item 11
Csv item can contain ""double quote"" and comma(,)", Description 11 ,image url
"""

let pattern = "[ \r\t]*(?:\"((?:[^\"]|\"\")*)\"|([^,\"\\n]*))[ \t]*([,\\n]|$)"
let regex = try! NSRegularExpression(pattern: pattern)

var result: [[String]] = []  // all parsed records
var record: [String] = []    // fields of the record currently being read
regex.enumerateMatches(in: csvText, options: .anchored, range: NSRange(0..<csvText.utf16.count)) {match, flags, stop in
    guard let match = match else {fatalError()}
    if match.range(at: 1).location != NSNotFound {
        let field = csvText[Range(match.range(at: 1), in: csvText)!].replacingOccurrences(of: "\"\"", with: "\"")
        record.append(field)
    } else if match.range(at: 2).location != NSNotFound {
        let field = csvText[Range(match.range(at: 2), in: csvText)!].trimmingCharacters(in: .whitespaces)
        record.append(field)
    }
    let separator = csvText[Range(match.range(at: 3), in: csvText)!]
    switch separator {
    case "\n": //newline
        result.append(record)
        record = []
    case "": //end of text
        //Ignoring empty last line...
        if record.count > 1 || (record.count == 1 && !record[0].isEmpty) {
            result.append(record)
        }
        stop.pointee = true
    default: //comma
        break
    }
}
print(result)

(Intended to be run in a Playground.)


New code, CRLF ready.

import Foundation

let csvText =  "Field0,Field1\r\n"

let pattern = "[ \t]*(?:\"((?:[^\"]|\"\")*)\"|([^,\"\r\\n]*))[ \t]*(,|\r\\n?|\\n|$)"
let regex = try! NSRegularExpression(pattern: pattern)

var result: [[String]] = []  // all parsed records
var record: [String] = []    // fields of the record currently being read
regex.enumerateMatches(in: csvText, options: .anchored, range: NSRange(0..<csvText.utf16.count)) {match, flags, stop in
    guard let match = match else {fatalError()}
    if let quotedRange = Range(match.range(at: 1), in: csvText) {
        let field = csvText[quotedRange].replacingOccurrences(of: "\"\"", with: "\"")
        record.append(field)
    } else if let range = Range(match.range(at: 2), in: csvText) {
        let field = csvText[range].trimmingCharacters(in: .whitespaces)
        record.append(field)
    }
    let separator = csvText[Range(match.range(at: 3), in: csvText)!]
    switch separator {
    case "": //end of text
        //Ignoring empty last line...
        if record.count > 1 || (record.count == 1 && !record[0].isEmpty) {
            result.append(record)
        }
        stop.pointee = true
    case ",": //comma
        break
    default: //newline
        result.append(record)
        record = []
    }
}
print(result) //->[["Field0", "Field1"]]
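
For comparison, the same three quoting rules can also be handled without a regular expression. The following is only a minimal sketch, assuming Swift 5 (for `Character.isNewline`); `parseCSV` is a hypothetical helper name, and unlike the regex version it does not trim whitespace around unquoted fields. It relies on the fact that Swift treats "\r\n" as a single `Character`.

```swift
import Foundation

/// Hypothetical helper (not part of the code above): parses CSV text with a
/// simple character scanner instead of a regular expression.
func parseCSV(_ text: String) -> [[String]] {
    var rows: [[String]] = []
    var record: [String] = []
    var field = ""
    var inQuotes = false
    let chars = Array(text)   // "\r\n" is one Character in Swift
    var i = 0
    while i < chars.count {
        let ch = chars[i]
        if inQuotes {
            if ch == "\"" {
                if i + 1 < chars.count, chars[i + 1] == "\"" {
                    field.append("\"")   // "" inside quotes is one literal quote
                    i += 1
                } else {
                    inQuotes = false     // closing quote
                }
            } else {
                field.append(ch)         // commas and newlines are kept as data
            }
        } else if ch == "\"" {
            inQuotes = true              // opening quote
        } else if ch == "," {
            record.append(field)
            field = ""
        } else if ch.isNewline {         // covers "\n", "\r" and "\r\n"
            record.append(field)
            rows.append(record)
            record = []
            field = ""
        } else {
            field.append(ch)
        }
        i += 1
    }
    // Flush the last record when the text does not end with a newline.
    if !field.isEmpty || !record.isEmpty {
        record.append(field)
        rows.append(record)
    }
    return rows
}

let sample = "Item 9,Description 9,image url\r\n"
    + "\"Item 10\nExtra line 1\",Description 10,image url\r\n"
    + "\"say \"\"hi\"\", ok\",x,y"
let parsed = parseCSV(sample)
print(parsed.count)  // 3
```

Note that an empty line in the middle of the input yields a record with a single empty field in this sketch; filter those out if you do not want them.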

Upvotes: 7

FruitAddict

Reputation: 2032

The problem is with this line of code:

content.components(separatedBy: "\n")

It splits your CSV file into rows at every newline character. The quoted "Item 10 Extra line 1 Extra line 2 Extra line 3" field contains newline characters itself, so each extra line is treated as a separate row, and you end up with the wrong result.
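
A quick way to see this is to run the same split on a string shaped like the data in the question. The literal below is an assumption about where the line breaks fall in the real file:

```swift
import Foundation

// Same shape as the data in the question: the first field of the second
// record is quoted and contains line breaks.
let content = "Item 9,Description 9,image url\n"
    + "\"Item 10\nExtra line 1\nExtra line 2\nExtra line 3\",Description 10,image url"

let rows = content.components(separatedBy: "\n")
    .map { $0.components(separatedBy: ",") }

print(rows.count)  // 5 rows instead of the expected 2
print(rows[1])     // only the start of the quoted field, opening quote included
```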

I'd suggest escaping the newline characters in your multiline text column, or getting rid of them altogether. You can also modify the input file so that the row delimiter isn't \n but something custom (a string that won't appear anywhere else in the CSV file).
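
As a rough sketch of the custom-delimiter idea, assuming you control how the file is written: `<EOR>` below is a made-up marker, not any standard, and the sample string is invented for illustration.

```swift
import Foundation

// Hypothetical record terminator; it must never occur inside any field.
let rowDelimiter = "<EOR>"

let content = "Item 9,Description 9,image url<EOR>"
    + "Item 10\nExtra line 1\nExtra line 2\nExtra line 3,Description 10,image url<EOR>"

let rows = content.components(separatedBy: rowDelimiter)
    .filter { !$0.isEmpty }   // drop the empty piece after the last terminator
    .map { $0.components(separatedBy: ",") }

print(rows.count)   // 2
print(rows[1][0])   // the multi-line field, line breaks intact
```

This only works around newlines inside fields; a comma inside a field would still split that field in two.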

Upvotes: 0
