Reputation: 2427
I am trying to parse a CSV file, but I am running into issues. Below is the code I used to parse it:
let fileURL = Bundle.main.url(forResource: "test_application_data - Sheet 1", withExtension: "csv")
let content = try String(contentsOf: fileURL!, encoding: String.Encoding.utf8)
let parsedCSV: [[String]] = content.components(separatedBy: "\n").map{ $0.components(separatedBy: ",")}
And this is the data in the CSV file I am parsing:
Item 9,Description 9,image url
"Item 10
Extra line 1
Extra line 2
Extra line 3",Description 10,image url
With the code above, I get the correct result for the first row (Item 9), but a malformed result for Item 10.
How can I parse both rows correctly?
Upvotes: 2
Views: 6291
Reputation: 89
I think the best option is to use TabularData: https://developer.apple.com/documentation/tabulardata
import TabularData

// Model is your own type; the subscript returns an optional String per column.
if let url = Bundle.main.url(forResource: "Table", withExtension: "csv"),
   let data = try? DataFrame(contentsOfCSVFile: url) {
    print(data.rows)
    let array: [Model] = data.rows.map { row in
        let value1 = row["ColumnKey1", String.self]
        let value2 = row["ColumnKey2", String.self]
        return Model(value1: value1, value2: value2)
    }
    print(array)
} else {
    print("Error")
}
Upvotes: 0
Reputation: 47876
The RFC for CSV: Common Format and MIME Type for Comma-Separated Values (CSV) Files (RFC-4180)
Not all CSV data or CSV processors conform to every part of this RFC, but generally, fields enclosed in double-quotes can contain:
- newlines
- commas
- double-quotes (where "" represents a single double-quote)
This code is somewhat simplified relative to RFC-4180, but it handles all three cases above:
UPDATE: This old code does not handle CRLF well (which is a valid newline in RFC-4180). I have added new code at the bottom; please check it. Thanks to Jay.
import Foundation

let csvText = """
Item 9,Description 9,image url
"Item 10
Extra line 1
Extra line 2
Extra line 3",Description 10,image url
"Item 11
Csv item can contain ""double quote"" and comma(,)", Description 11 ,image url
"""

let pattern = "[ \r\t]*(?:\"((?:[^\"]|\"\")*)\"|([^,\"\\n]*))[ \t]*([,\\n]|$)"
let regex = try! NSRegularExpression(pattern: pattern)
var result: [[String]] = []
var record: [String] = []
regex.enumerateMatches(in: csvText, options: .anchored, range: NSRange(0..<csvText.utf16.count)) { match, flags, stop in
    guard let match = match else { fatalError() }
    if match.range(at: 1).location != NSNotFound {
        // Quoted field: unescape doubled double-quotes
        let field = csvText[Range(match.range(at: 1), in: csvText)!].replacingOccurrences(of: "\"\"", with: "\"")
        record.append(field)
    } else if match.range(at: 2).location != NSNotFound {
        // Unquoted field: trim surrounding whitespace
        let field = csvText[Range(match.range(at: 2), in: csvText)!].trimmingCharacters(in: .whitespaces)
        record.append(field)
    }
    let separator = csvText[Range(match.range(at: 3), in: csvText)!]
    switch separator {
    case "\n": // newline
        result.append(record)
        record = []
    case "": // end of text
        // Ignoring empty last line...
        if record.count > 1 || (record.count == 1 && !record[0].isEmpty) {
            result.append(record)
        }
        stop.pointee = true
    default: // comma
        break
    }
}
print(result)
(Intended to be tested in a Playground.)
New code, CRLF ready.
import Foundation

let csvText = "Field0,Field1\r\n"
let pattern = "[ \t]*(?:\"((?:[^\"]|\"\")*)\"|([^,\"\r\\n]*))[ \t]*(,|\r\\n?|\\n|$)"
let regex = try! NSRegularExpression(pattern: pattern)
var result: [[String]] = []
var record: [String] = []
regex.enumerateMatches(in: csvText, options: .anchored, range: NSRange(0..<csvText.utf16.count)) { match, flags, stop in
    guard let match = match else { fatalError() }
    if let quotedRange = Range(match.range(at: 1), in: csvText) {
        // Quoted field: unescape doubled double-quotes
        let field = csvText[quotedRange].replacingOccurrences(of: "\"\"", with: "\"")
        record.append(field)
    } else if let range = Range(match.range(at: 2), in: csvText) {
        // Unquoted field: trim surrounding whitespace
        let field = csvText[range].trimmingCharacters(in: .whitespaces)
        record.append(field)
    }
    let separator = csvText[Range(match.range(at: 3), in: csvText)!]
    switch separator {
    case "": // end of text
        // Ignoring empty last line...
        if record.count > 1 || (record.count == 1 && !record[0].isEmpty) {
            result.append(record)
        }
        stop.pointee = true
    case ",": // comma
        break
    default: // newline (LF, CR, or CRLF)
        result.append(record)
        record = []
    }
}
print(result) //->[["Field0", "Field1"]]
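If a regex feels hard to maintain, the same three quoting rules (newlines, commas, and doubled double-quotes inside quoted fields) can also be sketched as a small character-by-character state machine. This is only an illustrative alternative, not the code above rewritten: `parseCSV` is a hypothetical helper name, and unlike the regex version it does not trim whitespace around unquoted fields and it keeps blank lines as a record with one empty field.

```swift
/// Hypothetical helper (not from any library): splits CSV text into records,
/// treating commas and newlines inside double-quoted fields as field content.
func parseCSV(_ text: String) -> [[String]] {
    var records: [[String]] = []
    var record: [String] = []
    var field = ""
    var inQuotes = false
    let chars = Array(text)   // note: "\r\n" is a single Character in Swift
    var i = 0

    while i < chars.count {
        let c = chars[i]
        if inQuotes {
            if c == "\"" {
                if i + 1 < chars.count, chars[i + 1] == "\"" {
                    field.append("\"")   // "" inside quotes is one literal quote
                    i += 1
                } else {
                    inQuotes = false     // closing quote
                }
            } else {
                field.append(c)          // commas and newlines are kept verbatim
            }
        } else if c == "\"" {
            inQuotes = true              // opening quote
        } else if c == "," {
            record.append(field)         // end of field
            field = ""
        } else if c == "\n" || c == "\r\n" || c == "\r" {
            record.append(field)         // end of record
            records.append(record)
            record = []
            field = ""
        } else {
            field.append(c)
        }
        i += 1
    }
    // Flush the last record unless the text ended with a newline.
    if !field.isEmpty || !record.isEmpty {
        record.append(field)
        records.append(record)
    }
    return records
}

let sample = "Item 9,Description 9,image url\r\n\"Item 10\nExtra line 1\",Description 10,image url\n\"Say \"\"hi\"\", ok\",D11,url\n"
let rows = parseCSV(sample)
print(rows.count)   // 3
print(rows)
```

The multi-line "Item 10" field stays in one record because the newline is seen while `inQuotes` is true, which is the case the naive split cannot distinguish.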
Upvotes: 7
Reputation: 2032
The problem is with this line of code:
content.components(separatedBy: "\n")
It splits your CSV file into rows at every newline character. Your "Item 10 Extra line 1 Extra line 2 Extra line 3" string contains newline characters, so each extra line is treated as a separate row, and you end up with the wrong result.
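To make the effect concrete, here is a minimal sketch that runs the question's splitting logic on an inline copy of the data (the `sample` string is my own stand-in for the file contents):

```swift
import Foundation

// Stand-in for the file contents: the second record has a quoted,
// multi-line first field, as in the question.
let sample = "Item 9,Description 9,image url\n\"Item 10\nExtra line 1\nExtra line 2\nExtra line 3\",Description 10,image url"

// Same approach as in the question: split on "\n", then on ",".
let naiveRows = sample
    .components(separatedBy: "\n")
    .map { $0.components(separatedBy: ",") }

print(naiveRows.count)   // 5 rows instead of the expected 2
for row in naiveRows {
    print(row)
}
```

Only the first and last of those rows have three columns; the three in between are fragments of the quoted field.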
I'd suggest escaping the newline characters in your multiline text column or getting rid of them altogether. You can also modify the input file so that the delimiter at the end of each row isn't \n but something custom (a string that won't appear anywhere else in the CSV file).
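A rough sketch of the escaping idea, assuming you control how the file is exported: newlines inside a field are written as the two characters `\` and `n`, and no field contains a comma. Both of those are my assumptions for illustration; the sample string is made up.

```swift
import Foundation

// Assumption: newlines inside fields were exported as a literal backslash
// followed by "n", so a real "\n" only ever ends a row.
let escaped = "Item 9,Description 9,image url\nItem 10\\nExtra line 1\\nExtra line 2,Description 10,image url"

let rows = escaped
    .components(separatedBy: "\n")           // real newlines separate rows
    .map { line in
        line.components(separatedBy: ",")    // commas separate fields
            // Turn the escaped sequence back into a real newline.
            .map { $0.replacingOccurrences(of: "\\n", with: "\n") }
    }

print(rows.count)   // 2
print(rows[1][0])
```

This keeps the simple split working, but it still breaks as soon as a field contains a comma, so it only fits data you fully control.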
Upvotes: 0