Reputation: 1
I am currently attempting to read a large CSV file (2.05 GB) into a Power BI dataflow. The file has 5 million rows and 38 columns (verified separately in a Jupyter notebook), and some cells contain commas that are escaped as \,.
To read it into the dataflow via M, I have attempted to replace each \, with "<COMMA>", read the CSV, and then convert the placeholder back to a comma within each cell.
My code is as follows:
let
    SourceBinary = Web.Contents("https://link.csv"),
    SourceText = Text.FromBinary(SourceBinary, TextEncoding.Utf8),
    // M strings do not use backslash escapes, so "\," matches the literal \, in the file
    CleanText = Text.Replace(SourceText, "\,", "<COMMA>"),
    CleanBinary = Text.ToBinary(CleanText, TextEncoding.Utf8),
    ParsedCSV = Csv.Document(CleanBinary, [Delimiter=",", Columns=38, Encoding=65001, QuoteStyle=QuoteStyle.None])
in
    ParsedCSV
However, I am still getting the error DataFormat.Error: The number of items in the list is too large.
I believe this is because the whole file is being read into a single text value, which is too large.
Is there any way to get around this? Can I either read the CSV directly while ignoring the escaped commas that act as extra delimiters, or avoid reading the whole CSV as a single string altogether?
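For reference, this is the sort of alternative I have in mind when I ask about avoiding a single text value: a rough, untested sketch that processes the file line by line (the header row and column types would still need to be fixed afterwards, and memory may still be a concern at this size):

let
    SourceBinary = Web.Contents("https://link.csv"),
    // Split the file into individual lines instead of one ~2 GB text value
    SourceLines = Lines.FromBinary(SourceBinary, null, null, 65001),
    // Protect the escaped commas, then split each line into its fields
    SplitRows = List.Transform(
        SourceLines,
        each Text.Split(Text.Replace(_, "\,", "<COMMA>"), ",")
    ),
    AsTable = Table.FromRows(SplitRows),
    // Put the literal commas back inside each cell
    Restored = Table.TransformColumns(AsTable, {}, each Text.Replace(_, "<COMMA>", ","))
in
    Restored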
Thank you!
Upvotes: 0
Views: 18