user2884505
user2884505

Reputation: 555

Excel - Query for CSV missing columns

So I'm using Office 2016's new Query feature to get data from various CSV API endpoints. I managed to get it to work with Yahoo Finance's but I can't seem to get Morningstar's to work. When I visit this link in my browser, I download a full CSV with all the proper data:

Link: http://financials.morningstar.com/ajax/ReportProcess4CSV.html?t=MSFT&reportType=is&period=12&dataType=A&order=asc&columnYear=10&number=3

However, when I use the query feature, it only loads the first row. I'm guessing it's because only the first column in the first row in the CSV has data, causing it to ignore the rest of the columns. Does anyone know how to ignore that first row and begin importing the CSV from the second?

let
    Source = Csv.Document(Web.Contents("http://financials.morningstar.com/ajax/ReportProcess4CSV.html?t=MSFT&reportType=is&period=12&dataType=A&order=asc&columnYear=10&number=3"),[Delimiter=",",Encoding=1252])
in
    Source

Thanks in advance!

Edit: In Csv.Document, it looks like there's an extraValues parameter; maybe that might help?

Upvotes: 1

Views: 1495

Answers (3)

The extra values parameter can retrieve your extra rows (using ExtraValues.List), but the shape of the table will be difficult to work with.

Here's one way to get the values:

let
    Source = Table.FromColumns({Lines.FromBinary((Web.Contents("http://financials.morningstar.com/ajax/ReportProcess4CSV.html?t=MSFT&reportType=is&period=12&dataType=A&order=asc&columnYear=10&number=3")),null,null,1252)}),
    #"Removed Top Rows" = Table.Skip(Source,1),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Removed Top Rows","Column1",Splitter.SplitTextByDelimiter(",", QuoteStyle.Csv),{"Column1.1", "Column1.2", "Column1.3", "Column1.4", "Column1.5", "Column1.6", "Column1.7"}),
    #"Changed Type" = Table.TransformColumnTypes(#"Split Column by Delimiter",{{"Column1.1", type text}, {"Column1.2", type text}, {"Column1.3", type text}, {"Column1.4", type text}, {"Column1.5", type text}, {"Column1.6", type text}, {"Column1.7", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type")
in
    #"Promoted Headers"

This query treats the CSV file as a text document of lines. It removes the first line, which is some header statement that causes the parser to think that the CSV file has only one column. It then splits the lines into columns based on comma placement and then makes the first row the column names.

Upvotes: 2

Eugene
Eugene

Reputation: 1264

Unless you explicitly specify number of columns, it is determined using the first row. As the first row of your example contains values in the first column only, resulting table also contains only one.

This code returns 7 columns:

let
    Source = Web.Contents("http://financials.morningstar.com/ajax/ReportProcess4CSV.html?t=MSFT&reportType=is&period=12&dataType=A&order=asc&columnYear=10&number=3"),
    Csv = Csv.Document(Source,[Delimiter=",", Columns=7, Encoding=1252, QuoteStyle=QuoteStyle.None])
in
    Csv

Upvotes: 0

Paul Ogilvie
Paul Ogilvie

Reputation: 25286

see https://msdn.microsoft.com/en-us/library/mt260840.aspx?f=255&MSPPError=-2147217396:

let

Source = Csv.Document(File.Contents("C:\Projects\Examples\SalesForceContacts.txt"),
    [Delimiter=",",Encoding=1252]),
    #"First Row as Header" = Table.PromoteHeaders(Source),
    ...

But if the header row is corrupt, this may not work.

Upvotes: 0

Related Questions