m1nkeh

Reputation: 1397

U-SQL Text Extractor

I've got a web log file that I'm working with in U-SQL with a query similar to:

@x =
    EXTRACT Col1 string, UserAgent string, Col2 string
    FROM "/file"
    USING Extractors.Text(delimiter : ' ');

Sometimes, though, the UserAgent contains something along the lines of:

Android Tablet 10" blah blah

which invariably means the script thinks that line has four columns instead of three.

Does anyone have any bright ideas for how I can deal with this? I'm not sure whether it's possible to escape that character or somehow ignore it upon extraction.

Upvotes: 2

Views: 2196

Answers (3)

Shaun Ryan

Reputation: 1718

Use Data Factory to prepare a copy of the data with an escape character inserted (a simple JSON setting). This easily writes escape characters into your data. You can then use the escapeCharacter parameter when extracting. It doesn't matter which character you choose, because the escape character escapes itself, but it's better to choose something obscure.

You have to pre-process your data somehow, either to insert an escape character or to escape each embedded quote with another ". Personally I prefer the escape character, and it's really easy to do with Data Factory.
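
As a rough sketch of the extraction step only: this assumes the prepared copy lives at a hypothetical path /file_escaped and that a backslash was inserted in front of every space inside a field. Both the path and the choice of escape character are assumptions for illustration, not something taken from the question.

```
@x =
    EXTRACT Col1 string, UserAgent string, Col2 string
    FROM "/file_escaped"   // hypothetical path of the pre-processed copy
    USING Extractors.Text(delimiter : ' ',
                          escapeCharacter : '\\',  // assumed escape character (backslash)
                          quoting : false);        // treat " as an ordinary character
```

With quoting turned off, any quote characters stay in the extracted values, so they may need trimming afterwards.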

Upvotes: 0

benjguin

Reputation: 1516

Per https://msdn.microsoft.com/en-us/library/azure/mt764098.aspx, I would try

@x =
    EXTRACT Col1 string, UserAgent string, Col2 string
    FROM "/file"
    USING Extractors.Text(delimiter : ' ', quoting:false);

Upvotes: 0

Michael Rys

Reputation: 6684

Either you have to use a delimiter that does not appear in the text, or make sure that the delimiter is escaped, or use quoting.

If none of these work, you could get the whole line into a single row and then process the row, or write a custom extractor that will move superfluous data into an overflow column.
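
A minimal sketch of the "whole line into a single row" option, under some stated assumptions: the backspace character never occurs in the file (so it can serve as a dummy delimiter), Col1 and Col2 never contain spaces, and the output path is hypothetical. Everything between the first and the last space is treated as the UserAgent.

```
// Read each line into one string column by using a delimiter that is
// assumed never to occur in the file (here: backspace).
@lines =
    EXTRACT line string
    FROM "/file"
    USING Extractors.Text(delimiter : '\b', quoting : false);

// Split on the first and the last space; the middle part is the UserAgent.
@x =
    SELECT line.Substring(0, line.IndexOf(' ')) AS Col1,
           line.Substring(line.IndexOf(' ') + 1,
                          line.LastIndexOf(' ') - line.IndexOf(' ') - 1) AS UserAgent,
           line.Substring(line.LastIndexOf(' ') + 1) AS Col2
    FROM @lines
    // Skip null lines and lines with fewer than two spaces.
    WHERE line != null && line.IndexOf(' ') != line.LastIndexOf(' ');

OUTPUT @x
TO "/output/parsed.tsv"   // hypothetical output path
USING Outputters.Tsv();
```

Because quoting is off, any quotes surrounding the UserAgent remain part of the value and can be removed with a further string expression if needed.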

Upvotes: 3
