toldmimsy
toldmimsy

Reputation: 1

KDB Q Ingestion

currently presented with data can ingest a csv file..

Currently data is all in one column but need to split the data below into sperate columns currently a space indicates its a new column and char count for each col may be a good way to distinguish separation? tried using sv/ssr but getting errors. Will also need to account for sometimes columns being empty.

 (`$( ssr[;" ";""] each x . (::;0)))

Current 1 row per ""

"EEEE 00000Z  AUTO 0000KT 1234 SA  0VC000    0504   Q111 COL  RE"
"EAAA 00000Z  AUTO 0000KT      RA  0VC000           Q111 COL  RE"

any pointers would be helpful

Unsure if ssr is the right operator or if sv would be better in this case?

(`$( ssr[;" ";""] each x . (::;0))) 

Desired output is

Col1 Col2    Col3 Col4   Col5 Col6 Col7     Col8   Col9  Col10  Col11
EEEE 00000Z  AUTO 0000KT 1234 SA  OVC000    0504   Q111   COL    RE
EAAA 00000Z  AUTO 0000KT      RA  0VC000            Q111  COL    RE

Upvotes: 0

Views: 94

Answers (1)

Thomas Smyth
Thomas Smyth

Reputation: 5654

It looks like you are dealing with a file that contains fixed widths. The operator 0: can interpret these fixed-format files.

The syntax is:

(types; widths) 0: text

For your use case, parsing all columns as strings:

("***********";4 7 6 7 5 3 8 8 7 4 4)0:txt
"EEEE"     "EAAA"
" 00000Z"  " 00000Z"
"  AUTO"   "  AUTO"
" 0000KT"  " 0000KT"
" 1234"    "     "
" SA"      " RA"
"  0VC000" "  0VC000"
"    0504" "        "
"   Q111"  "   Q111"
" COL"     " COL"
"  RE"     "  RE"

If you need to remove any leading or trailing whitespace from the columns you can use trim. In this case I have assumed some of the column types.

/ add column names
q)tab:{flip(`$"col",/:string 1+til count[x])!x}("S*S*JSS*SSS";4 7 6 7 5 3 8 8 7 4 4)0:txt

/ trim string columns
q)@[tab;exec c from meta[tab]where t="C";trim]
col1 col2     col3 col4     col5 col6 col7   col8   col9 col10 col11
--------------------------------------------------------------------
EEEE "00000Z" AUTO "0000KT" 1234 SA   0VC000 "0504" Q111 COL   RE
EAAA "00000Z" AUTO "0000KT"      RA   0VC000 ""     Q111 COL   RE

It may also be useful to know you could drop single (or multiple) spaces in this manner:

("* * * * * * * * * * *";4 1 6 1 5 1 6 1 4 1 2 1 7 1 7 1 6 1 3 1 3)0:txt
"EEEE"    "EAAA"
"00000Z"  "00000Z"
" AUTO"   " AUTO"
"0000KT"  "0000KT"
"1234"    "    "
"SA"      "RA"
" 0VC000" " 0VC000"
"   0504" "       "
"  Q111"  "  Q111"
"COL"     "COL"
" RE"     " RE"

Upvotes: 3

Related Questions