Reputation: 417
My input file has its content in the following format, where each column is separated by a single "space":
string1<space>string2<space>string3<space>YYYY-mm-dd<space>hh:mm:ss.SSS<space>string4<space>10:1234567890<space>0e:Apple 1.2.3.4<space><space>string5<space>HEX
There are 2 "spaces" after "0e:Apple 1.2.3.4" because there is no 14th digit in this field/column. The entire "0e:Apple 1.2.3.4space" is treated as a single value of that column.
In the 7th column, 10: represents the count of characters in the following string.
In the 8th column, 0e: represents a hex value of 14. So, the HEX values mention the count of characters in the string that follows.
Like:
"0e:Apple 1.2.3.4 "--> this is the actual value in 8th column without " "
(I've mentioned " " to show that the 14th digit is empty)
It's counted as
0e: A  p  p  l  e     1  .  2  .  3  .  4
    |  |  |  |  |  |  |  |  |  |  |  |  |  |
    1  2  3  4  5  6  7  8  9  10 11 12 13 14
Let's consider first row from the input file as:
string1 string2 string3 yyyy-mm-dd 23:50:45.999 string4 10:1234567890 0e:Apple 1.2.3.4  string5 001e
where:
string1 is the value in the 1st column
string2 is the value in the 2nd column
string3 is the value in the 3rd column
yyyy-mm-dd is in the 4th
23:50:45.999 is in the 5th
string4 is in the 6th
10:1234567890 is in the 7th //there is no space at the end because it has 10 digits
0e:Apple 1.2.3.4 is in the 8th //space at the end
string5 is in the 9th
001e is in the 10th
Expected output:
string1,string2,string3,yyyy-mm-dd,23:50:45.999,string4,1234567890,Apple_1.2.3.4,string5,30
Requirements:
1) The 10: & 0e: prefixes should be removed.
2) The space between Apple and 1.2.3.4 should be replaced by "_".
I've tried using this:
$ cat input.txt |sed 's/[a-z0-9].*://g'
which gives output as:
string1,string2,string3,yyyy-mm-dd,45.999,string4,1234567890,Apple,1.2.3.4,,string5,001e
Upvotes: 4
Views: 925
Reputation: 5603
This will do what you want on your example input:
gawk -F "[ ]" '{sub(/.*:/, "", $7); sub(/.*:/, "", $8); printf "%s,%s,%s,%s,%s,%s,%s,%s_%s,%s,%d\n", $1, $2, $3, $4, $5, $6, $7, $8, $9, $11, strtonum("0x" $12)}' input.txt
Explanation of parts:
awk printf allows you to specify an output format, so you can manually specify which fields you want to delimit with , and which you want to delimit with _.
-F "[ ]" forces the field separator to be a single space, so that awk knows there is an empty field between two adjacent spaces (field 10 in the example). The default behavior would treat a run of spaces as a single delimiter, which is not what you want according to the question.
The sub function allows you to do regular expression replacement, in this case removing the ..: prefix in fields 7 and 8.
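For field 12, we tell printf to output a number (%d) and give it the string in that field prefixed by 0x, so that it is interpreted as hexadecimal. Whether a 0x-prefixed string is converted as hex depends on the awk implementation; in gawk, the strtonum() function does this conversion explicitly.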
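As a rough Python analogy (this is not part of the awk command, and the variable names are made up for illustration), the single-space splitting and the hexadecimal conversion described above behave like this:

```python
# Tail of the sample row; note the two spaces after "1.2.3.4".
tail = "0e:Apple 1.2.3.4  string5 001e"

# Splitting on exactly one space keeps the empty field between the
# two adjacent spaces, similar to awk -F "[ ]".
single = tail.split(" ")

# Splitting on runs of whitespace drops that empty field, similar to
# awk's default field splitting.
collapsed = tail.split()

# Interpreting the last column as a base-16 number.
hex_value = int("001e", 16)

print(single)
print(collapsed)
print(hex_value)
```

The empty string in the first list corresponds to the empty awk field that has to be skipped when building the output.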
Note: If it's not always the case that you want the output to be $8_$9, then you actually need to parse the hexadecimal prefix and count off characters in order to determine where the field ends. If that's the case, I would personally prefer to write the whole thing in something else, e.g. Python.
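A minimal sketch of that count-off approach in Python might look like the following. It rests on assumptions drawn only from the one sample row: the 7th column's count is decimal (10: precedes 10 characters), the 8th column's count is hex (0e: precedes 14 characters), trailing spaces inside a counted value are padding, and the last column is hex that should be printed as decimal. The function names read_prefixed and convert_line are made up here.

```python
def read_prefixed(text, pos, base):
    """Read '<count>:<value>' starting at pos.

    Returns the value (exactly <count> characters) and the index just
    past it. The count is parsed in the given base.
    """
    colon = text.index(":", pos)
    count = int(text[pos:colon], base)
    start = colon + 1
    return text[start:start + count], start + count


def convert_line(line):
    # The first six columns are plain and separated by single spaces.
    parts = line.rstrip("\n").split(" ", 6)
    head, rest = parts[:6], parts[6]

    # Assumption: column 7 uses a decimal count, column 8 a hex count.
    col7, pos = read_prefixed(rest, 0, 10)
    pos += 1  # skip the single space that separates the columns
    col8, pos = read_prefixed(rest, pos, 16)
    pos += 1

    # Whatever remains is the 9th column and the hex column.
    col9, col10 = rest[pos:].split(" ")

    # Assumption: trailing spaces are padding; inner spaces become "_".
    col8 = col8.rstrip().replace(" ", "_")
    return ",".join(head + [col7, col8, col9, str(int(col10, 16))])


sample = ("string1 string2 string3 yyyy-mm-dd 23:50:45.999 string4 "
          "10:1234567890 0e:Apple 1.2.3.4  string5 001e")
print(convert_line(sample))
```

Because the value is taken by length rather than by splitting on spaces, this keeps working when the 8th column contains a different number of spaces, which the fixed $8_$9 approach cannot handle.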
Upvotes: 2