Reputation: 65
I have an awk script that splits a file by columns and writes the output to new files named after the column values.
awk -F"|" 'NR==1 {h=substr($0, index($0,$5)); next}
{file= path ""$1""$2"_"$3"_"$4"_03042017.csv"; print (a[file]++?"": "DM9 03042017" ORS h ORS) substr($0, index($0,$5)) > file}
END{for(file in a) print "EOF " a[file] > file}'
Because I use substr($0, index($0,$5)), each output file should contain only the data from the fifth column onward. It works fine except when the input contains repeated values.
For example,
product | ID | Branch | Office | Type | ....
ABC | 12 | KL | GH | Z | ....
For the above example, the code works well because the input values are all different.
product | ID | Branch | Office | Type | ....
ABC | 12 | KK | KK | Z | ....
But with input like the second example, where the third and fourth columns hold the same value, the code doesn't work well. Instead of output starting at the fifth column, I get output starting at the third column.
So I suspect that, because the third and fourth columns have the same value, the match stops at the third column, since I use substr with index.
Can anyone help me with this? Sorry for the long post; I'd appreciate any ideas. Thank you.
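A quick way to test this suspicion: index() returns the position of the first occurrence of its second argument, so the substring starts early whenever the text of $5 also appears earlier in the line. The sketch below uses a made-up row in which the fifth field's text ("KK") already occurs in the third field; the function name `naive_tail` and the sample data are hypothetical.

```shell
# Hypothetical helper reproducing the substr/index idiom from the question.
naive_tail() {
    awk -F'|' '{ print substr($0, index($0, $5)) }'
}

# Field 5 is "KK", but "KK" first occurs in field 3,
# so index() points at field 3 and the output starts there.
printf 'ABC|12|KK|GH|KK|rest\n' | naive_tail
# prints: KK|GH|KK|rest
```

When the text of $5 does not occur earlier in the line, the same idiom returns the expected tail, which is why the first example appears to work.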
Upvotes: 0
Views: 127
Reputation: 10039
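If the structure is fixed as in your sample (always the same number of fields before the data), you can blank the first four fields instead of searching for the text of $5: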
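awk -F '[[:blank:]]*[|][[:blank:]]*' -v OFS='|' -v path="./" '
NR==1 {
    # header: blank the first 4 fields; $0 is rebuilt using OFS
    for (i=1; i<5; i++) $i = ""
    h = $0; sub(/^[[:blank:]|]+/, "", h)
    next
}
{
    # build the file name before the fields are blanked
    file = path $1 $2 "_" $3 "_" $4 "_03042017.csv"
    # remove the first 4 fields
    for (i=1; i<5; i++) $i = ""
    # strip the leading separators left behind
    Cleaned = $0; sub(/^[[:blank:]|]+/, "", Cleaned)
    # first line for a file: write the banner and header before the data
    print ( a[file]++ ? "" : "DM9 03042017" ORS h ORS ) Cleaned > file
}
END {
    # trailer with the number of data lines written to each file
    for (file in a) { print "EOF " a[file] > file }
}
' YourFile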
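The field-blanking idea can be tried on the problematic row from the question. This is only a sketch under a few assumptions: the helper name `strip_first4` is hypothetical, OFS is set to `|` so the rebuilt record keeps the pipe delimiter, and `[ \t]` stands in for `[[:blank:]]` to stay portable to older awk implementations.

```shell
# Hypothetical helper: print each row without its first four fields.
strip_first4() {
    awk -F '[ \t]*[|][ \t]*' -v OFS='|' '{
        for (i = 1; i < 5; i++) $i = ""   # blank fields 1 to 4; $0 is rebuilt with OFS
        sub(/^[ \t|]+/, "")               # drop the leading separators left behind
        print
    }'
}

# Columns 3 and 4 hold the same value, as in the second example.
printf 'ABC | 12 | KK | KK | Z | more\n' | strip_first4
# prints: Z|more
```

Because the fields are removed by position rather than located by their text, repeated values in earlier columns no longer affect where the output starts. One caveat: if the fifth field itself is empty, the cleanup `sub()` also removes its separator.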
Upvotes: 1