Reputation: 23492
I am reproducing my issue with the test file below:
# cat out
2014-01-10 18:23:25 0 Andy/ADPTER/
2014-01-10 18:23:36 503 Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 Jim/ADPTER/UNITS MAP.csv
This is my Bash variable:
# echo $bucket
bucket_name
So, in the above file, I want the value of the Bash variable to be prefixed to the 4th field.
This is my desired output:
2014-01-10 18:23:25 0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36 503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 bucket_name/Jim/ADPTER/UNITS MAP.csv
This is what I have tried:
# awk -v var=$bucket '{$4=var"/"$4; print}' out
2014-01-10 18:23:25 0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36 503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 bucket_name/Jim/ADPTER/UNITS MAP.csv
Question:
My awk command does what I need; however, it messes up the output field spacing (the separator?). My intention is just to prefix bucket_name/ to the 4th field and keep whatever spacing scheme (including right/left-justified fields) the input file has.
This is another attempt:
# awk -v var=$bucket 'BEGIN{OFS="\t"}{$4=var"/"$4; print}' out
2014-01-10 18:23:25 0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36 503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 bucket_name/Jim/ADPTER/UNITS MAP.csv
But it's not helping either.
Thanks.
Upvotes: 3
Views: 169
Reputation: 784918
You can use this awk command:
bucket="bucket_name"
awk --re-interval -v b="$bucket" '{sub(/([^[:blank:]]+[[:blank:]]+){3}/,
"&" b "/")} 1' file
2014-01-10 18:23:25 0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36 503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 bucket_name/Jim/ADPTER/UNITS MAP.csv
-v b="$bucket" # pass a value to awk in variable b
--re-interval # Enable the use of interval
# expressions in regular expression matching
sub # match input using regex and substitute with
# the given string
([^[:blank:]]+[[:blank:]]+){3} # match first 3 fields of the line separated by space/tab
"&" b "/" # replace by matched string + var b + /
EDIT: (Thanks to @EdMorton) To make it work with any value of the variable (for example, compare both solutions with bucket="&"), use:
awk --re-interval -v b="$bucket" 'match($0, /([^[:blank:]]+[[:blank:]]+){3}/) {
$0 = substr($0, 1, RSTART+RLENGTH-1) b "/" substr($0, RSTART+RLENGTH) } 1' file
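For awk implementations that lack interval expressions (and the gawk-only --re-interval option), the same idea can be written with a loop. This is only a minimal sketch: the function name prefix_field4 and the sample line are made up for illustration, and it assumes fields are separated by spaces or tabs.

```shell
#!/bin/sh
# Hypothetical helper: insert "$1/" before the 4th field of every line read
# from stdin, leaving the original whitespace untouched.
prefix_field4() {
    awk -v b="$1" '
    {
        # Step over three "field + blanks" runs to find where field 4 starts.
        pos = 1
        for (i = 1; i <= 3; i++) {
            rest = substr($0, pos)
            if (!match(rest, /[^ \t]+[ \t]+/)) { pos = 0; break }
            pos += RSTART + RLENGTH - 1
        }
        # Splice the prefix in as plain text; sub() is not used,
        # so "&" in the value has no special meaning.
        if (pos) $0 = substr($0, 1, pos - 1) b "/" substr($0, pos)
        print
    }'
}

# Usage example with a value that would break the sub() version:
printf '%s\n' '2014-01-10 18:23:36   503 Sandy/ADPTER/ACCOUNTTYPE MAP.csv' |
    prefix_field4 '&'
```

Lines with fewer than four fields are printed unchanged. Keep in mind that -v still interprets backslash escapes in the value, so a bucket name containing backslashes would need extra care.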
Upvotes: 2
Reputation: 44023
This is a bit tricky to do in awk, but there is a relevant GNU extension: in gawk, the split function takes an optional fourth parameter to save the actual field delimiters for later use. Using that:
gawk -v bucket="$bucket" '{ split($0, f, FS, d); d[NF] = ORS; f[4] = bucket "/" f[4]; for(i = 1; i <= NF; ++i) printf("%s%s", f[i], d[i]); }' filename
That is:
{
split($0, f, FS, d) # split line into fields, saving fields in
# the f and delimiters in the d array
d[NF] = ORS # for the newline at the end
f[4] = bucket "/" f[4] # fix fourth field
for(i = 1; i <= NF; ++i) { # then print the fields separated by the
printf("%s%s", f[i], d[i]); # saved delimiters
}
}
Addendum: I cannot really recommend doing this with sed unless the variable comes from a trustworthy source and is guaranteed to not contain metacharacters (otherwise you will have code injection problems). That said: a simple way with sed is
sed "s|[[:space:]]\+|&${bucket}/|3" filename
...which appends ${bucket}/ after the third run of whitespace matched by [[:space:]]\+.
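If sed has to be used anyway, the variable can be escaped first. A rough sketch, assuming the value contains no newlines; the helper names sed_escape_repl and prefix_with_sed are hypothetical, and the pattern is spelled [[:space:]][[:space:]]* so that it does not rely on the GNU \+ extension:

```shell
#!/bin/sh
# Hypothetical helper: escape the characters that are special in the
# replacement part of s|...|...| (backslash, ampersand, the | delimiter).
sed_escape_repl() {
    printf '%s' "$1" | sed 's/[\\&|]/\\&/g'
}

# Hypothetical helper: insert "$1/" after the third run of whitespace
# on every line read from stdin.
prefix_with_sed() {
    safe=$(sed_escape_repl "$1")
    sed "s|[[:space:]][[:space:]]*|&${safe}/|3"
}

# Usage example with a value full of sed metacharacters:
printf '%s\n' '2014-01-10 18:23:25     0 Andy/ADPTER/' |
    prefix_with_sed 'a&b|c'
```

The escaping only covers the replacement side of the s command; it is not a general-purpose sanitizer.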
Upvotes: 1
Reputation: 212198
If you're going to insist on awk, it might be simplest to explicitly give a format string:
awk '{printf "%s %s %10s %s/%s\n", $1, $2, $3, b, $4}' b="$bucket" out
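One thing to watch with the format-string approach: $4 stops at the next blank, so a name like ACCOUNTTYPE MAP.csv loses its second word. A possible variant that keeps the whole tail of the line is sketched below. The function name prefix_printf is made up, and the width of 10 for the size column is simply carried over from the command above, not taken from the real input.

```shell
#!/bin/sh
# Hypothetical helper: reprint each stdin line with an explicit format,
# prefixing "$1/" to everything from the 4th field onwards.
prefix_printf() {
    awk -v b="$1" '{
        tail = $0
        # Strip the first three fields and the blanks that follow each one,
        # so tail keeps the 4th field plus any embedded spaces.
        for (i = 1; i <= 3; i++) sub(/^[ \t]*[^ \t]+[ \t]+/, "", tail)
        printf "%s %s %10s %s/%s\n", $1, $2, $3, b, tail
    }'
}

# Usage example:
printf '%s\n' '2014-01-10 18:23:36   503 Sandy/ADPTER/ACCOUNTTYPE MAP.csv' |
    prefix_printf 'bucket_name'
```

This imposes its own column layout rather than preserving the input's spacing, so it only fits when the widths are known in advance.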
Upvotes: 1
Reputation: 174696
You could use sed.
$ bucket='bucket_name'
$ sed "s~^\(\([^[:blank:]]\+[[:blank:]]\+\)\{3\}\)~\1$bucket/~" file
2014-01-10 18:23:25 0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36 503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 bucket_name/Jim/ADPTER/UNITS MAP.csv
[[:blank:]]\+ is a POSIX character class that matches one or more horizontal whitespace characters (space or tab).
[^[:blank:]]\+ is the negated class: it matches one or more characters that are not horizontal whitespace.
Upvotes: 2
Reputation: 26121
You tagged the question with Perl, so here is a Perl solution:
perl -pe'BEGIN{$var=shift}s,(?:.*?\s+){3}\K,$var/,' "$bucket" out
It is technically the same as the sed solution, with the benefit that it avoids escaping problems: the shell variable $bucket can contain anything.
Upvotes: 3