slayedbylucifer

Reputation: 23492

AWK: Maintain field spacing like input file

I have reproduced my issue with the test file below:

# cat out 
2014-01-10 18:23:25          0 Andy/ADPTER/
2014-01-10 18:23:36        503 Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38        516 John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38        398 Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38      11117 Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38        260 Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39        466 John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40        373 Jim/ADPTER/UNITS MAP.csv

This is my Bash variable:

# echo $bucket
bucket_name

In the above file, I want the value of the Bash variable to be prefixed to the 4th field.

This is my desired output:

2014-01-10 18:23:25          0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36        503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38        516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38        398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38      11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38        260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39        466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40        373 bucket_name/Jim/ADPTER/UNITS MAP.csv

This is what I have tried:

# awk -v var=$bucket '{$4=var"/"$4; print}' out 
2014-01-10 18:23:25 0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36 503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38 516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38 398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38 11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38 260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39 466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40 373 bucket_name/Jim/ADPTER/UNITS MAP.csv

Question:

My awk command does what I need, but it messes up the output field spacing (the separator?). My intention is just to prefix bucket_name/ to the 4th field and keep whatever spacing scheme the input file has, including right- or left-justified fields.

This is another attempt:

# awk -v var=$bucket 'BEGIN{OFS="\t"}{$4=var"/"$4; print}' out 
2014-01-10  18:23:25    0   bucket_name/Andy/ADPTER/
2014-01-10  18:23:36    503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE    MAP.csv
2014-01-10  18:23:38    516 bucket_name/John/ADPTER/CITY    MAP.csv
2014-01-10  18:23:38    398 bucket_name/Wendy/ADPTER/COUNTRY    MAP.csv
2014-01-10  18:23:38    11117   bucket_name/Andy/ADPTER/CURRENCY    MAP.csv
2014-01-10  18:23:38    260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10  18:23:39    466 bucket_name/John/ADPTER/STATE   MAP.csv
2014-01-10  18:23:40    373 bucket_name/Jim/ADPTER/UNITS    MAP.csv

But that does not help either.

Thanks.

Upvotes: 3

Views: 169

Answers (5)

anubhava

Reputation: 784918

You can use this awk:

bucket="bucket_name"
awk --re-interval -v b="$bucket" '{sub(/([^[:blank:]]+[[:blank:]]+){3}/, 
     "&" b "/")} 1' file
2014-01-10 18:23:25          0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36        503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38        516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38        398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38      11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38        260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39        466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40        373 bucket_name/Jim/ADPTER/UNITS MAP.csv

-v b="$bucket"                 # pass a value to awk in variable b
--re-interval                  # Enable the use of interval
                               # expressions in regular expression matching
sub                            # match input using regex and substitute with
                               # the given string
([^[:blank:]]+[[:blank:]]+){3} # match first 3 fields of the line separated by space/tab
 "&" b "/"                     # replace by matched string + var b + /

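EDIT: (Thanks to @EdMorton) In the replacement string of sub(), an & stands for the matched text, so the first command misbehaves if the variable itself contains an & (for example, try both solutions with bucket="&"). To make it work with such values, use: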

awk --re-interval -v b="$bucket" 'match($0, /([^[:blank:]]+[[:blank:]]+){3}/) {
    $0 = substr($0, 1, RLENGTH) b "/" substr($0, RLENGTH+1) } 1' file
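
To see the difference the edit makes, here is a small sketch that runs both variants on one sample line. It is only an illustration: the function names prefix_sub and prefix_substr are made up, and the pattern is written with [ \t] and three explicit "field plus blanks" groups instead of [[:blank:]] and {3}, on the assumption that this lets it run on awk implementations without interval support as well.

```shell
# Hypothetical helper names, not part of the original commands.

# Variant 1: sub() with "&" in the replacement. Any "&" inside b is itself
# expanded to the matched text.
prefix_sub() {
    awk -v b="$1" '{
        sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "&" b "/")
    } 1'
}

# Variant 2: match() + substr(). b is concatenated as plain data.
prefix_substr() {
    awk -v b="$1" 'match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/) {
        $0 = substr($0, 1, RLENGTH) b "/" substr($0, RLENGTH + 1)
    } 1'
}

line='2014-01-10 18:23:38      11117 Andy/ADPTER/CURRENCY MAP.csv'
printf '%s\n' "$line" | prefix_sub 'bucket_name'   # ordinary value: works
printf '%s\n' "$line" | prefix_sub '&'             # "&" is expanded again
printf '%s\n' "$line" | prefix_substr '&'          # "&" stays literal
```

With an ordinary bucket name both variants should agree; only the second keeps a literal & in the output.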

Upvotes: 2

Wintermute

Reputation: 44023

This is a bit tricky to do in awk, but there is a relevant GNU extension: In gawk, the split function takes an optional fourth parameter to save the actual field delimiters for later use. Using that:

gawk -v bucket="$bucket" '{ split($0, f, FS, d); d[NF] = ORS; f[4] = bucket "/" f[4]; for(i = 1; i <= NF; ++i) printf("%s%s", f[i], d[i]); }' filename

That is:

{
  split($0, f, FS, d)             # split line into fields, saving the fields in
                                  # array f and the delimiters in array d
  d[NF] = ORS                     # for the newline at the end
  f[4] = bucket "/" f[4]          # fix fourth field
  for(i = 1; i <= NF; ++i) {      # then print the fields separated by the
    printf("%s%s", f[i], d[i]);   # saved delimiters
  }
}
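
Where gawk is not available, a similar result can probably be had without the four-argument split by walking the line with match() and substr(), which are standard awk. The following is a rough sketch under that assumption, not a drop-in equivalent: prefix_field is a hypothetical helper name, blanks are taken to be spaces and tabs only, and lines with too few fields are passed through unchanged.

```shell
# prefix_field PREFIX N: hypothetical helper that inserts PREFIX/ in front of
# field N while copying everything else, separators included, unchanged.
prefix_field() {
    awk -v p="$1" -v n="$2" '{
        head = ""; rest = $0
        # Move the n-1 fields before the target, each with its trailing
        # blanks, from rest to head.
        for (i = 1; i < n; i++) {
            if (!match(rest, /^[ \t]*[^ \t]+[ \t]+/)) { print; next }
            head = head substr(rest, 1, RLENGTH)
            rest = substr(rest, RLENGTH + 1)
        }
        # Keep any blanks directly in front of the target field in head.
        match(rest, /^[ \t]*/)
        head = head substr(rest, 1, RLENGTH)
        rest = substr(rest, RLENGTH + 1)
        if (rest == "") { print; next }   # field n does not exist
        print head p "/" rest
    }'
}

printf '%s\n' '2014-01-10 18:23:25          0 Andy/ADPTER/' |
    prefix_field bucket_name 4
```

Because the prefix is concatenated rather than used as a replacement string, an & in it is not treated specially.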

Addendum: I cannot really recommend doing this with sed unless the variable comes from a trustworthy source and is guaranteed to not contain metacharacters (otherwise you will have code injection problems). That said: a simple way with sed is

sed "s|[[:space:]]\+|&${bucket}/|3" filename

...which inserts ${bucket}/ right after the third match of [[:space:]]\+, that is, directly in front of the fourth field.

Upvotes: 1

William Pursell

Reputation: 212198

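If you're going to insist on awk, it might be simplest to give an explicit format string. Because the file names contain spaces, the words after $4 have to be printed as well (this assumes they are separated by single spaces):

awk '{printf "%s %s %10s %s/%s", $1, $2, $3, b, $4; for (i = 5; i <= NF; i++) printf " %s", $i; print ""}' b="$bucket" out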

Upvotes: 1

Avinash Raj

Reputation: 174696

You could use sed.

$ bucket='bucket_name'
$ sed "s~^\(\([^[:blank:]]\+[[:blank:]]\+\)\{3\}\)~\1$bucket/~" file
2014-01-10 18:23:25          0 bucket_name/Andy/ADPTER/
2014-01-10 18:23:36        503 bucket_name/Sandy/ADPTER/ACCOUNTTYPE MAP.csv
2014-01-10 18:23:38        516 bucket_name/John/ADPTER/CITY MAP.csv
2014-01-10 18:23:38        398 bucket_name/Wendy/ADPTER/COUNTRY MAP.csv
2014-01-10 18:23:38      11117 bucket_name/Andy/ADPTER/CURRENCY MAP.csv
2014-01-10 18:23:38        260 bucket_name/Sandy/ADPTER/GENDER MAP.csv
2014-01-10 18:23:39        466 bucket_name/John/ADPTER/STATE MAP.csv
2014-01-10 18:23:40        373 bucket_name/Jim/ADPTER/UNITS MAP.csv

[[:blank:]]\+ uses the POSIX character class for horizontal whitespace and matches one or more spaces or tabs. [^[:blank:]]\+ is the negated class and matches one or more characters that are not spaces or tabs.

Upvotes: 2

Hynek -Pichi- Vychodil

Reputation: 26121

You tagged the question with Perl, so here is a Perl solution:

perl -pe'BEGIN{$var=shift}s,(?:.*?\s+){3}\K,$var/,' "$bucket" out

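It is technically the same solution as the one using sed, but with the benefit that it avoids escaping problems: the value is passed as a command-line argument rather than pasted into the program text, so the shell variable $bucket can contain anything.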
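
For comparison, awk can get close to the same robustness by reading the value from the environment, because values assigned with -v have backslash escape sequences processed while ENVIRON entries are taken as they are. This is only a sketch: the variable name BUCKET_PREFIX and the helper prefix_env are made up for illustration, and the pattern assumes fields are separated by spaces or tabs.

```shell
# Hypothetical helper: passes the prefix to awk through the environment
# variable BUCKET_PREFIX (a name chosen here for illustration).
prefix_env() {
    BUCKET_PREFIX="$1" awk '
        match($0, /^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/) {
            b = ENVIRON["BUCKET_PREFIX"]   # taken literally, no escape processing
            $0 = substr($0, 1, RLENGTH) b "/" substr($0, RLENGTH + 1)
        } 1'
}

# A prefix containing a backslash and an ampersand is inserted unchanged.
printf '%s\n' '2014-01-10 18:23:40        373 Jim/ADPTER/UNITS MAP.csv' |
    prefix_env 'a\tb&c'
```

Had the same value been passed with -v, the \t would have been turned into a tab character.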

Upvotes: 3
