Reputation: 21
Here I have a dataset that includes header and separator, like this:
a|b|c|d|e|f|g
1|2|3|4|5|6|5
2|4|2|3|5|2|1
And another config file with some columns and values, like this:
b:5
d:6
My aim is to modify the dataset with the config file. The result is like this:
a|b|c|d|e|f|g
1|5|3|6|5|6|5
2|5|2|6|5|2|1
Without using "for" outside awk, how can I complete the process?
Upvotes: 0
Views: 137
Reputation: 74615
Here's how you could do it using awk:
awk '
NR == FNR { rep[$1] = $2; next }
FNR == 1 { for (i = 1; i <= NF; ++i) if ($i in rep) cols[i] = rep[$i] }
FNR > 1 { for (i in cols) $i = cols[i] }
1
' FS=':' replacements FS='|' OFS='|' dataset
key:value
replacements into an array rep
NR == FNR
(total line number equals this file's line number)next
cols
along with their replacements1
(always true), which triggers the default action { print }
Note that since the two files have different separators, these are specified as arguments after the awk script. FS
defines the input Field Separator and OFS
defines the Output Field Separator for the next filename argument. The arguments should be read like:
# read the file 'replacements' with input field separator set to ':'
FS=':' replacements
# read the file 'dataset' with input and output field separator set to '|'
FS='|' OFS='|' dataset
$ cat replacements
b:5
d:6
$ cat dataset
a|b|c|d|e|f|g
1|2|3|4|5|6|5
2|4|2|3|5|2|1
$ awk '
> NR == FNR { rep[$1] = $2; next }
> FNR == 1 { for (i = 1; i <= NF; ++i) if ($i in rep) cols[i] = rep[$i] }
> FNR > 1 { for (i in cols) $i = cols[i] }
> 1
> ' FS=':' replacements FS='|' OFS='|' dataset
a|b|c|d|e|f|g
1|5|3|6|5|6|5
2|5|2|6|5|2|1
Upvotes: 4
Reputation: 11940
It is probably wisest to do in the following order. First, you parse the config (assuming GNU dialect of awk):
gawk -F \| -v OFS=\| 'NR == FNR { # this pattern trigs inside the first file
split($0, mapping, /:/)
rules[mapping[1]] = mapping[2]
next # short-circuit to skip other blocks
}
Next, on the first line of the data file you need to parse column headers:
FNR == 1 {
for(i = 1; i <= NF; ++i) if($i in rules) forcedValues[i] = rules[$i]
print
next
}
Now you have an array forcedValues
that, for some column numbers from 1 through 7 (in your example), contains values those columns should be reset to. So now you process the rest of the file:
{
for(i in forcedValues) $i = forcedValues[i]
print
}' config.txt input.txt > output.txt
(The three code snippets in this post are actually parts of a single shell command and should be concatenated via newlines.)
Upvotes: 1