Reputation: 1820
I have two input files:
File1.txt:
Name Latin-small Roman Latin-caps #header, not to be processed
F0, a, I, A
F1, b, II, B
F2, c, III, C
F3, d, IV, D
File2.txt:
Lorem ipsum
Roman here.
LCaps here.
LSmall here.
Lorem ipsum
I want to:
1. Read the values R, LC and LS from each line of File1.txt (line 6 of script.sh).
2. Create folders Fx, where x = 0, 1, 2, 3, ..., using File1.txt (line 7 of script.sh).
3. Place a file Fx.txt, generated using File2.txt, in each of those folders (line 7 of script.sh).
4. While reading each line of File1.txt, read (line 7 of script.sh) and modify the whole of File2.txt according to the keys. <- This is where I cannot make it work: it reads only one line of File2.txt for each line of File1.txt.
5. Keep the rest of the file unchanged, except the values of here, which are modified using the keys Roman ($3 of File1.txt), LCaps ($4 of File1.txt) and LSmall ($2 of File1.txt) for each Fx.txt in each directory, using the values assigned in the first step from File1.txt (lines 9-17 of script.sh).
How do I get the following output in the respective folders (e.g. the output file in folder F2), using awk?
cat F0/F0.txt
Lorem ipsum
Roman I.
LCaps A.
LSmall a.
Lorem ipsum
or,
cat F3/F3.txt
Lorem ipsum
Roman IV.
LCaps D.
LSmall d.
Lorem ipsum
or,
cat F2/F2.txt
Lorem ipsum
Roman III.
LCaps C.
LSmall c.
Lorem ipsum
More info: File1 is ~300 lines; for each line (except the header), one file is to be created, each in its own folder. File2 is ~200 lines. Each of the phrases Roman, LSmall or LCaps occurs in certain lines of File2.txt, but never more than one per line. These are the keys for modifying the values.
Thanks in advance! This question is a part of a bigger workflow.
EDIT2: trial code
script.sh
awk 'BEGIN {FS=","}
{
if ($1 !~ "F")
{}
else if ($1 ~ "F")
{LS = $2; R = $3; LC = $4;
system("mkdir "$1); filename=$1"/"$1".txt";
{(getline < "File2.txt");
{
if ($0 ~ "Roman")
{gsub("here",R); print >> filename;}
else if ($0 ~ "LSmall")
{gsub("here",LS); print >> filename;}
else if ($0 ~ "LCaps")
{gsub("here",LC); print >> filename;}
else
{print >> filename;}
}
}
}
}
' File1.txt
I'm getting the folder and file structure I need (file Fx.txt in folder Fx, where x = 0, 1, 2, ...), but the contents of these files are:
cat F0/F0.txt
Lorem ipsum
cat F1/F1.txt
Roman II.
cat F2/F2.txt
LCaps C.
cat F3/F3.txt
LSmall d.
The key is to make awk read the entire file2.txt while reading each line of file1, make the modifications, and place the new files in the respective folders.
Upvotes: 0
Views: 1184
Reputation: 189317
Like you discovered, Awk can really only process one line at a time. But we can turn things around and read the input file into memory, then loop over its lines repeatedly as we read the other file.
Your example has a comma and a space between the items in file1.txt, but I assumed this is not a hard requirement, so this script expects tab-delimited input instead.
awk -F "\t" 'BEGIN { split(":LSmall:Roman:LCaps", k, /:/) }
NR==FNR { a[NR] = $0; n=NR; next }   # first file: store the template lines
FNR==1 { next } # skip header
{
    system("mkdir -p " $1)
    filename = $1 "/" $1 ".txt"
    for (i=1; i<=n; i++) {
        line = a[i]
        for (j=2; j<=NF; ++j) {
            if (line ~ k[j]) {
                gsub(/here/, $j, line)
                break
            }
        }
        print line >>filename
    }
    close(filename)   # avoid running out of open file handles on long inputs
}' file2.txt file1.txt
The BEGIN block initializes an array k with substitution key names. To keep it in sync with the fields in file1.txt, the first item k[1] is empty (it doesn't specify a substitution key).
When NR==FNR we are reading the first input file. We simply collect its lines into the array a.
When we fall through, we are reading the second file, which is the mapping with directory names and substitutions. For each input line, we loop over all the lines in a and perform any substitution specified in the fields of the current line, then print the result to the specified output file. As soon as one substitution is found, we consider ourselves done; maybe you want to change this so that multiple keys can trigger on the same line.
You'll notice how we pull the first field and loop over the subsequent fields, looking up their corresponding key in k by index.
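If multiple keys should be able to trigger on the same line, one possible variation is to drop the break and replace each "KEY here" pair separately. This is only a sketch under an assumption that is not in the question: each placeholder directly follows its key, separated by a single space. The file names template.txt, map.tsv and result.txt are made up for the illustration, and the output is written to a single file to keep the example short.

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

# Hypothetical template: the first line carries two keys at once.
printf '%s\n' 'Roman here and LCaps here.' 'LSmall here.' > template.txt
# Hypothetical tab-delimited mapping line without a header.
printf 'F1\tb\tII\tB\n' > map.tsv

awk -F '\t' '
BEGIN { split(":LSmall:Roman:LCaps", k, /:/) }
NR == FNR { a[NR] = $0; n = NR; next }   # store the template lines
{
    for (i = 1; i <= n; i++) {
        line = a[i]
        # No break: every key gets a chance on every line.
        for (j = 2; j <= 4; ++j)
            gsub(k[j] " here", k[j] " " $j, line)
        print line
    }
}' template.txt map.tsv > result.txt

cat result.txt
```

Because each gsub only touches the placeholder next to its own key, the order of the keys within a line no longer matters.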
Demo: https://ideone.com/syTv99
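If converting file1.txt to tabs is not an option, the same approach can probably be applied to the question's comma-and-space format by changing only the field separator. The following is a minimal sketch under that assumption: it recreates the sample files from the question in a scratch directory, uses the regular expression ', *' as the separator (my choice, not part of the original script), and writes with > so that a rerun does not append to old output.

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > File1.txt <<'EOF'
Name Latin-small Roman Latin-caps #header, not to be processed
F0, a, I, A
F1, b, II, B
F2, c, III, C
F3, d, IV, D
EOF

cat > File2.txt <<'EOF'
Lorem ipsum
Roman here.
LCaps here.
LSmall here.
Lorem ipsum
EOF

# Assumption: fields are separated by a comma followed by optional spaces.
awk -F ', *' '
BEGIN { split(":LSmall:Roman:LCaps", k, /:/) }
NR == FNR { a[NR] = $0; n = NR; next }   # File2.txt: keep every line in memory
FNR == 1 { next }                        # File1.txt: skip the header
{
    system("mkdir -p " $1)
    filename = $1 "/" $1 ".txt"
    for (i = 1; i <= n; i++) {
        line = a[i]
        for (j = 2; j <= 4; ++j)
            if (line ~ k[j]) { gsub(/here/, $j, line); break }
        print line > filename
    }
    close(filename)                      # one output file open at a time
}' File2.txt File1.txt

cat F2/F2.txt
```

The final cat should show the F2 block the question asks for. The separator only affects how File1.txt is split, because the lines of File2.txt are stored whole.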
If you want to do this on hundreds of files, perhaps refactor some or all of the surrounding loop out into a shell script and concentrate on the substitution actions in the Awk script. The shell can loop over the data in file1.txt just as well, which will simplify the Awk script somewhat and make the overall process easier to understand.
# Trim the obnoxious header
tail -n +2 file1.txt |
while read -r directory LSmall Roman LCaps; do
    mkdir "$directory"
    awk -v LSmall="$LSmall" -v Roman="$Roman" -v LCaps="$LCaps" '
        BEGIN { split("LSmall:Roman:LCaps", k, /:/)
                split(LSmall ":" Roman ":" LCaps, r, /:/) }
        {
            for (j=1; j<=3; ++j)
                if ($0 ~ k[j]) {
                    gsub(/here/, r[j])
                    break
                }
        }1' file2.txt >"$directory"/"$directory".txt
done
Demo: https://ideone.com/RUhsUS
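The shell loop can likewise be pointed at the question's comma-and-space data by letting read split on both characters. This is a sketch rather than a drop-in replacement: IFS=', ' is my assumption and only works while no value contains a space or comma, and the Awk part is simplified to one pattern per key instead of the k/r arrays, which behaves the same when at most one key occurs per line.

```shell
#!/bin/sh
# Work in a scratch directory so nothing existing is overwritten.
workdir=$(mktemp -d) && cd "$workdir" || exit 1

cat > File1.txt <<'EOF'
Name Latin-small Roman Latin-caps #header, not to be processed
F0, a, I, A
F3, d, IV, D
EOF

cat > File2.txt <<'EOF'
Lorem ipsum
Roman here.
LCaps here.
LSmall here.
Lorem ipsum
EOF

# Skip the header, then split every remaining line on commas and spaces.
tail -n +2 File1.txt |
while IFS=', ' read -r directory LSmall Roman LCaps; do
    mkdir -p "$directory"
    awk -v LSmall="$LSmall" -v Roman="$Roman" -v LCaps="$LCaps" '
        /LSmall/ { gsub(/here/, LSmall) }
        /Roman/  { gsub(/here/, Roman) }
        /LCaps/  { gsub(/here/, LCaps) }
        1        # print every line, modified or not
    ' File2.txt > "$directory/$directory.txt"
done

cat F3/F3.txt
```

Each iteration reads File2.txt from the start, so every output file receives the complete template with its own values filled in.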
Upvotes: 1