Reputation: 1325
I have two files as follows. The first is sample.txt:
new haven co-op toronto on $1245
joe schmo co-op powell river bc $4444
The second is locations.txt:
toronto
powell river
on
bc
We'd like to use sed to produce a marked-up sample-new.txt that adds ; before and after each of these locations, so that the final output would look like:
new haven co-op ;toronto; ;on; $1245
joe schmo co-op ;powell river; ;bc; $4444
Is this possible using bash? The actual files are much longer (thousands of lines in each case) but as a one-time job we're not too concerned about processing time.
--- edited to add ---
My original approach was something like this:
cat locations.txt | xargs -i sed 's/{}/;/' sample.txt
But it only ran the script once per pattern, as opposed to the methods you've proposed here.
Upvotes: 3
Views: 65
Reputation: 20980
Using awk:
awk 'NR==FNR{a[NR]=$0; next;} {for(i in a)gsub("\\<"a[i]"\\>",";"a[i]";"); print} ' locations.txt sample.txt
Using awk+sed:
sed -f <(awk '{print "s|\\<"$0"\\>|;"$0";|g"}' locations.txt) sample.txt
The same using pure sed:
sed -f <(sed 's/.*/s|\\<&\\>|\;&\;|g/' locations.txt) sample.txt
(After you show your coding attempts, I will add the explanation of why this works.)
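As a rough illustration of what the two-stage sed approach does, the sketch below writes the generated script to a file instead of using process substitution, so the intermediate commands can be inspected. It assumes GNU sed (the \< and \> word-boundary operators are GNU extensions) and uses a scratch directory from mktemp; the variable names are only for illustration.

```shell
# Work in a scratch directory so nothing in the current directory is touched.
tmp=$(mktemp -d)
printf '%s\n' 'new haven co-op toronto on $1245' \
              'joe schmo co-op powell river bc $4444' > "$tmp/sample.txt"
printf '%s\n' 'toronto' 'powell river' 'on' 'bc' > "$tmp/locations.txt"

# Stage 1: turn every location into one substitution command.
# In the replacement, \\ emits a literal backslash and & is the matched location.
sed 's/.*/s|\\<&\\>|;&;|g/' "$tmp/locations.txt" > "$tmp/script.sed"
generated=$(cat "$tmp/script.sed")
printf '%s\n' "$generated"

# Stage 2: run all generated commands against every line of the data file.
result=$(sed -f "$tmp/script.sed" "$tmp/sample.txt")
printf '%s\n' "$result"

rm -rf "$tmp"
```

Because every generated command is applied to each input line in turn, the substitutions accumulate on the same line. Note that a location containing regex metacharacters or a | would need escaping before it could be used this way.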
Upvotes: 2
Reputation: 46896
Just to complete your set of options, you can do this in pure bash, slowly:
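#!/usr/bin/env bash
# t2 holds the list of places, t1 holds the lines to mark up.
readarray -t places < t2
# IFS= and -r keep leading whitespace and backslashes intact.
while IFS= read -r line; do
    for place in "${places[@]}"; do
        # Replaces only the first space-delimited occurrence of each place.
        line="${line/ $place / ;$place; }"
    done
    echo "$line"
done < t1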
Note that this likely won't work as expected if your list includes place names that contain other place names, for example "niagara on the lake", which contains "on":
foo bar co-op ;niagara ;on; the lake; on $1
Instead, you might want to do more targeted pattern matching, which will be much easier in awk:
#!/usr/bin/awk -f
# Collect the location list into the index of an array
NR==FNR {
    places[$0]
    next
}
# Now step through the input file
{
    # Handle two-letter provinces
    if ($(NF-1) in places) {
        $(NF-1)=";" $(NF-1) ";"
    }
    # Step through the remaining places doing substitutions as we find matches
    for (place in places) {
        if (length(place)>2 && index($0,place)) {
            sub(place,";"place";")
        }
    }
}
# Print every line
1
This works for me using the data in your question:
$ cat places
toronto
powell river
niagara on the lake
on
bc
$ ./tst places input
new haven co-op ;toronto; ;on; $1245
joe schmo co-op ;powell river; ;bc; $4444
foo bar co-op ;niagara on the lake; ;on; $1
You may have a problem if your places file contains a two-letter name that is not a province. I'm not sure if such places exist in Canada, but if they do, you'll either need to tweak such lines manually, or make the script more complex by handling provinces separately from cities.
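One possible way to handle provinces separately from cities is sketched below. It assumes the single list is split into two hypothetical files, provinces.txt and cities.txt (neither exists in the question), and that the province is always the second-to-last field, as in the sample data. Cities are inserted with index() and substr() rather than sub(), so nothing in a city name is interpreted as a regex.

```shell
# Scratch directory with hypothetical input files (names are assumptions).
tmp=$(mktemp -d)
printf '%s\n' 'on' 'bc' > "$tmp/provinces.txt"
printf '%s\n' 'toronto' 'powell river' 'niagara on the lake' > "$tmp/cities.txt"
printf '%s\n' 'new haven co-op toronto on $1245' \
              'joe schmo co-op powell river bc $4444' \
              'foo bar co-op niagara on the lake on $1' > "$tmp/sample.txt"

result=$(awk -v prov="$tmp/provinces.txt" -v city="$tmp/cities.txt" '
    # FILENAME tells the two lists apart from the data file.
    FILENAME == prov { provinces[$0]; next }
    FILENAME == city { cities[$0]; next }
    {
        # Provinces are only looked for in the second-to-last field.
        if (NF > 1 && ($(NF-1) in provinces))
            $(NF-1) = ";" $(NF-1) ";"
        # Cities are matched as plain substrings; first occurrence only.
        for (c in cities) {
            pos = index($0, c)
            if (pos)
                $0 = substr($0, 1, pos - 1) ";" c ";" substr($0, pos + length(c))
        }
        print
    }
' "$tmp/provinces.txt" "$tmp/cities.txt" "$tmp/sample.txt")
printf '%s\n' "$result"

rm -rf "$tmp"
```

With this split, a two-letter city no longer collides with the province check, since the length test is gone. A city name that is a substring of another city, or of a business name, would still be marked, so this remains a sketch rather than a complete solution.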
Upvotes: 1