Reputation: 1726
We have a source file ("source-A") that looks like this:
The container of white spirit was made of aluminium.
We will use an aromatic method to analyse properties of white spirit.
No one drank white spirit at stag night.
Many people think that a potato crisp is savoury, but some would rather eat mashed potato.
...
more sentences
Each sentence in "source-A" is on its own line and terminates with a newline (\n).
We have a dictionary/conversion file ("converse-B") that looks like this:
aluminium<tab>aluminum
analyse<tab>analyze
white spirit<tab>mineral spirits
stag night<tab>bachelor party
savoury<tab>savory
potato crisp<tab>potato chip
mashed potato<tab>mashed potatoes
"converse-B" is a two column, tab delimited file.
Each equivalence map (term-on-left<tab>term-on-right) is on its own line and terminates with a newline (\n).
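To reproduce the setup locally, both inputs can be created with printf, which emits a real tab for \t (typing a literal "<tab>" would not work). This is a minimal sketch that reuses the sample lines above. cut splits on tabs by default, so it is a quick check that "converse-B" really has two tab-delimited columns.

```shell
# Sample text: one sentence per line, each terminated by \n.
printf '%s\n' \
  'The container of white spirit was made of aluminium.' \
  'We will use an aromatic method to analyse properties of white spirit.' \
  'No one drank white spirit at stag night.' \
  'Many people think that a potato crisp is savoury, but some would rather eat mashed potato.' \
  > source-A

# Dictionary: printf reuses its format string for each pair of arguments,
# so every pair becomes "left<TAB>right\n".
printf '%s\t%s\n' \
  'aluminium'     'aluminum' \
  'analyse'       'analyze' \
  'white spirit'  'mineral spirits' \
  'stag night'    'bachelor party' \
  'savoury'       'savory' \
  'potato crisp'  'potato chip' \
  'mashed potato' 'mashed potatoes' \
  > converse-B

# cut splits on TAB by default: column 1 holds the terms to be replaced.
cut -f1 converse-B
```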
How can we read "converse-B", replace every term in "source-A" that matches a column-1 term in "converse-B" with the corresponding column-2 term, and write the result to an output file ("output-C")?
For example, the "output-C" would look like this:
The container of mineral spirits was made of aluminum.
We will use an aromatic method to analyze properties of mineral spirits.
No one drank mineral spirits at bachelor party.
Many people think that a potato chip is savory, but some would rather eat mashed potatoes.
The tricky part is the term potato.
If a "simple" awk solution cannot handle a singular term (potato) and a plural term (potatoes), we'll use a manual substitution method. The awk solution can skip that use case.
In other words, an awk solution can stipulate that it only works for an unambiguous word or a term composed of space-separated, unambiguous words.
An awk solution will get us to a 90% completion rate; we'll do the remaining 10% manually.
Upvotes: 0
Views: 548
Reputation: 67507
sed probably suits better, since these are only phrase/word replacements. Note that if the same words appear in multiple phrases, the rules are applied in dictionary order (first come, first served), so order your dictionary accordingly.
$ sed -f <(sed -E 's_(.+)\t(.+)_s/\1/\2/g_' dict) content
The container of mineral spirits was made of aluminum.
We will use an aromatic method to analyze properties of mineral spirits.
No one drank mineral spirits at bachelor party.
Many people think that a potato chip is savory, but some would rather eat mashed potatoes.
...
more sentences
The inner sed command, inside the process substitution <(...), converts the dictionary entries into sed expressions, and the main sed reads them as its script (-f) and applies them to the content.
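Running the inner sed on its own shows the intermediate script that the main sed receives. This is a minimal sketch, assuming GNU sed (\t inside a regex is a GNU extension). It writes the generated script to a file, dict.sed (a hypothetical name), instead of using process substitution, which also makes it work in shells without <(...).

```shell
# Hypothetical two-entry dictionary; "dict" is the file name the answer uses.
printf '%s\t%s\n' 'white spirit' 'mineral spirits' 'stag night' 'bachelor party' > dict

# Inner sed only: each "left<TAB>right" line becomes "s/left/right/g".
# Using "_" as the outer delimiter lets "/" appear literally in the output.
sed -E 's_(.+)\t(.+)_s/\1/\2/g_' dict > dict.sed
cat dict.sed
# s/white spirit/mineral spirits/g
# s/stag night/bachelor party/g

# Main sed: apply the generated script to the text.
printf 'No one drank white spirit at stag night.\n' | sed -f dict.sed
# No one drank mineral spirits at bachelor party.
```

Because the dictionary terms are pasted verbatim into the regex and replacement, terms containing "/", "&" or regex metacharacters such as "." would need escaping first; this sketch does not handle that.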
NB: a production-quality script should also take care of letter case and word boundaries, to avoid unwanted substring substitutions; both are ignored here.
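Word boundaries are what make the question's "potato" case safe: without them, a line that already says "mashed potatoes" would become "mashed potatoeses". A minimal sketch, assuming GNU sed, whose \b word-boundary escape is a GNU extension (not POSIX). The generator wraps each left-hand term in \b...\b. Letter case is still not handled, and dict and dict.sed are hypothetical file names.

```shell
# Hypothetical dictionary holding the question's tricky entry.
printf 'mashed potato\tmashed potatoes\n' > dict

# In the replacement, \\b emits a literal \b, so the generated rule is
#   s/\bmashed potato\b/mashed potatoes/g
sed -E 's_(.+)\t(.+)_s/\\b\1\\b/\2/g_' dict > dict.sed

# The singular is converted; the plural is left alone because "potato"
# is followed by "e" there, so \b does not match after it.
printf '%s\n' 'eat mashed potato.' 'eat mashed potatoes.' | sed -f dict.sed
# eat mashed potatoes.
# eat mashed potatoes.
```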
Upvotes: 1