Reputation: 13
With a bash script, I extracted a .conllu file into a three-column .txt file containing the lemma, POS and meaning, so it is a kind of dictionary. Now I am trying to make it prettier by putting the second column (POS) in brackets.
It looks like:
ami NOUN mother
amo VERB sleep
asima NOUN younger_sister
ati NOUN older_sister
The columns are separated by tabs.
I want it to look like this:
ami (NOUN) mother
amo (VERB) sleep
asima (NOUN) younger_sister
ati (NOUN) older_sister
and ideally:
ami (NOUN) - mother
amo (VERB) - sleep
asima (NOUN) - younger_sister
ati (NOUN) - older_sister
I tried sed with a regex:
sed -e 's/[a-zA-Z]+ /(/g' -e 's+[a-zA-Z]+=[a-zA-Z]+/)/g' dictjaa.txt > test.txt
but unfortunately it failed.
Upvotes: 1
Views: 67
Reputation: 163457
If the POS tag always consists of uppercase characters A-Z:
sed -E 's/([[:blank:]])([A-Z]+)[[:blank:]]+/\1(\2) - /' dictjaa.txt > test.txt
The pattern matches:
([[:blank:]])
Capture group 1, match either a space or tab
([A-Z]+)
Capture group 2, match 1+ uppercase chars A-Z
[[:blank:]]+
Match 1+ occurrences of either a space or tab
The content of test.txt:
ami (NOUN) - mother
amo (VERB) - sleep
asima (NOUN) - younger_sister
ati (NOUN) - older_sister
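To check the substitution without touching dictjaa.txt, the same command can be run on a couple of sample rows. This is only a sketch: the rows are built with printf, and the variable name result is made up for the illustration.

```shell
# Two tab-separated sample rows piped straight into sed (no file needed).
# \1 keeps the blank before the POS tag, \2 is the tag itself;
# the blanks after the tag are replaced by " - ".
result=$(printf 'ami\tNOUN\tmother\namo\tVERB\tsleep\n' |
  sed -E 's/([[:blank:]])([A-Z]+)[[:blank:]]+/\1(\2) - /')

printf '%s\n' "$result"
```

Note that the tab between the lemma and the opening parenthesis is preserved, because capture group 1 is written back unchanged.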
Upvotes: 1
Reputation: 11237
Using sed
sed -E 's/([^[:alpha:]]+)([^[:blank:]]*)[[:blank:]]+/\1(\2) - /' input_file
ami (NOUN) - mother
amo (VERB) - sleep
asima (NOUN) - younger_sister
ati (NOUN) - older_sister
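As an alternative to sed (not part of the answer above), the tab-separated columns could also be handled field by field with awk. A minimal sketch, assuming every line has exactly three tab-separated fields; the sample rows and the variable name result are made up for the illustration:

```shell
# Split each line on tabs and print "lemma (POS) - meaning".
# Assumes exactly three fields per line; extra fields would be dropped.
result=$(printf 'ami\tNOUN\tmother\nasima\tNOUN\tyounger_sister\n' |
  awk -F'\t' '{ printf "%s (%s) - %s\n", $1, $2, $3 }')

printf '%s\n' "$result"
```

Unlike the sed versions, this replaces the tab after the lemma with a single space and does not depend on the POS tag being uppercase.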
Upvotes: 1