shantanuo

Reputation: 32306

Remove all the text using sed

Format:

[Headword]{}"UC(icl>restriction)"(Attributes);(gloss)

The testme.txt file has 2 lines

[testme] {} "acetify" (V,lnk,CJNCT,AJ-V,VINT,VOO,VOO-CHNG,TMP,Vo) <H,0,0>; 
[newtest] {} "acid-fast" (ADJ,DES,QUAL,TTSM) <H,0,0>;

The expected output is this:

testme = acetify
newtest = acid-fast

What I have achieved so far is:

cat testme.txt | sed 's/\[//g' | sed 's/]//g' | sed 's/{}/=/g' | sed 's/\"//'

testme = acetify" (V,lnk,CJNCT,AJ-V,VINT,VOO,VOO-CHNG,TMP,Vo) <H,0,0>;
newtest = acid-fast" (ADJ,DES,QUAL,TTSM) <H,0,0>;

How do I remove all the text from the second " to the end of the line?

Upvotes: 1

Views: 312

Answers (4)

Dennis Williamson

Reputation: 359905

Your whole sequence of multiple calls to sed can be replaced by:

sed 's/\[\([^]]*\)][^"]*"\([^"]*\).*/\1 = \2/' inputfile
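To see how the pieces of this expression fit together, here is a small sketch that runs the same command on the two sample lines from the question. The function name extract_pair and the inline sample variable are assumptions made for illustration only; they are not part of the original answer.

```shell
# Hypothetical helper wrapping the sed command from this answer.
#   \[\([^]]*\)]   match "[", capture the headword up to "]"        -> \1
#   [^"]*"         skip everything up to and including the first quote
#   \([^"]*\)      capture the text up to the second quote          -> \2
#   .*             consume the rest of the line
extract_pair() {
  sed 's/\[\([^]]*\)][^"]*"\([^"]*\).*/\1 = \2/'
}

# Sample lines copied from the question.
sample='[testme] {} "acetify" (V,lnk,CJNCT,AJ-V,VINT,VOO,VOO-CHNG,TMP,Vo) <H,0,0>;
[newtest] {} "acid-fast" (ADJ,DES,QUAL,TTSM) <H,0,0>;'

printf '%s\n' "$sample" | extract_pair
```

Since the whole line is matched and replaced by the two captured groups, no separate cleanup steps for the brackets, braces, or quotes should be needed.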

Upvotes: 1

ghostdog74

Reputation: 342303

This is how to do it with awk instead of all those sed commands, which are unnecessary. With the default whitespace field splitting, what you want are fields 1 and 3. Use gsub() to remove the quotes and brackets:

$ awk '{gsub(/\"/,"",$3);gsub(/\]|\[/,"",$1);print $1" = "$3}' file
testme = acetify
newtest = acid-fast
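One caveat: with whitespace splitting, a quoted term that itself contains a space would be cut off at the first space. The following is only a sketch of a possible variant that splits on the double quote instead. The function name extract_with_quote_fs and the "acid fast" input line are hypothetical and do not come from the question.

```shell
# Hypothetical variant: use the double quote as the field separator, so the
# quoted term is always $2, even if it contains spaces.
extract_with_quote_fs() {
  awk -F'"' 'NF > 2 {
    head = $1
    sub(/^\[/, "", head)     # drop the leading "["
    sub(/\].*$/, "", head)   # drop "]" and everything after it
    print head " = " $2
  }'
}

# Made-up input line with a space inside the quoted term.
printf '%s\n' '[newtest] {} "acid fast" (ADJ) <H,0,0>;' | extract_with_quote_fs
```

The NF > 2 guard skips lines that do not contain a pair of quotes, such as blank lines.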

Upvotes: 1

David Z

Reputation: 131550

The whole process might be a little quicker with awk:

awk 'NF > 0 { print $1 " = " $3 }' testme.txt | tr -d '[]"'

Upvotes: 1

Konerak

Reputation: 39763

Remove everything from the double quote, space, open parenthesis sequence " ( to the end of the line:

sed 's/" (.*//g'
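As a rough sketch of how this fits with the steps from the question, the substitutions can be given to a single sed call as separate -e expressions, with this one last. The function name strip_entry is an assumption for illustration, and the sketch assumes every line has a space and an open parenthesis right after the quoted term.

```shell
# Hypothetical helper: the question's four substitutions followed by the
# one from this answer, all in one sed invocation.
strip_entry() {
  sed -e 's/\[//g' \
      -e 's/]//g' \
      -e 's/{}/=/g' \
      -e 's/"//' \
      -e 's/" (.*//'
}

# Sample lines copied from the question.
printf '%s\n' \
  '[testme] {} "acetify" (V,lnk,CJNCT,AJ-V,VINT,VOO,VOO-CHNG,TMP,Vo) <H,0,0>;' \
  '[newtest] {} "acid-fast" (ADJ,DES,QUAL,TTSM) <H,0,0>;' | strip_entry
```

Because the first quote has already been removed by the time the last expression runs, the only quote left is the original second one, so 's/".*//' would also work there and would not depend on the characters that follow the quote.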

Upvotes: 1
