Jakob

Reputation: 141

Using sed to extract strings from a text file

I have text data in this form:

^Well/Well[ADV]+ADV ^John/John[N]+N ^has/have[V]+V+3sg+PRES ^a/a[ART]
^quite/quite[ADV]+ADV ^different/different[ADJ]+ADJ ^not/not[PART]
^necessarily/necessarily[ADV]+ADV ^more/more[ADV]+ADV
^elaborated/elaborate[V]+V+PPART ^theology/theology[N]+N *edu$

And I want it to be processed to this form:

Well John have a quite different not necessarily more elaborate theology

Basically, I need every string between the starting character / and the ending character [.

Here is what I tried, but I just get empty files...

#!/bin/bash

for file in probe/*.txt
do
    sed '///,/[/d' "$file" > "$file.aa"
    mv "$file.aa" "$file"
done

Upvotes: 2

Views: 314

Answers (3)

Benjamin W.

Reputation: 52556

With GNU grep and Perl compatible regular expressions (-P):

$ echo $(grep -Po '(?<=/)[^[]*' infile)
Well John have a quite different not necessarily more elaborate theology

-o retains just the matches, (?<=/) is a positive look-behind ("make sure there is a /, but don't include it in the match"), and [^[]* is "a sequence of characters other than [".

grep -Po prints one match per line; by using the output of grep as arguments to echo, we convert the newlines into spaces (could also be done by piping to tr '\n' ' ').
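
As a minimal sketch of the piping variant mentioned above, the following joins the matches with paste instead of tr, which avoids a trailing space. The sample file and its two lines are made up for illustration, and grep -P assumes GNU grep built with PCRE support.

```shell
# Hypothetical sample input in the question's format (created only for this sketch).
sample=$(mktemp)
printf '%s\n' \
  '^Well/Well[ADV]+ADV ^John/John[N]+N' \
  '^has/have[V]+V+3sg+PRES ^a/a[ART]' > "$sample"

# grep -Po prints one match per line; paste -s joins those lines with single spaces.
result=$(grep -Po '(?<=/)[^[]*' "$sample" | paste -sd ' ' -)
printf '%s\n' "$result"

rm -f "$sample"
```

Unlike the echo $(...) form, this does not pass the matches through the shell's word splitting and globbing, which may matter if the extracted text can contain characters such as *.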

Upvotes: 2

karakfa

Reputation: 67567

awk to the rescue!

$ awk -F/ -v RS=^ -v ORS=' ' '{print $1}' file

Well John has a quite different not necessarily more elaborated theology 

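Explanation: set the record separator (RS) to ^ to split the input into one record per token, set the field separator (FS) to /, and print the first field. Setting the output record separator (ORS) to a space (instead of the default newline) keeps the extracted fields on the same line. Note that the first field is the text before the /, so this prints the surface form (has, elaborated) rather than the form between / and [ shown in the question's desired output.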
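
The command above prints $1, the text before the slash. A minimal sketch of a variant that prints the text between / and [ instead, keeping the same RS idea but splitting fields on either character. The sample file is made up for illustration, and the sep variable is just a hypothetical name used to avoid a leading or trailing space.

```shell
# Hypothetical sample input in the question's format (created only for this sketch).
sample=$(mktemp)
printf '%s\n' \
  '^Well/Well[ADV]+ADV ^John/John[N]+N' \
  '^has/have[V]+V+3sg+PRES ^a/a[ART]' > "$sample"

# RS='^' makes each token a record; FS='[/[]' splits a record at "/" and "[",
# so $2 is the text between them. NF > 1 skips the empty record before the first "^".
result=$(awk -F'[/[]' -v RS='^' \
  'NF > 1 { printf "%s%s", sep, $2; sep = " " } END { print "" }' "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Using printf with an explicit separator instead of ORS is a design choice here: it prints the space only between items and ends the output with a newline.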

Upvotes: 4

Ipor Sircer

Reputation: 3141

grep -oE '/[^[]*\[' file | sed -e 's#^/##' -e 's/\[$//' | tr -s '\n' ' '
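
Read stage by stage, the pipeline above can be sketched as follows. The function name extract_lemmas and the sample file are hypothetical, introduced only so each stage can be commented; grep -o assumes GNU or BSD grep.

```shell
# Hypothetical sample input in the question's format (created only for this sketch).
sample=$(mktemp)
printf '%s\n' \
  '^Well/Well[ADV]+ADV ^John/John[N]+N' \
  '^has/have[V]+V+3sg+PRES ^a/a[ART]' > "$sample"

# Stage 1 (grep): keep only the "/text[" pieces, one per line.
# Stage 2 (sed):  strip the leading "/" and the trailing "[" from each piece.
# Stage 3 (tr):   turn newlines into spaces and squeeze runs of spaces.
extract_lemmas() {
  grep -oE '/[^[]*\[' "$1" |
    sed -e 's#^/##' -e 's/\[$//' |
    tr -s '\n' ' '
}

result=$(extract_lemmas "$sample")
printf '%s\n' "$result"

rm -f "$sample"
```

Because tr also converts the final newline, the pipeline's own output ends with a trailing space and no newline.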

Upvotes: -1
