Reputation: 5291
I am not sure why this doesn't work. Here is the regex:
'text\' => '.*?'
I want to catch estrenos and cine in the following nasty text using grep or sed. Here is what I tried in grep:
echo "sadsa d{ 'text' => 'cine', 'indices' => [ 111, 116 ] }, { 'text' => 'estrenos', 'indices' => [ sSADW" | grep -Eo "'text\' => '.*?',"
Upvotes: 0
Views: 432
Reputation: 203985
Just use awk:
$ awk -v RS='}' -F\' '{print $4}' file
cine
estrenos
That will work with any awk in any shell on any UNIX box. It will also work no matter what the white space is so it'll work whether your input is on one line or spread across multiple lines and no matter how many blanks or tabs occur anywhere on each line.
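The sketch below checks the whitespace claim. It feeds the same command the question's data, re-wrapped across several lines and padded with tabs and extra blanks. The re-wrapped sample is made up for illustration and is not the asker's real data.

```shell
# Hypothetical sample: the question's records split across lines and padded
# with tabs/blanks, to show that layout doesn't change the result.
sample=$(printf "sadsa d{\n\t'text'   =>   'cine',\n  'indices' => [ 111, 116 ] },\n{ 'text' => 'estrenos',\n\t'indices' => [ sSADW\n")

# Records end at '}', fields are split at single quotes, value is field 4.
printf '%s\n' "$sample" | awk -v RS='}' -F\' '{print $4}'
```

This should print cine and estrenos on separate lines, the same as the one-line input. Newlines are not special here because RS is set to } rather than left as a newline.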
Here's how it works:
awk treats all input as records separated into fields. Your input (with spaces compressed for readability):
sadsa d{ 'text' => 'cine', 'indices' => [ 111, 116 ] }, { 'text' => 'estrenos', 'indices' => [ sSADW
clearly has { ... } records:
Record 1:
{ 'text' => 'cine', 'indices' => [ 111, 116 ] }
Record 2:
{ 'text' => 'estrenos', 'indices' => [ sSADW
So we can set the Record Separator to } (with -v RS='}'). I assume your last record will really end in a } too, but if it doesn't that's fine, because awk treats end of file like the end of a record. We can ignore the text before each { (i.e. "sadsa d" before the first record and "," between the 2 records). That text is really treated as part of the first field, but we're not using that field for anything, so it's irrelevant.
So given the above 2 records, if we split them into fields at every ' (with -F\') then we get:
$ awk -v RS='}' -F\' '{
    for (i=1; i<=NF; i++) print "Record Nr", NR, "Field Nr", i, "Field Contents: <" $i ">"
    print "----"
}' file
Record Nr 1 Field Nr 1 Field Contents: <sadsa d{ >
Record Nr 1 Field Nr 2 Field Contents: <text>
Record Nr 1 Field Nr 3 Field Contents: < => >
Record Nr 1 Field Nr 4 Field Contents: <cine>
Record Nr 1 Field Nr 5 Field Contents: <, >
Record Nr 1 Field Nr 6 Field Contents: <indices>
Record Nr 1 Field Nr 7 Field Contents: < => [ 111, 116 ] >
----
Record Nr 2 Field Nr 1 Field Contents: <, { >
Record Nr 2 Field Nr 2 Field Contents: <text>
Record Nr 2 Field Nr 3 Field Contents: < => >
Record Nr 2 Field Nr 4 Field Contents: <estrenos>
Record Nr 2 Field Nr 5 Field Contents: <, >
Record Nr 2 Field Nr 6 Field Contents: <indices>
Record Nr 2 Field Nr 7 Field Contents: < => [ sSADW
>
----
So, as you can see, the value you want is always simply the 4th field of each record.
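One thing to watch for: if the real data does end in a }, then whatever follows that final } (usually just the trailing newline) becomes one more record. Its 4th field is empty, so {print $4} prints a blank line at the end. The sketch below guards against that by printing only records whose 2nd field is text. It assumes every wanted record has 'text' as its first quoted key, as in the sample. The closed-off sample data is made up for illustration.

```shell
# Hypothetical sample where the last record IS closed with '}'; echo adds a
# trailing newline, which becomes an extra record after the final '}'.
data="sadsa d{ 'text' => 'cine', 'indices' => [ 111, 116 ] }, { 'text' => 'estrenos', 'indices' => [ 1, 2 ] }"

# Unguarded: prints cine, estrenos, then an empty line for the "\n" record.
echo "$data" | awk -v RS='}' -F\' '{print $4}'

# Guarded: only print field 4 when field 2 is the key "text".
echo "$data" | awk -v RS='}' -F\' '$2 == "text" {print $4}'
```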
Upvotes: 3
Reputation: 92854
tr + sed approach (assuming your input text is in variable $s):
sed -n "s/.*'text' => '\([^']*\)'.*/\1/p" <(tr ',' '\n' <<< "$s")
The output:
cine
estrenos
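The <( ... ) process substitution and the <<< here-string above are features of bash (and some other shells such as zsh and ksh93), not POSIX sh. Below is a sketch of the same tr + sed idea as a plain pipeline, which should also run under sh or dash. It keeps the answer's assumption that the input is in $s.

```shell
# Sample input: the text from the question.
s="sadsa d{ 'text' => 'cine', 'indices' => [ 111, 116 ] }, { 'text' => 'estrenos', 'indices' => [ sSADW"

# tr puts each comma-separated chunk on its own line;
# sed -n ... /p prints only lines where the 'text' => '...' substitution matched,
# replaced by the captured value.
printf '%s\n' "$s" | tr ',' '\n' | sed -n "s/.*'text' => '\([^']*\)'.*/\1/p"
```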
Upvotes: 0
Reputation: 5175
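Remove the backslash before the single quote. Inside a double-quoted string the quote needs no escaping, and a backslash before an ordinary character is undefined in POSIX regular expressions. Also, extended regular expressions (grep -E) don't support non-greedy matching, so .*? does not behave as a lazy match there. You probably want Perl-compatible regexes (GNU grep's -P) instead:
grep -Po "'text' => '.*?',"
Note that -o prints the whole match, e.g. 'text' => 'cine', rather than just the value.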
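Here are two variations, sketched under the assumption of GNU grep. -P is a GNU extension and may be missing elsewhere, e.g. in the stock macOS grep. The portable fix replaces the non-greedy .*? with a bracket expression that cannot run past a quote. With -P, \K drops everything matched so far from the reported match, so only the value itself is printed.

```shell
# Sample input: the text from the question.
s="sadsa d{ 'text' => 'cine', 'indices' => [ 111, 116 ] }, { 'text' => 'estrenos', 'indices' => [ sSADW"

# POSIX ERE: [^']* stops at the next quote, so no lazy quantifier is needed.
# Prints the whole match, e.g.  'text' => 'cine'
printf '%s\n' "$s" | grep -Eo "'text' => '[^']*'"

# GNU grep -P: \K resets the start of the reported match,
# so -o prints only the value (cine, estrenos).
printf '%s\n' "$s" | grep -Po "'text' => '\K[^']*"
```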
Upvotes: 0