user3639557

Reputation: 5291

converting regex to sed or grep regex

I am not sure why this doesn't work. The regex is 'text\' => '.*?' and I want to capture estrenos and cine from the following messy text using grep or sed. Here is what I tried with grep:

echo "sadsa d{                             'text' => 'cine',                             'indices' => [                                            111,                                            116                                          ]                           },                           {                             'text' => 'estrenos',                             'indices' => [ sSADW" | grep -Eo "'text\' => '.*?',"

Upvotes: 0

Views: 432

Answers (3)

Ed Morton

Reputation: 203985

Just use awk:

$ awk -v RS='}' -F\' '{print $4}' file
cine
estrenos

That will work with any awk in any shell on any UNIX box. It is also insensitive to white space, so it works whether your input is on one line or spread across multiple lines, and no matter how many blanks or tabs occur anywhere on each line.

Here's how it works:

awk treats all input as records separated into fields. Your input (with spaces compressed for readability):

sadsa d{ 'text' => 'cine', 'indices' => [ 111, 116 ] }, { 'text' => 'estrenos', 'indices' => [ sSADW

clearly has { ... } records:

Record 1:

{ 'text' => 'cine', 'indices' => [ 111, 116 ] }

Record 2:

{ 'text' => 'estrenos', 'indices' => [ sSADW

so we can set the Record Separator to } (with -v RS='}'). I assume your last record will really end in a } too, but if it doesn't, that's fine, as awk treats end of file like the end of a record. We can ignore the text before each { (i.e. "sadsa d" before the first record and "," between the 2 records). That text is really treated as part of the first field, but we're not using that field for anything, so it's irrelevant.

So given the above 2 records if we split them into fields at every ' (with -F\') then we get:

$ awk -v RS='}' -F\' '{for (i=1; i<=NF;i++) print "Record Nr", NR, "Field Nr", i, "Field Contents: <" $i ">"; print "----"
}' file
Record Nr 1 Field Nr 1 Field Contents: <sadsa d{ >
Record Nr 1 Field Nr 2 Field Contents: <text>
Record Nr 1 Field Nr 3 Field Contents: < => >
Record Nr 1 Field Nr 4 Field Contents: <cine>
Record Nr 1 Field Nr 5 Field Contents: <, >
Record Nr 1 Field Nr 6 Field Contents: <indices>
Record Nr 1 Field Nr 7 Field Contents: < => [ 111, 116 ] >
----
Record Nr 2 Field Nr 1 Field Contents: <, { >
Record Nr 2 Field Nr 2 Field Contents: <text>
Record Nr 2 Field Nr 3 Field Contents: < => >
Record Nr 2 Field Nr 4 Field Contents: <estrenos>
Record Nr 2 Field Nr 5 Field Contents: <, >
Record Nr 2 Field Nr 6 Field Contents: <indices>
Record Nr 2 Field Nr 7 Field Contents: < => [ sSADW
>
----

so as you can see the value you want is always simply the 4th field of each record.
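As a quick check of the white-space claim, here is a minimal sketch that runs the same awk command on the question's sample text reflowed across several lines. The variable names sample and result and the NF >= 4 guard are additions for illustration, not part of the answer above; the guard just skips any record that has no fourth field.

```shell
# Sample input from the question, reflowed across several lines.
sample="sadsa d{
    'text' => 'cine',
    'indices' => [
        111,
        116
    ]
},
{
    'text' => 'estrenos',
    'indices' => [ sSADW"

# RS='}' ends a record at each closing brace; -F\' splits fields at single quotes.
# The NF >= 4 guard (an addition, not part of the answer) skips records
# that have no fourth field.
result=$(printf '%s\n' "$sample" | awk -v RS='}' -F\' 'NF >= 4 { print $4 }')
printf '%s\n' "$result"
```

The line breaks end up inside fields 1, 3, 5 and 7, so field 4 of each record is still the value.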

Upvotes: 3

RomanPerekhrest

Reputation: 92854

tr + sed approach:

(assuming your input text is in variable $s)

sed -n "s/.*'text' => '\([^']*\)'.*/\1/p" <(tr ',' '\n' <<< "$s")

The output:

cine
estrenos
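The command above relies on bash features (process substitution and here-strings). A rough equivalent as a plain pipeline, which should also run in a POSIX sh, might look like the sketch below. The variable names s and result are only for illustration, and the input is the question's text with its spaces compressed.

```shell
# Question's sample text with runs of spaces compressed.
s="sadsa d{ 'text' => 'cine', 'indices' => [ 111, 116 ] }, { 'text' => 'estrenos', 'indices' => [ sSADW"

# tr puts every comma-separated chunk on its own line, so each line holds
# at most one 'text' => '...' pair.
# sed -n with the p flag prints only the lines where the substitution
# matched, replaced by the captured value between the quotes.
result=$(printf '%s\n' "$s" | tr ',' '\n' | sed -n "s/.*'text' => '\([^']*\)'.*/\1/p")
printf '%s\n' "$result"
```

Because the text is split at every comma, this approach assumes the values themselves never contain a comma.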

Upvotes: 0

Sjon

Reputation: 5175

Remove the backslash that escapes the single quote. However, since extended regular expressions (grep -E) don't support non-greedy matching, you probably want to use Perl-compatible regular expressions (grep -P) instead:

grep -Po "'text' => '.*?',"
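If grep -P is not available, one possible workaround is to stay with grep -E and replace the non-greedy .*? with a negated character class, which cannot run past the closing quote. The sketch below assumes a grep that supports -o (GNU and BSD grep do) and uses cut to keep only the value. The variable names s and result are illustrative.

```shell
# Question's sample text with runs of spaces compressed.
s="sadsa d{ 'text' => 'cine', 'indices' => [ 111, 116 ] }, { 'text' => 'estrenos', 'indices' => [ sSADW"

# [^']* matches up to, but not including, the next single quote, so no
# non-greedy quantifier is needed. grep -o prints each match on its own
# line, e.g. 'text' => 'cine'; cut then keeps the 4th quote-delimited field.
result=$(printf '%s\n' "$s" | grep -Eo "'text' => '[^']*'" | cut -d"'" -f4)
printf '%s\n' "$result"
```

As for the original attempt: as far as I know, GNU grep treats \' as an anchor for the end of the input rather than as a literal quote, which would explain why the escaped version matches nothing.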

Upvotes: 0
