user1283776

Reputation: 21814

How can I search for a matching substring in a file with regex and return only the substring?

I have a 50 MB file that consists of a single line. It is corrupt JSON, and I am searching for a substring.

My problem with grep is that its default syntax does not support lazy (non-greedy) matching, and the Perl syntax (grep -P) doesn't seem to be supported on macOS.

Here is a basic expression that works, but it only returns the id, while I need the entire object:

grep -o '1234' largefile

Here are some things I tried that did not work:

grep -oP '1234.*?globalId' largefile

-P not supported

grep -F '1234' largefile | grep -o -E '.{30} 1234.{500}'

invalid repetition count(s)

grep -o '1234.{100}' largefile

doesn't return anything
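
A likely explanation for the last two failures, assuming the stock BSD grep that ships with macOS: in the default basic syntax "{" is an ordinary character (intervals are written \{m,n\}), so '1234.{100}' searches for 1234, any one character, and then the literal text {100}. With -E, a repetition count above RE_DUP_MAX (255 on macOS) is rejected, which would explain "invalid repetition count(s)" for {500}. The sketch below is not a tested fix for the real file; it runs both syntaxes on a made-up sample string and splits the 500-character window into two intervals of 250.

```shell
# Hypothetical sample standing in for largefile (the real file is one long line).
sample='xx 1234 "name":"a" "globalId":"g-1" yy'

# Basic syntax (the default): the interval must be escaped as \{m,n\}.
bre=$(printf '%s\n' "$sample" | grep -o '1234.\{0,20\}')
echo "$bre"

# Extended syntax (-E): plain {m,n} works, but keep each count at 255 or less.
ere=$(printf '%s\n' "$sample" | grep -o -E '1234.{0,20}')
echo "$ere"

# Up to 30 characters of context before and 500 after, with the 500
# split into two intervals of 250 to stay under the per-count limit.
ctx=$(printf '%s\n' "$sample" | grep -o -E '.{0,30}1234.{0,250}.{0,250}')
echo "$ctx"
```

With the real data, replace the printf pipeline with largefile as the file argument.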

How can I do this search? It doesn't need to be grep. I have read about awk, Perl, ripgrep and other tools that I have never had reason to try.

Upvotes: 1

Views: 82

Answers (2)

Dudi Boy

Reputation: 4900

Another option is to use a negated character class to emulate lazy matching.

The limitation of this solution is that there cannot be a G anywhere between 1234 and GlobalId.

For example:

 text="prefix 12345 some text 12345 some text 2 GlobalId more text 3 GlobalId suffix"

 echo "$text" | grep -o  "1234[^G]\+GlobalId"
 12345 some text 12345 some text 2 GlobalId

 echo "$text" | grep -oP  "1234.+?GlobalId"
 12345 some text 12345 some text 2 GlobalId

Read more about this trick here.
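
Applied to the question, note that the key there is globalId with a lowercase g, so the excluded character becomes g, and the same limitation applies (no g may appear between 1234 and globalId). Also, \+ in basic syntax is a GNU extension that other greps may not accept, so this sketch uses -E with a plain + instead. The sample string is made up for illustration.

```shell
# Hypothetical one-line sample; with real data, pass largefile instead.
sample='a 1234 "x":1 "globalId":"A" b "globalId":"B"'

# [^g]+ cannot cross a "g", so the match stops at the first globalId after 1234.
# -E makes "+" a quantifier without relying on GNU's \+ extension.
match=$(printf '%s\n' "$sample" | grep -o -E '1234[^g]+globalId')
echo "$match"
```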

Upvotes: 0

Timur Shtatland

Reputation: 12465

Use a Perl one-liner if grep -P is not supported. For example, this prints every match captured in the parentheses, one match per line:

perl -lne 'print for /(1234.*?globalId)/g' in_file

Upvotes: 2
