noober

Reputation: 1505

How to use sed or awk to strip out a set of lines or block?

I'm running OS X. What command-line tool could I use for this? I've got a large text file with the JSON-like output below. I'm looking for a way to strip out the records that have no last_login_date, since I'm not interested in those. Here's the output:

{
        "_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
        "email" : "[email protected]"
}
{
        "_id" : ObjectId("521ca254e4b0d28eb6a07f26"),
        "email" : "[email protected]",
        "last_login_date" : ISODate("2017-04-10T14:27:03.212Z")
}

Is sed or awk a candidate for this? If so, can you show me how to strip a record like this one out of the file:

{
        "_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
        "email" : "[email protected]"
}

Upvotes: 0

Views: 182

Answers (2)

mklement0

Reputation: 438028

If the input were proper JSON, the third-party CLI jq would be the right tool (see the bottom of this answer).
Because it is not (ObjectId(...) and ISODate(...) are MongoDB shell notation, not JSON), regular text-processing utilities must be used.

neric's answer works with the BSD grep that comes with macOS, but relies on a very specific file layout.

awk allows for a more flexible solution, though it still assumes that the JSON objects in the input aren't nested:

awk -v RS='{' '/"last_login_date"/ { print RS $0 }' file
  • -v RS='{' sets RS, the input record separator, to {, which means that entire JSON-like objects are read one at a time (without the leading {).

  • Regex-matching pattern /"last_login_date"/ looks for substring "last_login_date" inside each record and only executes the associated action ({...}) if found.

  • print RS $0 simply prints matching records with the leading { (the value of RS) re-added.
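A minimal runnable sketch of the awk approach, wrapped in a shell function so it can be reused on a file or on standard input. The function name keep_logged_in is hypothetical, and the email addresses in the sample are placeholders standing in for the redacted ones in the question; otherwise the sample mirrors the question's layout.

```shell
#!/bin/sh
# Hypothetical helper: split the input on "{" (awk's record separator), keep
# only the records that mention "last_login_date", and print each one with the
# "{" that awk consumed as the separator put back in front.
# Assumes the objects are not nested (an inner "{" would split a record).
keep_logged_in() {
  awk -v RS='{' '/"last_login_date"/ { print RS $0 }' "$@"
}

# Sample records (email addresses are placeholders).
keep_logged_in <<'EOF'
{
        "_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
        "email" : "user1@example.com"
}
{
        "_id" : ObjectId("521ca254e4b0d28eb6a07f26"),
        "email" : "user2@example.com",
        "last_login_date" : ISODate("2017-04-10T14:27:03.212Z")
}
EOF
```

Called as keep_logged_in file, the same function reads the file instead of standard input.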


If the input were proper JSON, using jq would make the processing both more robust and succinct:

jq 'select(.last_login_date)' file

The above simply selects (filters in) only those JSON objects in the input file that have a last_login_date property whose value is neither null nor false.
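One way to get from the question's output to proper JSON is to unwrap the ObjectId(...) and ISODate(...) calls into plain strings before handing the stream to jq. The sketch below assumes those are the only MongoDB shell wrappers present and that each wraps a single double-quoted string; other types (e.g. NumberLong(...)) would need their own substitution. The helper name mongo_to_json is hypothetical, and the sample email addresses are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: rewrite ObjectId("...") and ISODate("...") wrappers into
# their plain quoted-string arguments, so the records become valid JSON objects.
# Assumption: each wrapper contains exactly one double-quoted string.
mongo_to_json() {
  sed -E 's/(ObjectId|ISODate)\(("[^"]*")\)/\2/g' "$@"
}

# Print the converted records; with jq installed, append:
#   | jq 'select(.last_login_date)'
mongo_to_json <<'EOF'
{
        "_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
        "email" : "user1@example.com"
}
{
        "_id" : ObjectId("521ca254e4b0d28eb6a07f26"),
        "email" : "user2@example.com",
        "last_login_date" : ISODate("2017-04-10T14:27:03.212Z")
}
EOF
```

With jq installed, the full pipeline would be mongo_to_json file | jq 'select(.last_login_date)'. jq reads the converted objects one after another, so no surrounding array is needed.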

Upvotes: 1

neric

Reputation: 4221

If the records are exactly how you describe them, then you can use:

grep -B 3 -A 1 last_login_date yourFile.json > out.json

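This greps for the line you're interested in and keeps the 3 lines before it and the 1 line after it. Note that if matching records are not adjacent in the file, grep prints a -- separator line between the groups.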
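A minimal sketch of the grep approach run against records laid out exactly as in the question. The function name keep_logged_in_grep is hypothetical and the email addresses are placeholders; the 3-before/1-after context only reproduces whole records when every matching record has the same five-line shape.

```shell
#!/bin/sh
# Hypothetical helper: print each line containing last_login_date together with
# the 3 lines before it and the 1 line after it. This only yields whole records
# if every matching record is laid out exactly as in the question:
#   {  /  "_id"  /  "email"  /  "last_login_date"  /  }
keep_logged_in_grep() {
  grep -B 3 -A 1 last_login_date "$@"
}

# Sample records in the question's layout (email addresses are placeholders).
keep_logged_in_grep <<'EOF'
{
        "_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
        "email" : "user1@example.com"
}
{
        "_id" : ObjectId("521ca254e4b0d28eb6a07f26"),
        "email" : "user2@example.com",
        "last_login_date" : ISODate("2017-04-10T14:27:03.212Z")
}
EOF
```

If a record had, say, an extra field before last_login_date, the fixed -B 3 would cut off its opening {, which is why the awk version is the more flexible choice.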

Upvotes: 1
