Reputation: 1505
I'm running OSX. What command line tool could I use for this? I've got a large text file with this JSON output. I'm looking for a way to strip out the emails without a last_login_date, since I'm not interested in records that lack one. Here's the output:
{
"_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
"email" : "[email protected]"
}
{
"_id" : ObjectId("521ca254e4b0d28eb6a07f26"),
"email" : "[email protected]",
"last_login_date" : ISODate("2017-04-10T14:27:03.212Z")
}
Is sed or awk a candidate for this? If so, can you show me how to strip this out from the file:
{
"_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
"email" : "[email protected]"
}
Upvotes: 0
Views: 182
Reputation: 438028
If the input were proper JSON, the third-party CLI jq would be the right tool; see the bottom of this answer.
Given that it is not, regular text-processing utilities must be used.
neric's answer works with the BSD grep that comes with macOS, but relies on a very specific file layout.
awk allows for a more flexible solution (still assumes that the JSON objects in the input aren't nested, however):
awk -v RS='{' '/"last_login_date"/ { print RS $0 }' file
-v RS='{' sets RS, the input record separator, to {, which means that entire JSON-like objects are read one at a time (without the leading {).
The regex-matching pattern /"last_login_date"/ looks for the substring "last_login_date" inside each record and only executes the associated action ({...}) if it is found.
print RS $0 simply prints matching records with the leading { (the value of RS) re-added.
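To see the awk command in action, here is a minimal sketch run against a small sample file. The file name sample.txt and the example.com addresses are hypothetical stand-ins for the real data; the layout mirrors the output shown in the question.

```shell
# Hypothetical sample file mirroring the layout shown in the question.
cat > sample.txt <<'EOF'
{
"_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
"email" : "first@example.com"
}
{
"_id" : ObjectId("521ca254e4b0d28eb6a07f26"),
"email" : "second@example.com",
"last_login_date" : ISODate("2017-04-10T14:27:03.212Z")
}
EOF

# Split the input on "{" and keep only records that mention "last_login_date",
# re-adding the "{" that the record separator consumed.
result=$(awk -v RS='{' '/"last_login_date"/ { print RS $0 }' sample.txt)
printf '%s\n' "$result"
```

Only the second record should be printed. Because the match is a plain substring test on the whole record, a record whose email or other value happens to contain the text "last_login_date" in quotes would also be kept.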
If the input were proper JSON, using jq would make the processing both more robust and succinct:
jq 'select(.last_login_date)' file
The above simply selects (filters in) only those JSON objects in the input file that have a last_login_date property (whose value is neither null nor Boolean false).
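If the only non-JSON parts of the file are the ObjectId("...") and ISODate("...") wrappers seen in the question, one possible route to jq is to rewrite them as plain strings first. This is a sketch under that assumption, not a general converter for mongo shell output; sample.txt, clean.json, and the example.com addresses are hypothetical.

```shell
# Hypothetical sample file in the layout shown in the question.
cat > sample.txt <<'EOF'
{
"_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
"email" : "first@example.com"
}
{
"_id" : ObjectId("521ca254e4b0d28eb6a07f26"),
"email" : "second@example.com",
"last_login_date" : ISODate("2017-04-10T14:27:03.212Z")
}
EOF

# Turn ObjectId("...") and ISODate("...") into plain JSON strings.
sed -E 's/(ObjectId|ISODate)\("([^"]*)"\)/"\2"/g' sample.txt > clean.json

# jq is a third-party tool, so only run it if it is installed.
if command -v jq >/dev/null 2>&1; then
  jq -c 'select(.last_login_date)' clean.json
fi
```

After the sed step, clean.json is a stream of plain JSON objects, which jq reads one at a time. Note that the dates and ids become ordinary strings, so any type information carried by the wrappers is lost.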
Upvotes: 1
Reputation: 4221
If the records are exactly how you describe them, then you can use:
grep last_login_date -B 3 -A 1 yourFile.json > out.json
This greps for the line you are interested in and keeps the 3 lines before the match and the 1 line after it.
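One thing to watch for with context options: when the matching records are not adjacent in the file, grep prints a -- separator line between the groups. A small sketch, using a hypothetical sample.txt with made-up example.com addresses and a made-up third id, shows one way to drop those separators:

```shell
# Hypothetical sample: two records with a last_login_date, one without in between.
cat > sample.txt <<'EOF'
{
"_id" : ObjectId("521ca254e4b0d28eb6a07f26"),
"email" : "a@example.com",
"last_login_date" : ISODate("2017-04-10T14:27:03.212Z")
}
{
"_id" : ObjectId("52fba903e4b0aa6226e0ce26"),
"email" : "b@example.com"
}
{
"_id" : ObjectId("000000000000000000000000"),
"email" : "c@example.com",
"last_login_date" : ISODate("2017-04-11T09:00:00.000Z")
}
EOF

# Keep 3 lines before and 1 line after each match, then remove the "--"
# lines that grep inserts between non-adjacent groups of context.
grep -B 3 -A 1 last_login_date sample.txt | grep -v '^--$' > out.json
cat out.json
```

This still depends on every matching record having exactly the four-line-plus-brace layout shown in the question; a record with extra fields before last_login_date would be cut off.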
Upvotes: 1