user7352907

Reputation: 87

Using sed, I want to print only certain strings from each line

I have a file with the following data. I want only the ownerId numbers and the profileId values, separated by " : ".

My file:

ObjectId("57a046a06f858a9c73b3468a"), "ownerId" : "923003345778", "profileId" : "FreeBundles,LBCNorthParentOffer", "instanceId" : null, "queuedFor" : "unassigned", "state" : "active", "createDateTime" : 1470121632, "startDateTime" : 1470121632, "expireDateTime" : 1485673632, "removeDateTime" : 1487747232, "extensionDateTime" : null, "cancelled" : false, "mode" : "onceOff", "nextMode" : "none", "profileData" : { "serviceProfileId" : "ecs19", "counter" : 1 } }
 ObjectId("57a046a06f858a9c73b34688"), "cancelled" : false, "createDateTime" : 1470121632, "expireDateTime" : 1557514799, "extensionDateTime" : null, "instanceId" : null, "mode" : "onceOff", "nextMode" : "none", "ownerId" : "923003345778", "profileData" : { "serviceProfileId" : "ecs19", "counter" : 1 }, "profileId" : "Prov3G,HLRProv", "queuedFor" : "unassigned", "removeDateTime" : 1557514799, "startDateTime" : 1470121632, "state" : "active" }
 ObjectId("56d48bd38a8b93baa708fcfa"), "ownerId" : "923003309452", "profileId" : "DiscountOnUsage,Segment04", "instanceId" : null, "queuedFor" : "unassigned", "state" : "active", "createDateTime" : 1456770003, "startDateTime" : 1456770003, "expireDateTime" : null, "removeDateTime" : null, "extensionDateTime" : null, "cancelled" : false, "mode" : "onceOff", "nextMode" : "none", "profileData" : { "serviceProfileId" : "ecs19", "counter" : 1 } }
 ObjectId("560ed95f6ca6e0703cf26fcc"), "cancelled" : false, "createDateTime" : 1443813727, "expireDateTime" : 1544381999, "extensionDateTime" : null, "instanceId" : null, "mode" : "onceOff", "nextMode" : "none", "ownerId" : "923003309452", "profileData" : { "serviceProfileId" : "ecs19", "counter" : 1 }, "profileId" : "Prov3G,HLRProv", "queuedFor" : "unassigned", "removeDateTime" : 1544381999, "startDateTime" : 1443813727, "state" : "active" }

Output:

923003345778 : FreeBundles,LBCNorthParentOffer
923003345778 : Prov3G,HLRProv
923003309452 : DiscountOnUsage,Segment04
923003309452 : Prov3G,HLRProv

If anyone knows the answer, please also explain it in detail.

Upvotes: 1

Views: 86

Answers (2)

Ed Morton

Reputation: 204628

$ sed 's/.*"ownerId" *: *"\([^"]*\).*"profileId" *: *"\([^"]*\).*/\1 : \2/' file
923003345778 : FreeBundles,LBCNorthParentOffer
923003345778 : Prov3G,HLRProv
923003309452 : DiscountOnUsage,Segment04
923003309452 : Prov3G,HLRProv

I really don't think any explanation is needed, as it's very straightforward, but let me know if you have any questions.
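
In case a breakdown helps anyway, here is a minimal sketch of how a substitution of this shape works, capturing the ownerId and profileId values. The sample line is a shortened, hypothetical record rather than the real data, and the variable names line and result are only for illustration.

```shell
# A shortened, hypothetical record in the same shape as the question's data.
line='ObjectId("abc123"), "ownerId" : "923003345778", "profileId" : "Prov3G,HLRProv", "state" : "active" }'

# .*"ownerId" *: *"     skip everything up to the quote that opens the ownerId value
# \([^"]*\)             group 1: every character up to the next quote
# .*"profileId" *: *"   skip ahead to the quote that opens the profileId value
# \([^"]*\)             group 2: that value
# .*                    swallow the rest of the line so nothing else is printed
result=$(printf '%s\n' "$line" |
  sed 's/.*"ownerId" *: *"\([^"]*\).*"profileId" *: *"\([^"]*\).*/\1 : \2/')

printf '%s\n' "$result"
```

Note that a single pattern like this only matches when ownerId appears before profileId on the line; lines that do not match are printed unchanged.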

Upvotes: 1

Wintermute

Reputation: 44073

This is a rather awkward situation you've managed to put yourself into.

As a rule, you do not want to handle structured data with plain-text tools like sed. Any solution you come up with will be brittle in the face of formatting changes (such as spaces or newlines between JSON fields), and certain corner cases (such as JSON strings with quotation marks in them) are awkward to handle with it. If you have JSON, you want to use a JSON tool to handle it.

However, you don't exactly have JSON there. This is a textual representation of BSON (likely from MongoDB) that has already had some parts chopped off.

What you really want to do

A sane way to solve this problem is to make MongoDB give you JSON and let something like jq do the formatting. Once you have a proper JSON file, this will be as simple as

jq -r '"\(.ownerId) : \(.profileId)"' file.json

mongoexport may be your friend here, or putting JSON.stringify() around your query in the MongoDB shell [1]; it depends on how you got this data in the first place. This approach will require you to save the unchopped data, but anyway I suspect that whatever made you chop the BSON into pieces should be replaced with something similar to improve reliability.

[1] If you got the data from the MongoDB shell, you may want to consider doing the formatting there, though.
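
To give an idea of what the jq route looks like once proper JSON is available, here is a small sketch. The two records are hypothetical, trimmed-down objects (one per line), not real export output, and the script simply skips the run if jq is not installed.

```shell
# Hypothetical sample: one JSON object per line, with fields in different orders.
json='{"ownerId":"923003345778","profileId":"FreeBundles,LBCNorthParentOffer","state":"active"}
{"profileId":"Prov3G,HLRProv","ownerId":"923003345778","state":"active"}'

if command -v jq >/dev/null 2>&1; then
  # -r prints the interpolated string raw, without JSON quoting.
  result=$(printf '%s\n' "$json" | jq -r '"\(.ownerId) : \(.profileId)"')
  printf '%s\n' "$result"
else
  echo "jq is not installed" >&2
fi
```

Because jq parses the objects, field order, extra whitespace and escaped quotes inside strings do not matter here.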

How to hack yourself deeper into this mess with sed

However, since you don't currently have proper JSON, you may want to try to hack yourself out of this mess with sed. This is a terrible idea, and I cannot stress enough that you never ever want to do this in a production environment. If you do, you'll be in a deeper mess than before, and that sort of vicious cycle is not a happy place to be.

So, what I'm about to show you is the sort of thing that you do as a one-off in a hurry and are never going to use again because you promise yourself to do it properly next time. You want to check the results carefully. Here goes:

sed 'h;/^.*"profileId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d;s//\1/;x;/^.*"ownerId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d;s//\1/;G;s/\n/ : /' file.bsonish

This makes the following assumptions about the input data:

  1. One full object per line. Newlines in the wrong place will break this.
  2. No " in either the ownerId or the profileId field.

Furthermore, it will not recognize broken data, and detecting that is always a nice feature to have. On the upside, it does not require the ownerId and profileId fields to appear in any particular order.
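
A quick way to see both properties (order independence and silent dropping) is to feed the command a few made-up lines. The three records below are hypothetical and much shorter than the real data; the third one has no ownerId field.

```shell
# Hypothetical one-line records: fields in either order, and one without ownerId.
sample='{ "ownerId" : "111", "profileId" : "A,B", "state" : "active" }
{ "profileId" : "C,D", "state" : "active", "ownerId" : "222" }
{ "profileId" : "E,F", "state" : "active" }'

result=$(printf '%s\n' "$sample" |
  sed 'h;/^.*"profileId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d;s//\1/;x;/^.*"ownerId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d;s//\1/;G;s/\n/ : /')

# Prints one "ownerId : profileId" line for each of the first two records.
printf '%s\n' "$result"
```

The third record disappears without any warning, which is exactly the "will not recognize broken data" caveat.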

It works as follows:

# Save a copy of the input data; we'll isolate the fields separately.
h

# See if there's a profileId field. If not, the line is silently dropped.
/^.*"profileId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d
# Isolate that profileId field. // in this context means: reuse the last
# regex (the big one)
s//\1/

# Now swap in the saved input data. We'll get ownerId next.
x
# Isolate ownerId as before. If there is no ownerId field, drop line silently.
/^.*"ownerId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d
s//\1/

# Append the profileId field from the hold buffer to what we have.
G

# Replace the newline between the two with a colon and some spaces.
s/\n/ : /
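
If you would rather keep the commented form than the one-liner, sed can read a script from a file with -f. The following is a sketch under that assumption; the temporary file name comes from mktemp and the input record is a hypothetical, shortened one.

```shell
# Write the commented script to a temporary file.
script=$(mktemp)
cat > "$script" <<'EOF'
# Save a copy of the input line in the hold space.
h
# Keep only the profileId value; drop lines without one.
/^.*"profileId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d
s//\1/
# Swap the original line back in and do the same for ownerId.
x
/^.*"ownerId"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/!d
s//\1/
# Append the saved profileId and join the two with " : ".
G
s/\n/ : /
EOF

# Run the script file against a single hypothetical record.
result=$(printf '%s\n' '{ "profileId" : "Prov3G,HLRProv", "ownerId" : "923003309452" }' |
  sed -f "$script")
printf '%s\n' "$result"
rm -f "$script"
```

The quoted 'EOF' delimiter keeps the shell from touching the backslashes in the script.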

Upvotes: 0
