Justin Ma

Reputation: 11

Using sed to split up JSON

When I run sed on input such as:

echo '{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}' | sed -n 's/^.*"id":"\([^"]*\)".*$/\1/p'

Why does it return only 444444 and not the first id, 356709?

All help is appreciated. Thanks.

Upvotes: 0

Views: 1208

Answers (5)

Walter A

Reputation: 20002

If you can trust the layout in your example, you can try:

echo '{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}' |
   sed 's/[^,]*id": \([0-9]*\).*/\1/'

or

echo '{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}' |
   tr "," "\n" | grep -Pom 1 "id.. \K\d*"

Upvotes: 0

webb

Reputation: 4340

John1024's answer is the best so far, but it is very specific to your string. For example, it would fail if there happens not to be a newline after the first {. Here is an answer that more generally extracts all ids stored as "id":number in a string, JSON or otherwise.

How: 1. remove all whitespace with tr, 2. find all "id":number matches with grep, 3. output only the numbers with a second grep.

echo "$json" |
  tr -d ' \t\n\r\f' |
  grep -o '"id":[0-9]\+' |
  grep -o '[0-9]\+'

To output only the first id, add -m1 to the last grep:

echo "$json" |
  tr -d ' \t\n\r\f' |
  grep -o '"id":[0-9]\+' |
  grep -m1 -o '[0-9]\+'

Upvotes: 0

John1024

Reputation: 113844

It is better to use a JSON parser for this job (see chepner's answer). If one really wants to use sed (or awk), see below.

Using sed

This produces the first ID:

$ cat File
{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}
$ sed -E 's/"id": ([^,]*),.*$/\n\1/; s/[^\n]*\n//' File
356709

Because sed always takes the leftmost match, the first substitute command matches at the first id, and its trailing .*$ then consumes the rest of the line. The second substitute command is necessary to remove what comes before the first id.

How it works:

  • s/"id": ([^,]*),.*$/\n\1/

    This matches from the first occurrence of "id": to the end of the line while saving the id number itself in group 1. It replaces this portion of the line with a newline, \n, followed by the id number, \1.

    Since sed reads input line-by-line, a newly read-in sed pattern space will never contain a newline character. Thus, we can be sure that the \n that we add to the line with this command will be the only newline in the pattern space.

  • s/[^\n]*\n//

    This matches from the beginning of the line to the first newline and removes it all.
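To see the two substitutions separately, the sketch below runs the first command on its own and then both together. It assumes GNU sed (for -E, and for \n in the replacement and inside brackets); the variable names are only illustrative.

```shell
line='{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}'

# Step 1 only: everything from the first "id": onward becomes newline + id,
# so the pattern space holds the leading "{", a newline, then the id.
step1=$(printf '%s\n' "$line" | sed -E 's/"id": ([^,]*),.*$/\n\1/')

# Steps 1 and 2: the second command deletes up to and including that newline.
both=$(printf '%s\n' "$line" | sed -E 's/"id": ([^,]*),.*$/\n\1/; s/[^\n]*\n//')

printf '%s\n' "$step1"
printf '%s\n' "$both"
```

The first printf shows two lines (the leftover { and the id); the second shows the id alone.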

Using awk

To get the first id using awk (this probably requires GNU awk):

$ awk -F, 'NR>1{print $1; exit}' RS='"id": ' File
356709

To get all ids using awk:

$ awk -F, 'NR>1{print $1}' RS='"id": ' File
356709
444444

How it works: awk implicitly reads a file one record at a time. By default, awk treats one line as a record. For our purposes, we ask it to break records on each id. This is done as follows:

  • -F,

    This tells awk to use a comma as the field separator.

  • NR>1{print $1}

    This tells awk to print the first field in all records after the first.

  • RS='"id": '

    This tells awk to break up records wherever it sees the string "id": followed by a space. This ensures that the first field in any record after the first will be an id number.
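Since a multi-character RS is not guaranteed to work in every awk, here is a rough equivalent that keeps the default record separator and does the cutting with split() instead. It is only a sketch under the same layout assumptions as above; the names parts, fields and all_ids are made up for the example.

```shell
json='{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}'

# split() on the regex /"id": / cuts the line the same way RS did:
# parts[1] is the text before the first id, parts[2..n] each start with an id.
all_ids=$(printf '%s\n' "$json" | awk '{
    n = split($0, parts, /"id": /)
    for (i = 2; i <= n; i++) {
        split(parts[i], fields, ",")   # first comma-separated field is the id
        print fields[1]
    }
}')

printf '%s\n' "$all_ids"
```

To keep only the first id, pipe the result through head -n 1 or break out of the loop after the first print.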

Upvotes: 1

chepner

Reputation: 531215

Assuming valid JSON, this is simply

json='{"id": 356709, "author": "tom", "time": "hello", "author2": {"id": 444444, "pain": "high"}}'
echo "$json" | jq '.id'

with jq. Use the right tool for the job.
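For the question's two ids specifically, a small sketch: the nested one is reachable by its path, so there is no ambiguity about which "id" is meant. This assumes jq is installed (the guard simply skips the demo otherwise) and that the input has been quoted into valid JSON as above.

```shell
json='{"id": 356709, "author": "tom", "time": "hello", "author2": {"id": 444444, "pain": "high"}}'

if command -v jq >/dev/null 2>&1; then
    # .id addresses the top-level key; .author2.id walks into the nested object.
    top_id=$(printf '%s\n' "$json" | jq '.id')
    nested_id=$(printf '%s\n' "$json" | jq '.author2.id')
    printf '%s\n' "$top_id" "$nested_id"
fi
```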

Upvotes: 1

sjsam

Reputation: 21965

Because the ^.* in front of "id": swallows the first "id": 356709. Remember that . matches any character and that * repeats it any number of times; since * is greedy, ^.* runs as far to the right as it can, so the rest of the pattern ends up matching at the last "id": on the line.

Clearly this is not the best approach here, but I can't go further without knowing the expected output.

I am tempted to share this answer regarding the removal of HTML tags using sed.
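The effect is easy to reproduce. In this sketch the pattern is adapted to the sample's spacing (a space after the colon, digits without quotes), which is an assumption about what the asker intended. The first command shows the greedy match; the second replaces ^.* with ^[^,]*, which cannot cross a comma and so stops at the first id.

```shell
json='{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}'

# Greedy: ^.* takes as much as it can, so the capture lands on the last id.
last=$(printf '%s\n' "$json" | sed -n 's/^.*"id": \([0-9]*\).*$/\1/p')

# Restricted: [^,]* cannot run past the first comma, so the first id is captured.
first=$(printf '%s\n' "$json" | sed -n 's/^[^,]*"id": \([0-9]*\).*$/\1/p')

echo "$last"
echo "$first"
```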

Upvotes: 0
