Reputation: 11
When using sed on code such as:
echo '{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}' | sed -n 's/^.*"id":"\([^"]*\)".*$/\1/p'
Why does it return only 444444 and not the first id, 356709?
All help is appreciated. Thanks.
Upvotes: 0
Views: 1208
Reputation: 20002
If you can trust the layout in your example, you can try:
echo '{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}' |
sed 's/[^,]*id": \([0-9]*\).*/\1/'
or
echo '{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}' |
tr "," "\n" | grep -Pom 1 "id.. \K\d*"
Upvotes: 0
Reputation: 4340
john1024's answer is the best so far, but it is very specific to your string. For example, it would fail if there happens not to be a newline after the first {. Here is an answer that more generally extracts all ids stored as "id":number in a string, JSON or otherwise.
How: 1. remove all whitespace with tr, 2. find all "id":number with grep, 3. output only the numbers with grep.
echo "$json" |
tr -d ' \t\n\r\f' |
grep -o '"id":[0-9]\+' |
grep -o '[0-9]\+'
To output only the first id, add -m1 to the last grep:
echo "$json" |
tr -d ' \t\n\r\f' |
grep -o '"id":[0-9]\+' |
grep -m1 -o '[0-9]\+'
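As a rough sketch of why stripping whitespace first helps, the same pipeline can be wrapped in a function and run on input whose spacing differs from the original string. The function name extract_ids and the multi-line sample are made up for illustration, and the \+ syntax assumes GNU grep.

```shell
# Hypothetical helper wrapping the tr/grep pipeline from this answer.
extract_ids() {
  tr -d ' \t\n\r\f' |          # drop all whitespace so "id" : 1 becomes "id":1
    grep -o '"id":[0-9]\+' |   # keep only the "id":number matches, one per line
    grep -o '[0-9]\+'          # keep only the digits
}

# Made-up sample with newlines and a space before the colon.
json='{
  "id" : 356709,
  "author2": {"id":444444}
}'

ids=$(printf '%s\n' "$json" | extract_ids)
printf '%s\n' "$ids"
# 356709
# 444444
```

Note that the whitespace is removed everywhere, including inside string values, so this is only suitable when just the numbers are wanted.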
Upvotes: 0
Reputation: 113844
It is better to use a JSON parser for this job (see Chepner's answer). If one really wants to use sed (or awk), see below.
This produces the first ID:
$ cat File
{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}
$ sed -E 's/"id": ([^,]*),.*$/\n\1/; s/[^\n]*\n//' File
356709
Because sed looks for the leftmost possible match, the first substitute command matches at the first id, and its greedy .*$ then consumes the rest of the line. The second substitute command is necessary to remove what comes before the first id.
How it works:
s/"id": ([^,]*),.*$/\n\1/
This matches from the first occurrence of "id": to the end of the line while saving the id number itself in group 1. It replaces this portion of the line with a newline, \n, followed by the id number, \1.
Since sed reads input line-by-line, a newly read-in sed pattern space will never contain a newline character. Thus, we can be sure that the \n that we add to the line with this command will be the only newline in the pattern space.
s/[^\n]*\n//
This matches from the beginning of the line to the first newline and removes it all.
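To see the two steps separately, here is a small sketch that runs the first substitution on its own and then both together. It assumes GNU sed (for \n in the replacement and inside the bracket expression); the variable names are only for illustration.

```shell
line='{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}'

# Step 1 only: everything from the first "id": onward becomes newline + id.
step1=$(printf '%s\n' "$line" | sed -E 's/"id": ([^,]*),.*$/\n\1/')
printf '%s\n' "$step1"
# {
# 356709

# Steps 1 and 2: the text up to and including the inserted newline is removed.
both=$(printf '%s\n' "$line" | sed -E 's/"id": ([^,]*),.*$/\n\1/; s/[^\n]*\n//')
printf '%s\n' "$both"
# 356709
```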
To get the first id using awk (a multi-character RS is an extension beyond POSIX, so this probably requires GNU awk):
$ awk -F, 'NR>1{print $1; exit}' RS='"id": ' File
356709
To get all ids using awk:
$ awk -F, 'NR>1{print $1}' RS='"id": ' File
356709
444444
How it works: awk implicitly reads a file one record at a time. By default, awk treats one line as a record. For our purposes, we ask it to break records on each id. This is done as follows:
-F,
This tells awk to use a comma as the field separator.
NR>1{print $1}
This tells awk to print the first field in all records after the first.
RS='"id": '
This tells awk to break up records wherever it sees the string "id": (including the trailing space). This ensures that the first field in any record after the first will be an id number.
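The record splitting can be made visible with a small sketch that prints the first field of every record, including the first one that the commands above skip. It assumes an awk that accepts a multi-character RS (such as GNU awk) and sets RS with -v instead of a trailing assignment; the variable names are only for illustration.

```shell
line='{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}'

# Print the record number and first comma-separated field of each record.
records=$(printf '%s\n' "$line" |
  awk -F, -v RS='"id": ' '{ printf "record %d starts with [%s]\n", NR, $1 }')
printf '%s\n' "$records"
# record 1 starts with [{]
# record 2 starts with [356709]
# record 3 starts with [444444]
```

Record 1 holds whatever precedes the first "id": , which is why the commands above only print when NR>1.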
Upvotes: 1
Reputation: 531215
Assuming valid JSON, this is simply
json='{"id": 356709, "author": "tom", "time": "hello", "author2": {"id": 444444, "pain": "high"}}'
echo "$json" | jq '.id'
with jq. Use the right tool for the job.
Upvotes: 1
Reputation: 21965
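Because the ^.*"id": swallows the first "id": 356709.
Remember that . matches any character, and with * it matches any character any number of times. The match is greedy, so .* takes as much of the line as it can and leaves only the last "id" for the rest of the pattern.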
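A minimal sketch of the effect, using a pattern adapted to the sample input (a space after the colon and an unquoted number, which is an assumption about what was intended):

```shell
line='{"id": 356709, "author": tom, "time": hello, "author2": {"id": 444444, "pain": high}}'

# Greedy ^.* runs as far right as it can, so the capture lands on the last id.
last=$(printf '%s\n' "$line" | sed -n 's/^.*"id": \([0-9]*\).*$/\1/p')
printf '%s\n' "$last"
# 444444

# [^,]* cannot cross a comma, so the prefix stops before the first "id".
first=$(printf '%s\n' "$line" | sed -n 's/^[^,]*"id": \([0-9]*\).*$/\1/p')
printf '%s\n' "$first"
# 356709
```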
Clearly this is not the best approach here but I can't proceed further because I don't have any idea about the expected output.
I am tempted to share this answer regarding the removal of html tags using sed.
Upvotes: 0