Reputation: 557
I am trying to extract only a few pieces of information from a long string like this:
[[["좋은","good","joh-eun",""]],[["adjective",[["좋은",["good","nice","pretty","admirable","canny","tenacious"],,0.38553435]],"good",4],["adverb",["훌륭하게",["wonderfully","good","nicely","beautifully","fine","finely"],,0.00029145498],"good",4]]]
I want to extract the following:
좋은 - good
좋은 - good,nice,pretty,admirable,canny,tenacious (basically adjectives)
훌륭하게 - wonderfully,good,nicely,beautifully,fine,finely (adverbs)
Please help. I tried piping through cut and sed, like this:
cut --delimiter='"' -f 1-2 and then sed 's/\[\[\[\"//'
This gives me only the first Korean word, 좋은, and I have not been able to extend it to get the desired result. If there is a better way to achieve this, please suggest it. Thanks in advance.
Upvotes: 2
Views: 302
Reputation: 15784
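A little late, but here is a pure-regex approach. Note that it relies on lazy quantifiers (.*?), which sed does not support, so it needs a PCRE-capable tool such as perl: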
regex: \[\[\["(.*?)","(.*?)"\]\],\[\["(.*?)",\[\["(.*?)",\["(.*?)"\],.*?\]\],.*?\],\["(.*?)",\["(.*?)",\["(.*)"\],.*\]\]\]
Substitution: \1 - \2\n\4 - \5 (\3)\n\7 - \8 (\6)
This assumes the adjective and adverb brackets are always present in the original line (even if empty).
See the substitution in the demo for how to reorder the matches.
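A minimal sketch of this kind of substitution in Ruby, whose regex engine supports the lazy .*? quantifiers used here. It is an adaptation, not the exact pattern above: the first part skips the romanization field so that group 2 holds only the first translation, and the leftover double quotes are removed from the captured lists. The names SAMPLE, PATTERN and summarize are illustrative, and the sketch assumes the line always has the same shape as the sample from the question.

```ruby
# Sample line copied from the question.
SAMPLE = '[[["좋은","good","joh-eun",""]],[["adjective",[["좋은",["good","nice","pretty","admirable","canny","tenacious"],,0.38553435]],"good",4],["adverb",["훌륭하게",["wonderfully","good","nicely","beautifully","fine","finely"],,0.00029145498],"good",4]]]'

# Extended mode: whitespace and trailing comments inside the pattern are ignored.
PATTERN = /
  \[\[\["(.*?)","(.*?)".*?\]\],                       # groups 1 and 2: word, first translation
  \[\["(.*?)",\[\["(.*?)",\["(.*?)"\],.*?\]\],.*?\],  # groups 3 to 5: label, word, list
  \["(.*?)",\["(.*?)",\["(.*?)"\],.*\]\]\]            # groups 6 to 8: label, word, list
/x

# Returns the three output lines, or nil if the line does not match.
def summarize(line)
  m = PATTERN.match(line)
  return nil unless m

  # The captured lists still contain the quotes between items; drop them.
  adjectives = m[5].delete('"')
  adverbs = m[8].delete('"')
  [
    "#{m[1]} - #{m[2]}",
    "#{m[4]} - #{adjectives} (#{m[3]})",
    "#{m[7]} - #{adverbs} (#{m[6]})"
  ]
end

puts summarize(SAMPLE)
```

Like the original pattern, this is tied to the exact bracket layout of the sample line; an input with more word classes or several entries per class would need a different approach.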
Upvotes: 2
Reputation: 246774
Here's a piece of Ruby, but probably any PCRE-equipped tool can do something similar:
ruby -ne '
$_.gsub(/"/,"")
.scan(/ (\p{Hangul}+) ,\[? (.+?) \] /x) {|m| puts m[0] + " - " + m[1]}
' <<END
[[["좋은","good","joh-eun",""]],[["adjective",[["좋은",["good","nice","pretty","admirable","canny","tenacious"],,0.38553435]],"good",4],["adverb",["훌륭하게",["wonderfully","good","nicely","beautifully","fine","finely"],,0.00029145498],"good",4]]]
END
좋은 - good,joh-eun,
좋은 - good,nice,pretty,admirable,canny,tenacious
훌륭하게 - wonderfully,good,nicely,beautifully,fine,finely
Too bad the original text isn't valid JSON, which would be easier to handle.
Thanks to this question for how to match Korean characters.
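On the JSON remark: in the sample, the only thing that seems to stop the text from parsing as JSON is the empty slot written as ",,". A rough sketch, assuming that is the only deviation and that ",," never occurs inside a quoted string, is to fill those slots with null and then walk the parsed arrays. The names SAMPLE and extract are illustrative.

```ruby
require 'json'

# Sample line copied from the question.
SAMPLE = '[[["좋은","good","joh-eun",""]],[["adjective",[["좋은",["good","nice","pretty","admirable","canny","tenacious"],,0.38553435]],"good",4],["adverb",["훌륭하게",["wonderfully","good","nicely","beautifully","fine","finely"],,0.00029145498],"good",4]]]'

def extract(raw)
  # Assumption: empty slots only appear as ",," and never inside a string.
  data = JSON.parse(raw.gsub(/,(?=,)/, ',null'))

  # The first block is [[word, translation, romanization, ...]].
  word, translation = data[0][0]
  lines = ["#{word} - #{translation}"]

  # The second block holds one [label, entries, ...] array per word class.
  data[1].each do |_label, entries|
    # In the sample the adjective entries are nested one level deeper than
    # the adverb entry, so a single bare entry is wrapped in a list.
    entries = [entries] unless entries.first.is_a?(Array)
    entries.each do |term, meanings|
      lines << "#{term} - #{meanings.join(',')}"
    end
  end
  lines
end

puts extract(SAMPLE)
```

Compared with the regex scan, this does not depend on matching Hangul and picks only the first translation for the first line, at the cost of relying on the comma-filling assumption.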
Upvotes: 1