Sahil

Reputation: 9488

Regex to extract everything in the quotes in bash shell?

Consider this data file:

random text "txt" random text
random text "txt1" random text "txt2"
random text "txt1" random text "txt3"
random text "txt1" random text "txt4"
random text "txt1" random text "txt5"
random text "txt1" random text "txt5" random text "txt6" random text

For each of these lines, I need to extract everything inside the quotes, i.e.

txt
txt1,txt2
txt1,txt3
txt1,txt4
txt1,txt5
txt1,txt5,txt6
There can be multiple quoted strings in a single line.

I wrote this regex in the shell (actually I wrote a sed command, but when I paste it here, it screws up the .*, so I have written it as dotStar):

^dotStar"[^"]+"dotStar$ (for a single quoted string)
^dotStar"[^"]+"dotStar"[^"]+"dotStar$ (if there are two quoted strings)

As you can see, my regex depends on the number of quoted strings in the line. Can anyone give me a generic regex that extracts the text regardless of how many times quotes appear?

Upvotes: 2

Views: 99

Answers (1)

hek2mgl

Reputation: 158130

You can use this sed command:

sed --posix 's/[^"]*"\([^"]*\)"[^"]*/\1,/g;s/\(.*\),/\1/' input.txt

Output:

txt
txt1,txt2
txt1,txt3
txt1,txt4
txt1,txt5
txt1,txt5,txt6
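
For anyone wondering how the two substitutions fit together, here is a rough annotated sketch of the same idea. It wraps the command in a function called extract_quoted (a hypothetical name, not part of the original command), drops the GNU-only --posix flag, and strips the trailing comma with `s/,$//` instead of the second expression above. It assumes every line contains at least one balanced pair of double quotes; lines without quotes pass through largely unchanged.

```shell
#!/bin/sh
# extract_quoted is a hypothetical helper name used only for this sketch.
# It reads lines on stdin and prints the quoted strings joined by commas.
extract_quoted() {
  # 1st expression: for each match, skip any non-quote text, capture what
  #                 sits between one pair of quotes, skip the non-quote text
  #                 after it, and replace the whole match with "capture,".
  #                 The g flag repeats this for every quoted string.
  # 2nd expression: remove the single trailing comma left by the first one.
  sed -e 's/[^"]*"\([^"]*\)"[^"]*/\1,/g' -e 's/,$//'
}

# Usage with one line from the question:
printf '%s\n' 'random text "txt1" random text "txt2"' | extract_quoted
# prints: txt1,txt2
```

To process the whole file, something like `extract_quoted < input.txt` should work. Because the pattern only ever consumes one pair of quotes per match, the number of quoted strings per line does not matter.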

Upvotes: 5
