Reputation: 9488
Consider this data file:
random text "txt" random text
random text "txt1" random text "txt2"
random text "txt1" random text "txt3"
random text "txt1" random text "txt4"
random text "txt1" random text "txt5"
random text "txt1" random text "txt5" random text "txt6" random text
For each of these lines, I need to extract everything inside the quotes, i.e.:
txt
txt1,txt2
txt1,txt3
txt1,txt4
txt1,txt5
txt1,txt5,txt6
There can be multiple quotes in a single line.
I wrote this regex in the shell (it was actually a sed command, but pasting it here mangles the .*, so I have written it as dotStar):
^dotStar"[^"]+"dotStar$ (for a single quoted string)
^dotStar"[^"]+"dotStar"[^"]+"dotStar$ (for two quoted strings)
As you can see, my regex depends on the number of quoted strings in the line. Can anyone give me a generic regex that extracts the text regardless of how many quoted strings appear?
Upvotes: 2
Views: 99
Reputation: 158130
You can use this sed command:
sed --posix 's/[^"]*"\([^"]*\)"[^"]*/\1,/g;s/\(.*\),/\1/' input.txt
Output:
txt
txt1,txt2
txt1,txt3
txt1,txt4
txt1,txt5
txt1,txt5,txt6
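To make the two substitutions easier to follow, here is a minimal sketch that wraps the same expressions in a shell function and runs it on one of the sample lines. The function name extract_quoted is only an illustrative choice, and the sketch drops the --posix flag (a GNU sed option) on the assumption that the expressions, being plain basic regular expressions, behave the same without it.

```shell
# extract_quoted is a hypothetical helper name used only for this sketch.
extract_quoted() {
  # 1st expression: for every quoted string, discard the unquoted text
  #                 around it and keep the string followed by a comma.
  # 2nd expression: strip the trailing comma left after the last string.
  sed -e 's/[^"]*"\([^"]*\)"[^"]*/\1,/g' -e 's/\(.*\),/\1/'
}

printf '%s\n' 'random text "txt1" random text "txt5" random text "txt6" random text' | extract_quoted
# prints: txt1,txt5,txt6
```

Because the first expression uses the g flag, it repeats for as many quoted strings as the line contains, so the pattern does not depend on their count. Note that a line with no quotes at all is not matched by the first expression, and the second would still remove its last comma if it has one.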
Upvotes: 5