Reputation: 45
I have this code in a file:
<pre class="bbCodeCode" dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch</code></pre>
</div>
</div><b><br />
I need to remove the surrounding characters and keep just this code:
-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch
I have tried this code:
#!/bin/bash
sed -n '/>-/,/</p' /home/Desktop/1 > /home/Desktop/2
sed -n '/^-*code>/p' /home/raed/Desktop/2 > /home/Desktop/3
sed -i 's#</code></pre>##' /home/Desktop/3
exit
But the code removes the first line:
-Fix numcer one/Two
Upvotes: 2
Views: 41
Reputation: 133760
1st solution: Try GNU awk for this one. With your shown samples, please try the following awk code.
awk -v RS="^$" '
match($0,/(^|\n)<pre class="[^"]*".*<code>(-.*)<\/code>/,arr){
  print arr[2]
}
' Input_file
Explanation: Setting RS to ^$ makes GNU awk read the whole file as a single record. The match function then applies the regex (^|\n)<pre class="[^"]*".*<code>(-.*)<\/code> (explained later in this answer) to that record. The regex has 2 capturing groups, and the matched values are stored in the array named arr. When the regex matches, printing the 2nd element, arr[2], gives the desired lines, including the leading - of the first one.
2nd solution: With sed, using the -z and -E options, please try the following code.
sed -zE 's/(^|\n)<pre class="[^"]*".*<code>(-.*)<\/code>.*/\2/' Input_file
OR, if your sed version supports \n in the replacement, a slight change to the above sed code also ends the output with a newline:
sed -zE 's/(^|\n)<pre class="[^"]*".*<code>(-.*)<\/code>.*/\2\n/' Input_file
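To see what -z changes, here is a minimal sketch run against the sample from the question. GNU sed's -z treats the input as NUL-separated, so a file without NUL bytes becomes a single record and .* can cross newlines. The temporary file and the variable names are assumptions for illustration, and this variant captures the leading - inside the group so the first line keeps it.

```shell
#!/bin/sh
# Hypothetical temp file holding the sample from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<pre class="bbCodeCode" dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch</code></pre>
</div>
</div><b><br />
EOF

# -z: the whole file is one record (GNU sed); -E: extended regex.
# Group 2 runs from the "-" right after <code> up to the last </code>.
result=$(sed -zE 's/(^|\n)<pre class="[^"]*".*<code>(-.*)<\/code>.*/\2\n/' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

This only works with a sed that supports -z, so it is not portable to every system.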
3rd solution: With GNU grep, please try the following code:
grep -zoP '(^|\n)<pre class="[^"]*".*?<code>\K-(.*?\n[^\n]+)+(?=</code>)' Input_file
4th solution: If you really want to go with your approach (it looks like you don't have the GNU version of sed), here is a very straightforward sed command. It does less validation of the data than my previous solutions, but it will do the job as long as your Input_file always looks like your sample.
sed -En '/^<pre class/s/^<pre class="[^"]*".*<code>(-.*)$/\1/; /^-/{s/<\/code>.*//; p}' Input_file
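As a rough check of the line-by-line approach, here is a small sketch that runs it against the sample from the question. The temporary file and variable names are assumptions for illustration. This variant keeps the leading - of the first line by capturing it, and it lets the /^-/ block do all the printing so that no line is printed twice.

```shell
#!/bin/sh
# Hypothetical temp file holding the sample from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<pre class="bbCodeCode" dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch</code></pre>
</div>
</div><b><br />
EOF

# 1st command: on the <pre ...> line, keep only what follows <code>.
# 2nd command: on lines starting with "-", drop a trailing </code>... and print.
result=$(sed -En '/^<pre class/s/^<pre class="[^"]*".*<code>(-.*)$/\1/; /^-/{s/<\/code>.*//; p;}' "$sample")

printf '%s\n' "$result"
rm -f "$sample"
```

Because only lines starting with - are printed, the trailing </div> lines are dropped without any extra command.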
Upvotes: 1
Reputation: 17058
Try this:
sed 's/<[^>]*>//g' <file
It removes everything between a < and the next > on the same line (sed works line by line).
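A quick sketch of what this does on the sample from the question. The temporary file, the variable names and the extra /^$/d expression are my assumptions: the tag-only lines (</div> and so on) become empty after stripping, so the second expression deletes them.

```shell
#!/bin/sh
# Hypothetical temp file holding the sample from the question.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<pre class="bbCodeCode" dir="ltr" data-xf-init="code-block" data-lang=""><code>-Fix numcer one/Two
-EMM Support
-Fix update < broken
-Add support patch</code></pre>
</div>
</div><b><br />
EOF

# 1st expression: remove every <...> that opens and closes on the same line.
# 2nd expression (an addition, not part of the answer above): delete empty lines.
stripped=$(sed -e 's/<[^>]*>//g' -e '/^$/d' "$sample")

printf '%s\n' "$stripped"
rm -f "$sample"
```

The lone < in "-Fix update < broken" survives only because no > follows it on that line. A line such as "a < b > c" would lose " b ", so this is safe only when the text itself never contains both characters on one line.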
Upvotes: 1