Reputation:
I have files like the following:
<div title="alpha" Mauris eu justo sed nisi aliquet blandit. <span name="ll">beta</span> Fusce in pharetra nisi. <span name="ll">gamma</span> Aliquam vehicula imperdiet turpis et rhoncus. <span name="ll">delta</span> Donec faucibus augue quis neque dictum, at rutrum dolor placerat.</div>
I am trying to put the content of the nth name="ll" span in place of the title= content, while preserving the order of the rest.
For example, the 2nd name="ll" would get me:
<div title="gamma" Mauris eu justo sed nisi aliquet blandit. <span name="ll">beta</span> Fusce in pharetra nisi. Aliquam vehicula imperdiet turpis et rhoncus. <span name="ll">delta</span> Donec faucibus augue quis neque dictum, at rutrum dolor placerat.</div>
Etcetera.
My try:
find . -type f -exec perl -pi -w -e 's/(title=)"?[^"\s]*"?(.*)((?:.*?\h+class="ll">){1}.*?)\h+class="ll">"?([^"\s]+)"?(<.*)/$1"$3"$2$4/' \{\} \;
Where do I make the mistake?
Upvotes: 2
Views: 78
Reputation: 784998
This perl solution should work for you:
# matching 2nd <span name="ll">
perl -pe 's~(title=)"?[^"\s]*"?((?:.*?\h+<span name="ll">){1}.*?)\h+<span name="ll">([^<]+)</span>~$1"$3"$2~' file
<div title="gamma" Mauris eu justo sed nisi aliquet blandit. <span name="ll">beta</span> Fusce in pharetra nisi. Aliquam vehicula imperdiet turpis et rhoncus. <span name="ll">delta</span> Donec faucibus augue quis neque dictum, at rutrum dolor placerat.</div>
# matching 3rd <span name="ll">
perl -pe 's~(title=)"?[^"\s]*"?((?:.*?\h+<span name="ll">){2}.*?)\h+<span name="ll">([^<]+)</span>~$1"$3"$2~' file
<div title="delta" Mauris eu justo sed nisi aliquet blandit. <span name="ll">beta</span> Fusce in pharetra nisi. <span name="ll">gamma</span> Aliquam vehicula imperdiet turpis et rhoncus. Donec faucibus augue quis neque dictum, at rutrum dolor placerat.</div>
RegEx Explanation:
(title=) : Match title= and capture it in group #1
"?[^"\s]*"? : Match an optionally quoted string that contains no whitespace
( : Start capture group #2
(?: : Start non-capture group
.*? : Match any text (lazy match)
\h+ : Match 1+ horizontal whitespace characters
<span name="ll"> : Match the text <span name="ll">
){1} : End non-capture group and repeat this group {1} times (n-1 repeats for the nth span)
.*? : Match any text (lazy match)
) : End capture group #2
\h+ : Match 1+ horizontal whitespace characters
<span name="ll"> : Match the text <span name="ll">
([^<]+) : Match 1+ of any character that is not a < and capture it in group #3
</span> : Match </span>
$1"$3"$2 : Replacement part
Upvotes: 2
Reputation: 241808
Instead of doing everything in one substitution, proceed in steps:
perl -wpe '$n = 2;
@m = /<span name="ll">([^<]+)/g;
s/title="[^"]+"/title="$m[$n-1]"/;
s:<span name="ll">\Q$m[$n-1]\E</span> ::;'
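i.e. collect the text of every <span name="ll"> into @m, replace the title value with the nth entry, then remove that span together with the space after it.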
Upvotes: 2