Billy Moon

Reputation: 58531

Manipulate the content of HTML strings without changing the HTML

If I have a string of HTML, maybe like this...

<h2>Header</h2><p>all the <span class="bright">content</span> here</p>

And I want to manipulate the string so that all words are reversed for example...

<h2>redaeH</h2><p>lla eht <span class="bright">tnetnoc</span> ereh</p>

I know how to extract the string from the HTML and manipulate it by passing to a function and getting a modified result, but how would I do so whilst retaining the HTML?

I would prefer a language-agnostic solution, but if it must be language-specific, PHP or JavaScript would be most useful.

Edit

I also want to be able to manipulate text that spans several DOM elements...

Quick<em>Draw</em>McGraw

warGcM<em>warD</em>kciuQ

Another Edit

Currently, I am thinking of replacing all HTML tags with a unique token while storing the originals in an array, then doing a manipulation that ignores the token, and finally replacing the tokens with the values from the array.

This approach seems overly complicated, and I am not sure how to replace all the HTML without using regex, which I have learned can get you sent to the Stack Overflow prison island.
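The token idea can be done without a regex by scanning for `<` and `>` by hand. This is a minimal sketch, assuming well-formed markup with no `<` or `>` inside attribute values, comments or scripts; `tokenize`, `withTokens`, `restoreTokens` and the `\u0000` token character are hypothetical choices for illustration. It only works for manipulations that leave the token characters where they are (a case change, for instance); reordering the text, such as reversing it, would also reorder the tokens.

```javascript
// Hypothetical token character, assumed never to occur in the text itself.
const TOKEN = '\u0000';

// Split an HTML string into tag and text chunks with a simple character scan.
// Assumes well-formed markup with no '<' or '>' inside attribute values.
function tokenize(html) {
  const chunks = [];
  let i = 0;
  while (i < html.length) {
    if (html[i] === '<') {
      // A tag runs up to and including the next '>'
      let end = html.indexOf('>', i);
      if (end === -1) end = html.length - 1; // unterminated tag: take the rest
      chunks.push({ type: 'tag', value: html.slice(i, end + 1) });
      i = end + 1;
    } else {
      // Text runs up to the next '<' (or the end of the string)
      let end = html.indexOf('<', i);
      if (end === -1) end = html.length;
      chunks.push({ type: 'text', value: html.slice(i, end) });
      i = end;
    }
  }
  return chunks;
}

// Replace every tag with TOKEN and keep the original tags, in order, in an array.
function withTokens(html) {
  const tags = [];
  let tokenized = '';
  for (const chunk of tokenize(html)) {
    if (chunk.type === 'tag') {
      tags.push(chunk.value);
      tokenized += TOKEN;
    } else {
      tokenized += chunk.value;
    }
  }
  return { tokenized, tags };
}

// Put the stored tags back, in order, wherever a token appears.
function restoreTokens(tokenized, tags) {
  const parts = tokenized.split(TOKEN);
  return parts.reduce(
    (out, part, k) => out + part + (k < tags.length ? tags[k] : ''),
    ''
  );
}

const { tokenized, tags } = withTokens('<em>going</em><i>home</i>');
// The token character has no case, so toUpperCase() leaves it alone.
console.log(restoreTokens(tokenized.toUpperCase(), tags)); // <em>GOING</em><i>HOME</i>
```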

Yet Another Edit

I want to clarify an issue here. I want the text manipulation to happen across any number of DOM elements. For example, if my formula randomly moves letters in the middle of a word, leaving the start and end the same, I want to be able to do this...

<em>going</em><i>home</i>

Converts to

<em>goonh</em><i>gmie</i>

So the HTML elements remain untouched, but the string content inside them is manipulated as a whole (in this example, goinghome is passed to the manipulation formula) in whatever way the formula chooses.
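The "as a whole" requirement can be reduced to plain string work: join the text of each element, run the formula once, then cut the result back into pieces of the original lengths. This is a sketch, assuming you already have the text of each element in document order and that the formula returns a string of the same length; `manipulateAcross` and `shuffleMiddle` are hypothetical names, the latter standing in for the letter-shuffling formula described above.

```javascript
// Join the pieces, apply the formula to the whole string, then slice the
// result back into pieces with the same lengths as the originals.
// Assumes the manipulator preserves length.
function manipulateAcross(segments, manipulator) {
  const whole = segments.join('');
  const result = manipulator(whole);
  if (result.length !== whole.length) {
    throw new Error('manipulator must return a string of the same length');
  }
  const out = [];
  let pos = 0;
  for (const seg of segments) {
    out.push(result.slice(pos, pos + seg.length));
    pos += seg.length;
  }
  return out;
}

// Hypothetical formula: keep the first and last letters, shuffle the middle
// with a Fisher-Yates shuffle.
function shuffleMiddle(word) {
  if (word.length < 4) return word;
  const middle = word.slice(1, -1).split('');
  for (let i = middle.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1));
    [middle[i], middle[j]] = [middle[j], middle[i]];
  }
  return word[0] + middle.join('') + word[word.length - 1];
}

const reverseString = s => s.split('').reverse().join('');
console.log(manipulateAcross(['going', 'home'], reverseString)); // [ 'emohg', 'niog' ]

// Random output, so no fixed result is shown here.
const [em, i] = manipulateAcross(['going', 'home'], shuffleMiddle);
console.log(`<em>${em}</em><i>${i}</i>`);
```

Getting the element texts in order and writing the pieces back is a separate step; any walk over text nodes in document order would do.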

Upvotes: 1

Views: 2636

Answers (6)

Ricardo Araque

Reputation: 81

You can use setInterval to change the text at a fixed interval, for example:

 
const TITTLE = document.getElementById("Tittle"); // get the containing div

setInterval(() => {
  let TITTLE2 = document.getElementById("rotate"); // get the current span at the moment of execution
  let spanTittle = document.createElement("span"); // create the new "span" element

  spanTittle.setAttribute("id", "rotate"); // give the new element the same id
  // check which string is currently in the span, and use the other one
  spanTittle.appendChild(document.createTextNode(
    TITTLE2.textContent === "TEXT1" ? "TEXT2" : "TEXT1"
  ));

  TITTLE.replaceChild(spanTittle, TITTLE2); // finally, replace the old span with the new one
}, 2000);

<html>
<head></head>
<body>
  <div id="Tittle">TEST YOUR <span id="rotate">TEXT1</span></div>
</body>
</html>
   

Upvotes: 0

Billy Moon

Reputation: 58531

I implemented a version that seems to work quite well, although I still use a (rather general and shoddy) regex to extract the HTML tags from the text. Here it is, in commented JavaScript:

Method

/**
* Manipulate text inside HTML according to passed function
* @param html the html string to manipulate
* @param manipulator the function to manipulate with (will be passed a single word)
* @returns manipulated string including unmodified HTML
*
* Currently limited in that manipulator operates on words determined by regex
* word boundaries, and must return same length manipulated word
*
*/

var manipulate = function(html, manipulator) {

  var block, tag, words, i,
    final = '', // used to prepare return value
    tags = [], // used to store tags as they are stripped from the html string
    x = 0; // used to track the number of characters the html string is reduced by during stripping

  // remove tags from html string, and use callback to store them with their index
  // then split by word boundaries to get plain words from original html
  words = html.replace(/<.+?>/g, function(match, index) {
    tags.unshift({
      match: match,
      index: index - x
    });
    x += match.length;
    return '';
  }).split(/\b/);

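  // loop through each piece and build the final string, appending the
  // manipulated piece if it is a word, or the piece unchanged if it is a boundary
  // (testing the piece itself, so text that starts with a non-word character still works)
  for (i = 0; i < words.length; i++) {
    final += /^\w/.test(words[i]) ? manipulator(words[i]) : words[i];
  }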

  // loop through each stored tag, and insert into final string
  for (i = 0; i < tags.length; i++) {
    final = final.slice(0, tags[i].index) + tags[i].match + final.slice(tags[i].index);
  }

  // ready to go!
  return final;

};

The function defined above accepts a string of HTML and a manipulation function, and applies the function to words within the string regardless of whether they are split by HTML elements.

It works by first removing all HTML tags and storing each tag along with the index it was taken from, then manipulating the text, and finally inserting the tags back into their original positions in reverse order.
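The bookkeeping in that function (strip the tags, record each index in the stripped string, reinsert back to front) can be isolated into two small functions to see why it round-trips. This restates those steps with the same tag regex and its limitations; `stripTags` and `restoreTags` are hypothetical names, and here tags are pushed and then reinserted from last to first, which is equivalent to the `unshift`-then-forward loop in the answer.

```javascript
// Remove tags and record each one with its index in the *stripped* string.
function stripTags(html) {
  const tags = [];
  let removed = 0; // characters removed so far
  const text = html.replace(/<.+?>/g, (match, index) => {
    tags.push({ match, index: index - removed });
    removed += match.length;
    return '';
  });
  return { text, tags };
}

// Reinsert from the last tag to the first, so inserting one tag never shifts
// the stripped-string index of a tag that is still waiting to go back in.
// Tags sharing an index end up in their original order.
function restoreTags(text, tags) {
  let out = text;
  for (let i = tags.length - 1; i >= 0; i--) {
    const { match, index } = tags[i];
    out = out.slice(0, index) + match + out.slice(index);
  }
  return out;
}

const { text, tags } = stripTags('Quick<em>Draw</em>McGraw');
console.log(text); // QuickDrawMcGraw
console.log(tags.map(t => t.index)); // [ 5, 9 ]
console.log(restoreTags(text.toUpperCase(), tags)); // QUICK<em>DRAW</em>MCGRAW
```

This also shows the same-length restriction: the recorded indices only stay valid if the manipulated text has exactly the same length as the stripped text.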

Test

/**
 * Test our function with various input
 */

var reverse, rutherford, shuffle, text, titleCase;

// set our test html string
text = "<h2>Header</h2><p>all the <span class=\"bright\">content</span> here</p>\nQuick<em>Draw</em>McGraw\n<em>going</em><i>home</i>";

// function used to reverse words
reverse = function(s) {
  return s.split('').reverse().join('');
};

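// function used by rutherford to return a shuffled array (Fisher-Yates shuffle,
// since sorting with a random comparator gives a biased result)
shuffle = function(a) {
  var i, j, t;
  for (i = a.length - 1; i > 0; i--) {
    j = Math.floor(Math.random() * (i + 1));
    t = a[i]; a[i] = a[j]; a[j] = t;
  }
  return a;
};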

// function used to shuffle the middle of words, leaving each end undisturbed
rutherford = function(inc) {
  var m = inc.match(/^(.?)(.*?)(.)$/);
  return m[1] + shuffle(m[2].split('')).join('') + m[3];
};

// function to make word Title Cased
titleCase = function(s) {
  return s.replace(/./, function(w) {
    return w.toUpperCase();
  });
};

console.log(manipulate(text, reverse));
console.log(manipulate(text, rutherford));
console.log(manipulate(text, titleCase));

There are still a few quirks, such as the heading and paragraph text not being recognized as separate words (because they are in separate block-level tags rather than inline tags), but this is basically a proof of concept for what I was trying to do.

I would also like it to handle manipulation formulas that actually add and remove text, rather than just replacing or moving it (so the string length varies after manipulation), but that opens up a whole new can of worms I am not yet ready for.

Now that I have added some comments to the code and put it up as a JavaScript gist, I hope someone will improve it, especially by replacing the regex part with something better!

Gist: https://gist.github.com/3309906

Demo: http://jsfiddle.net/gh/gist/underscore/1/3309906/

(outputs to console)

And finally, a version using an HTML parser:

(http://ejohn.org/files/htmlparser.js)

Demo: http://jsfiddle.net/EDJyU/

Upvotes: 0

WatsMyName

Reputation: 4478

I ran into this situation a long time ago and used the following code. It is rough, but here it is:

<?php
// keep an uppercase first letter uppercase after the word has been changed
function keepcase($word, $replace) {
   $replace[0] = (ctype_upper($word[0]) ? strtoupper($replace[0]) : $replace[0]);
   return $replace;
}

// regex - match the contents, grouping into HTMLTAG and non-HTMLTAG chunks
$re = '%(</?\w++[^<>]*+>)                 # grab HTML open or close TAG into group 1
|                                         # or...
([^<]*+(?:(?!</?\w++[^<>]*+>)<[^<]*+)*+)  # grab non-HTMLTAG text into group 2
%x';

$contents = '<h2>Header</h2><p>the <span class="bright">content</span> here</p>';

// walk through the content chunk by chunk, replacing words in non-HTMLTAG chunks only
$contents = preg_replace_callback($re, 'callback_func', $contents);

function callback_func($matches) { // here's the callback function
    if ($matches[1]) {             // Case 1: this is an HTMLTAG
        return $matches[1];        // return HTMLTAG unmodified
    }
    elseif (isset($matches[2])) {  // Case 2: a non-HTMLTAG chunk (may be empty)
        // reverse each word in the chunk, keeping the case of its first letter
        return preg_replace_callback('/\w+/', function ($word) {
            return keepcase($word[0], strrev($word[0]));
        }, $matches[2]);
    }
    exit("Error!");                // never get here
}
echo ($contents);
?>

Upvotes: 1

Alex

Reputation: 9031

Could you use jQuery? For example:

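$('div *').each(function(){
    // note: .text() replaces the element's children with a single text node,
    // so any markup nested inside this element is lost
    var text = $(this).text();
    text = text.split('');
    text = text.reverse();
    text = text.join('');
    $(this).text(text);
});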

See here - http://jsfiddle.net/GCAvb/

Upvotes: 0

Fabrizio Calderan

Reputation: 123377

If you want to achieve a similar visual effect without changing the text, you could cheat with CSS:

h2, p {
  direction: rtl;
  unicode-bidi: bidi-override;
}

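This displays the text of those elements in reverse, character by character, without changing the markup or the text itself. Note that it reverses each line as a whole, so the word order is reversed too, rather than each word in place.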

example fiddle: http://jsfiddle.net/pn6Ga/

Upvotes: 1

Quentin

Reputation: 943510

Parse the HTML with something that will give you a DOM API to it.

Write a function that loops over the child nodes of an element.

If a node is a text node, get the data as a string, split it into words, reverse each one, then assign the result back.

If a node is an element, recurse into your function.
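A sketch of that recursion in JavaScript. It relies only on the standard DOM properties `nodeType`, `childNodes` and `data`, so in a browser you could call it as `transformText(document.body, reverseWords)`; `transformText` and `reverseWords` are hypothetical names, and treating `\w+` runs as words is an assumption.

```javascript
// Node type values as defined by the DOM (Node.ELEMENT_NODE, Node.TEXT_NODE).
const ELEMENT_NODE = 1;
const TEXT_NODE = 3;

// Reverse each word in a string, leaving spaces and punctuation in place.
function reverseWords(s) {
  return s.replace(/\w+/g, word => word.split('').reverse().join(''));
}

// Walk the child nodes of an element: rewrite text nodes in place, recurse
// into child elements, and leave the elements themselves (tags, attributes)
// untouched. Other node types, such as comments, are skipped.
function transformText(element, fn) {
  for (const node of element.childNodes) {
    if (node.nodeType === TEXT_NODE) {
      node.data = fn(node.data);
    } else if (node.nodeType === ELEMENT_NODE) {
      transformText(node, fn);
    }
  }
}

// In a browser, for <p>all the <span class="bright">content</span> here</p>:
//   transformText(document.querySelector('p'), reverseWords);
// leaves <p>lla eht <span class="bright">tnetnoc</span> ereh</p>
```

Because each text node is rewritten independently, this tolerates formulas that change the text length, but it does not hand text that spans several elements (such as `going` + `home`) to the formula as one string.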

Upvotes: 0
