Reputation: 171
I need to interpret text copied from emails. Currently, the users copy the text of the email and paste it into an HTML text area. I already have a 90% solution, but here is one case that is stumping me. The data is stored in a table in the email. Here is one row of that data, which could easily have 50 rows or more, in a similar format:
<tr>
<td valign=3D"top" style=3D"background:white;padding:0in 0in 0in 0in">
<p class=3D"MsoNormal"><span style=3D"color:black">WI</span><o:p></o:p></p>
</td>
<td valign=3D"top" style=3D"background:white;padding:0in 0in 0in 0in">
<p class=3D"MsoNormal"><span style=3D"color:black">BARABOO 53913</span><o:p></o:p></p>
</td>
<td valign=3D"top" style=3D"background:white;padding:0in 0in 0in 0in">
<p class=3D"MsoNormal"><span style=3D"color:black">8:00</span><o:p></o:p></p>
</td>
<td valign=3D"top" style=3D"background:white;padding:0in 0in 0in 0in">
<p class=3D"MsoNormal"><span style=3D"color:black">VAN</span><o:p></o:p></p>
</td>
<td valign=3D"top" style=3D"background:white;padding:0in 0in 0in 0in">
<p class=3D"MsoNormal"><span style=3D"color:black">WI</span><o:p></o:p></p>
</td>
<td valign=3D"top" style=3D"background:white;padding:0in 0in 0in 0in">
<p class=3D"MsoNormal"><span style=3D"color:black">8/29/2015</span><o:p></o:p></p>
</td>
</tr>
This is an example of the sort of thing I have to accommodate, although I actually want to accommodate a lot more.
When the user pastes that row, it turns into this:
WI
BARABOO 53913
8:00
VAN
WI
8/29/2015
Keep in mind that I am receiving many rows, so they all get run together. The number, order, and format of the columns are completely inconsistent, sometimes even within the same document.
If I could get this, I could use my existing code to parse it:
WI BARABOO 53913 8:00 VAN WI 8/29/2015
But I have pretty much nothing to work with. If I had the raw HTML, I could parse it safely (It is never displayed), but I can't get it. Does anyone know how I can get this as raw HTML or some other coherent format? I doubt if it matters, but in most cases, the source of the copy will be MS Outlook.
edit: the whole goal is to make this machine-parsable. I don't need help with the parsing, I have that covered. I just need something useful to parse.
Upvotes: 2
Views: 649
Reputation: 2613
Here's a jQuery solution.
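$('#txtArea').on('paste', function () {
    // Capture the textarea now; inside the timeout callback `this` is no longer the element.
    var $area = $(this);
    setTimeout(function () {
        // Runs after the browser has inserted the pasted text.
        $area.val($area.val().replace(/\s+/g, ' '));
    }, 100);
});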
Upvotes: 1
Reputation: 78840
It looks like the paste event, for Chrome and Firefox, may have a clipboardData property of type DataTransfer. That has a getData method that takes a content type, so you may be able to do this to check if the content is HTML:
textArea.addEventListener('paste', function (e) {
var html = e.clipboardData && e.clipboardData.getData('text/html');
if (html) {
// handle HTML table logic
}
});
Update:
Interestingly, IE has a beforepaste event which looks like it has a similar clipboardData object, so maybe you can handle that browser using this technique.
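If the handler above does receive a text/html string, the remaining step is to turn each table row into one line of text. A minimal sketch of that step follows. The function name tableHtmlToLines is hypothetical, and the regex-based extraction assumes simple, non-nested tables like the Outlook sample in the question; in a browser, a real HTML parser would be more robust.

```javascript
// Hypothetical helper: one line of text per <tr> in the clipboard HTML.
// Assumes flat (non-nested) tables; regexes are a shortcut, not a full HTML parser.
function tableHtmlToLines(html) {
  var lines = [];
  var rowRe = /<tr\b[^>]*>([\s\S]*?)<\/tr>/gi;
  var rowMatch;
  while ((rowMatch = rowRe.exec(html)) !== null) {
    var cells = [];
    var cellRe = /<t[dh]\b[^>]*>([\s\S]*?)<\/t[dh]>/gi;
    var cellMatch;
    while ((cellMatch = cellRe.exec(rowMatch[1])) !== null) {
      var text = cellMatch[1]
        .replace(/<[^>]*>/g, ' ')   // drop nested tags such as <p>, <span>, <o:p>
        .replace(/&nbsp;/gi, ' ')   // decode a few common entities
        .replace(/&lt;/gi, '<')
        .replace(/&gt;/gi, '>')
        .replace(/&amp;/gi, '&')
        .replace(/\s+/g, ' ')       // collapse whitespace inside the cell
        .trim();
      if (text) cells.push(text);   // empty cells are skipped
    }
    if (cells.length) lines.push(cells.join(' '));
  }
  return lines;
}
```

Inside the paste handler, tableHtmlToLines(html).join('\n') would then give one row per line, which keeps rows from running together when many are pasted at once.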
Upvotes: 1
Reputation: 46323
You can keep the pasted HTML if you replace your text area with a contentEditable element, such as a <div>. Try this, for example; it will alert the HTML "source" you paste into it:
var paste = document.getElementById('paste');
paste.onpaste = function() { setTimeout(function() { alert(paste.innerHTML); }, 1); };
#paste {
width:200px;
height:60px;
border: 2px solid blue;
}
<div id="paste" contentEditable="true"></div>
Note that the content is available after the onpaste event fires, so use a timeout.
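Once the pasted markup is in the contentEditable element, the rows can be read from the DOM directly instead of from innerHTML. The following is a rough sketch only: rowsToLines is a hypothetical name, and it assumes the browser preserved the table as <tr>/<td> elements and that tables are not nested.

```javascript
// Hypothetical helper: one line of text per table row found under `root`.
function rowsToLines(root) {
  var lines = [];
  var rows = root.querySelectorAll('tr');
  for (var i = 0; i < rows.length; i++) {
    var cells = rows[i].querySelectorAll('td, th');
    var parts = [];
    for (var j = 0; j < cells.length; j++) {
      // Collapse whitespace (including non-breaking spaces) inside each cell.
      var text = (cells[j].textContent || '').replace(/\s+/g, ' ').trim();
      if (text) parts.push(text); // empty cells are skipped
    }
    if (parts.length) lines.push(parts.join(' '));
  }
  return lines;
}

// Browser-only wiring, assuming the #paste element from the snippet above.
if (typeof document !== 'undefined') {
  var box = document.getElementById('paste');
  if (box) {
    box.addEventListener('paste', function () {
      // Wait until the pasted content has been inserted.
      setTimeout(function () { console.log(rowsToLines(box).join('\n')); }, 0);
    });
  }
}
```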
Upvotes: 3
Reputation: 21575
One way to handle this is to simply format the data yourself when it is pasted. For example, you can replace the newlines with spaces, then collapse runs of spaces into a single space. Then set the textarea to that new value:
text.replace(/\r|\n|\n\r/g, ' ').replace(/ +(?= )/g,'');
Then you would do this in an onpaste event handler: make text the clipboard contents, and finally set the textarea to the new text:
document.getElementById("text").addEventListener('paste', function (e) {
var text = e.clipboardData.getData('text/plain');
text = text.replace(/\r|\n|\n\r/g, ' ').replace(/ +(?= )/g,'');
setTimeout(function(){
document.getElementById("text").value = text;
}, 10);
});
Here is a fiddle example. Take the content and paste it into the text area; it will be changed to "WI BARABOO 53913 8:00 VAN WI 8/29/2015".
Upvotes: 0