Karan Parikh

Reputation: 321

What regex should I use to fetch the content of CSS classes and store it?

Hi, I have created a text box into which content copied from a PDF is pasted, and I read the pasted content in rich text (HTML) format.

<html>
<head>
 <link rel="stylesheet" type="text/css" href="Theme.css"> 
</head>
<body>
<div>
    <textarea id="ta" onpaste="functionItalic(event)" class="foostyle2"></textarea>
</div>
<div>
    <span style="font-weight: bolder; font-size: 20px;">
        <span id="1">Karan's</span>
     </span>
    <span style="font-weight: bolder; font-size: 24px; font-style: italic;">test</span>
</div>
<script>
function functionItalic(pasteEvent) {
    // Read the pasted clipboard content as HTML (originalEvent covers jQuery-wrapped events)
    var textareacont = (pasteEvent.originalEvent || pasteEvent).clipboardData.getData("text/html");
    console.log(textareacont);
}
</script>
</body>
</html>

When I logged the pasted content to the console (the copied text was "For smoke-protected assembly seating, the common"), I found that the clipboard data from the PDF contains CSS classes and HTML tags like this:

Note: this output was produced by console.log(textareacont); after pasting "For smoke-protected assembly seating, the common".

<html>
<head><meta http-equiv=Content-Type content="text/html; charset=utf-8"><style>
<!--
br
{
mso-data-placement:same-cell;
}
table
{
mso-displayed-decimal-separator:"\.";
mso-displayed-thousand-separator:"\, ";
}
tr
{
mso-height-source:auto;
mso-ruby-visibility:none;
}
td
{
border:.5pt solid windowtext;
}
.NormalTable{cellspacing:0;cellpadding:10;border-collapse:collapse;mso-table-layout-alt:fixed;border:none; mso-border-alt:solid windowtext .75pt;mso-padding-alt:0cm 5.4pt 0cm 5.4pt; mso-border-insideh:.75pt solid windowtext;mso-border-insidev:.75pt solid windowtext}
.fontstyle0
{
    font-family:Times-Roman;
    font-size:10pt;
    font-style:normal;
    font-weight:normal;
    color:rgb(0,0,0);
}
.fontstyle1
{
    font-size:12pt;
    font-style:normal;
    font-weight:normal;
    color:rgb(0,0,0);
}
.fontstyle2
{
    font-family:Times-Italic;
    font-size:10pt;
    font-style:italic;
    font-weight:normal;
    color:rgb(0,0,0);
}
-->
</style></head><body>
<!--StartFragment-->
<span class="fontstyle0">For </span><span class="fontstyle2">smoke-protected assembly seating</span><span class="fontstyle0">, the </span><span class="fontstyle2">common</span> 
<br style=" font-style: normal; font-variant: normal; font-weight: normal; letter-spacing: normal; line-height: normal; orphans: 2; text-align: -webkit-auto; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-spacing: 0px; -webkit-text-size-adjust: auto; -webkit-text-stroke-width: 0px; ">
<!--EndFragment-->
</body>
</html>

What I want is to get the properties of the CSS classes named .fontstyle1, .fontstyle2, .fontstyle3 as strings. Someone said that this can be achieved with a regex. Can anybody tell me which regex I need to use to store the .fontstyle classes in a string? I have tried a few, but they haven't worked. The newlines, carriage returns and tabs that appear in the classes are part of the string.

If somebody knows another way of storing the content of the .fontstyle classes in a string, please help. I don't know much about regex.

Upvotes: 0

Views: 77

Answers (1)

akgren_soar

Reputation: 337

1. You can use this regex:

/\.fontstyle\d+\s*\{[\w\s:;,()-]*\}/g

I'm not sure if I'm giving too much help, but since it's already coded:

    // The regular expression (the hyphen is last in the character class so it is read literally)
    var regularExp = /\.fontstyle\d+\s*\{[\w\s:;,()-]*\}/g;
    // The CSS (as a string) goes here; this assumes the code runs inside
    // functionItalic from the question, where textareacont holds the pasted HTML
    var cssText = textareacont;
    var match;
    // Matched .fontstyle rules will be stored in fontstyle_list[]
    var fontstyle_list = [];

    // Find all matches
    while ((match = regularExp.exec(cssText)) !== null) {
        // Add every match to fontstyle_list
        fontstyle_list.push(match[0]);
    }

    // Iterate through every element in fontstyle_list
    for (var i = 0; i < fontstyle_list.length; i++) {
        // Print each .fontstyle{} rule
        document.write(fontstyle_list[i] + "<br />"); // document.write() is unsafe and should only be used for testing

        // add your code here

    }
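
If you need the individual properties rather than the whole rule as one string, the same pattern can be given capture groups. This is only a rough sketch: parseFontStyles and sampleCss are names I made up for illustration, and the sample string is a shortened copy of the CSS from the question.

```javascript
// Hypothetical helper: split each matched .fontstyle rule into a class name and a property map.
function parseFontStyles(cssText) {
    // Same pattern idea as above, with capture groups for the class name and the rule body.
    // The hyphen is last in the character class so it is read literally.
    var rule = /\.(fontstyle\d+)\s*\{([\w\s:;,()-]*)\}/g;
    var styles = {};
    var match;
    while ((match = rule.exec(cssText)) !== null) {
        var props = {};
        match[2].split(";").forEach(function (decl) {
            var idx = decl.indexOf(":");
            if (idx === -1) { return; } // skip the empty piece after the last ";"
            var name = decl.slice(0, idx).trim();
            var value = decl.slice(idx + 1).trim();
            if (name) { props[name] = value; }
        });
        styles[match[1]] = props;
    }
    return styles;
}

// Sample input: a shortened copy of the pasted CSS, including tabs and carriage returns.
var sampleCss =
    ".fontstyle0\n{\n    font-family:Times-Roman;\n    color:rgb(0,0,0);\n}\n" +
    ".fontstyle2\n{\n\tfont-family:Times-Italic;\r\n    font-style:italic;\n}\n";

var parsed = parseFontStyles(sampleCss);
console.log(parsed.fontstyle2["font-style"]); // italic
```

Note that the character class only covers the characters that show up in the .fontstyle rules from the question; values containing quotes, dots or other symbols would need the class to be extended.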

2. Alternatively

You can parse the CSS into an object and then use substring to pick out each element whose selector starts with '.fontstyle':

  1. Parse the CSS into an object. Refer to https://stackoverflow.com/a/14865690/6943913
  2. Iterate through all elements of the object from step 1
  3. For each element in step 2, check <eachElement>.substring(0, 10) === ".fontstyle"
  4. For each element for which step 3 returned true, copy the element into another list

*Disclaimer: the steps above only illustrate the program logic; some adjustments might be needed to fit the actual scenario.
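
The steps above could look roughly like this. Treat it as a sketch only: parseCssToObject is a simplified stand-in I wrote for illustration (it is not the parser from the linked answer and only handles flat "selector { prop: value; }" rules), and pickFontStyles and sampleCss are likewise made-up names.

```javascript
// Step 1 (hypothetical, simplified parser): turn flat CSS rules into { selector: { prop: value } }.
// Assumption: no nested blocks and no ";" or "}" inside property values.
function parseCssToObject(cssText) {
    var rules = {};
    // Drop the <!-- and --> markers that wrap the pasted stylesheet.
    var cleaned = cssText.replace(/<!--|-->/g, "");
    var rulePattern = /([^{}]+)\{([^{}]*)\}/g;
    var match;
    while ((match = rulePattern.exec(cleaned)) !== null) {
        var selector = match[1].trim();
        var declarations = {};
        match[2].split(";").forEach(function (decl) {
            var idx = decl.indexOf(":");
            if (idx > 0) {
                declarations[decl.slice(0, idx).trim()] = decl.slice(idx + 1).trim();
            }
        });
        rules[selector] = declarations;
    }
    return rules;
}

// Steps 2 to 4: iterate over the parsed rules and copy the .fontstyle ones into another list.
function pickFontStyles(rules) {
    var picked = [];
    Object.keys(rules).forEach(function (selector) {
        if (selector.substring(0, 10) === ".fontstyle") {
            picked.push({ selector: selector, declarations: rules[selector] });
        }
    });
    return picked;
}

// Sample input: a shortened copy of the stylesheet from the question.
var sampleCss =
    "<!--\nbr\n{\nmso-data-placement:same-cell;\n}\n" +
    ".fontstyle1\n{\n    font-size:12pt;\n    font-weight:normal;\n}\n" +
    ".fontstyle2\n{\n    font-style:italic;\n}\n-->";

var fontStyles = pickFontStyles(parseCssToObject(sampleCss));
console.log(fontStyles.map(function (r) { return r.selector; })); // [ '.fontstyle1', '.fontstyle2' ]
```

The br rule is parsed too but is filtered out in pickFontStyles, so only the .fontstyle entries end up in the second list.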

Upvotes: 1
