How can we extract text from HTML code using ColdFusion?
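We will use regular expressions to achieve this.
To remove HTML tags we can use "<.*?>", which lazily matches everything from a "<" up to the next ">".
To remove JS and CSS code, including everything between the opening and closing tags, we have to use "<(script|style).*?</\1>". Here \1 is a back-reference to whatever the first group matched, so an opening script tag is paired with a closing script tag and an opening style tag with a closing style tag.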
So, if we combine the two regular expressions, we can get the actual text from HTML code that may contain some CSS and JS code.
The final regular expression will be "<(script|style).*?</\1>|<.*?>".
Our HTML code is:
So, the final ColdFusion code to extract text from the above HTML would be as follows:
After all these steps, we will get the following text as the output.
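Both the extraction step and its result can be sketched as follows. This is a minimal sketch, assuming a small hypothetical HTML sample: the sample markup and the variable names `htmlCode` and `plainText` are illustrative and not taken from the original listing. It uses ColdFusion's built-in `REReplaceNoCase` function with the "all" scope, so every match of the combined expression is replaced with an empty string.

```cfm
<cfscript>
    // Hypothetical sample input (an assumption for illustration only).
    htmlCode = '<html><head><style>p { color: red; }</style><script>alert("hi");</script></head><body><p>Hello <b>World</b></p></body></html>';

    // Remove whole <script>/<style> blocks first, then any remaining tags.
    // The "all" scope replaces every match, not just the first one.
    plainText = REReplaceNoCase(htmlCode, "<(script|style).*?</\1>|<.*?>", "", "all");

    // For this sample, plainText is: Hello World
    writeOutput(plainText);
</cfscript>
```

The sample is kept on a single line on purpose. For multi-line HTML this approach relies on "." also matching line breaks, which is worth verifying on your ColdFusion version before depending on it.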
In the final regular expression "<(script|style).*?</\1>|<.*?>", the alternative that removes CSS/JS comes first and the alternative that removes HTML tags comes second. The order matters: if we change it to "<.*?>|<(script|style).*?</\1>", the CSS/JS code will remain in the final output, because the opening script or style tag matches the first alternative and is treated as an ordinary HTML tag, which leaves the code between the tags behind.
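The effect of the ordering can be checked with a short sketch, again assuming a hypothetical one-line sample (`htmlCode`, `goodOrder`, and `badOrder` are illustrative names, not part of the original code):

```cfm
<cfscript>
    // Hypothetical sample input containing CSS, JS, and markup.
    htmlCode = '<style>p { color: red; }</style><script>alert("hi");</script><p>Hello <b>World</b></p>';

    // CSS/JS alternative first: whole style and script blocks are removed.
    // For this sample, goodOrder is: Hello World
    goodOrder = REReplaceNoCase(htmlCode, "<(script|style).*?</\1>|<.*?>", "", "all");

    // Generic tag alternative first: only the tags themselves are removed,
    // so the CSS and JS text between them survives.
    // For this sample, badOrder is: p { color: red; }alert("hi");Hello World
    badOrder = REReplaceNoCase(htmlCode, "<.*?>|<(script|style).*?</\1>", "", "all");

    writeOutput(goodOrder);
    writeOutput("<br>");
    writeOutput(badOrder);
</cfscript>
```

In the second call the CSS/JS alternative can never win, since any position where it could match starts with a "<" that the generic tag alternative has already consumed.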