HTML Purifier
Wednesday, October 31st, 2007
Kore Nordmann explains why, in his opinion, one shouldn’t use BBCode for comments and forums. I think he has a point, but it only holds when the BBCode is processed with regular expressions, as he explains in another article. Strictly speaking, you’re not parsing the BBCode at all when using regular expressions; you’re only pattern matching. He also explains why it makes no difference to use HTML syntax instead of BBCode syntax. He has a very good point there: the BBCode syntax is not well defined, while HTML syntax, especially for the things that are normally allowed in blog comments or on forums, is well defined and known by many people.
An interesting observation is that, despite the good explanation of the problem with BBCode (a false sense of security when parsing it with regexps), people demonstrate in the comments that they really don’t understand it. For example, one comment states that it is almost impossible to block all disallowed HTML using blacklists… Obviously, one shouldn’t use blacklists, but whitelists. By default, all < and > should be replaced by &lt; and &gt;.
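To make the "escape by default" rule concrete, here is a minimal sketch in Python (my choice of language for illustration; the function name `escape_comment` is made up). It relies only on the standard library's `html.escape`, which also replaces `&` and quotes:

```python
import html


def escape_comment(text: str) -> str:
    """Turn every markup character into an entity so nothing renders as HTML."""
    # html.escape replaces &, < and >; quote=True also replaces " and '.
    return html.escape(text, quote=True)


print(escape_comment("<script>alert(1)</script>"))
# prints: &lt;script&gt;alert(1)&lt;/script&gt;
```

Anything that should render as markup then has to be allowed explicitly, which is where the whitelist comes in.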
HTML Purifier is a library that parses HTML and uses a whitelist to allow only certain HTML tags and attributes. Why should one develop something like this from scratch when there is already a library available?
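Purely to illustrate the parse-then-whitelist idea, here is a rough sketch using Python's standard `html.parser`. It is not HTML Purifier's API and certainly not a replacement for it. The `ALLOWED` table, the `SAFE_SCHEMES` check and the names `WhitelistSanitizer` and `sanitize` are all assumptions of mine:

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> attribute names allowed on that tag.
ALLOWED = {"b": set(), "i": set(), "em": set(), "strong": set(), "a": {"href"}}
# Assumption: only plain web links are acceptable in href values.
SAFE_SCHEMES = ("http://", "https://")


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # pieces of the sanitized output
        self.open_tags = []  # whitelisted tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself; its text still arrives via handle_data
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # attribute is not whitelisted (e.g. onclick)
            if name == "href" and not value.lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: URLs
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.open_tags:
            return  # stray or non-whitelisted closing tag
        # Close everything opened after this tag so the output stays balanced.
        while self.open_tags:
            top = self.open_tags.pop()
            self.out.append(f"</{top}>")
            if top == tag:
                break

    def handle_data(self, data):
        # All text is escaped, so a leftover < or > can never become markup.
        self.out.append(escape(data, quote=False))


def sanitize(text: str) -> str:
    parser = WhitelistSanitizer()
    parser.feed(text)
    parser.close()
    # Close whatever the author forgot to close.
    while parser.open_tags:
        parser.out.append(f"</{parser.open_tags.pop()}>")
    return "".join(parser.out)


print(sanitize('<b>hi</b><a href="javascript:alert(1)" onclick="x()">x</a>'))
```

Even this toy version has to deal with unclosed tags, attribute filtering and URL schemes, and it ignores plenty of other cases, which is exactly why an existing, well-tested library is the better choice.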