Anuj Blog

The Problem with Markup Languages


Chris Shiflett has a post today, "Allowing HTML and Preventing XSS." The problem is how to let users format their contributed content without introducing security vulnerabilities. The usual answer is either a dedicated markup language or filtering and sanitizing user-supplied HTML.

BBCode was designed for this purpose. There is no actual standard, but the core syntax seems fairly uniform. It's good for people used to forums, where it seems to be the norm.

HTML markup is nice because it is a standard, even if varying subsets are supported. Learning a little HTML isn't going to hurt anyone, at least for the next 20 years or so. The problem is that HTML was never intended to be hand-edited. The syntax is not the most inviting, and different HTML-like markup languages handle whitespace differently from the HTML standard.
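If you do accept "a little HTML," the safer pattern is to whitelist tags rather than try to strip dangerous ones with regular expressions. Here is a minimal sketch using Python's standard-library html.parser; the ALLOWED_TAGS set, the drop-every-attribute policy, and the names WhitelistSanitizer and sanitize are assumptions for illustration, not a complete or vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: the only tags a commenter may use. No attributes allowed.
ALLOWED_TAGS = {"b", "i", "em", "strong", "code", "p"}


class WhitelistSanitizer(HTMLParser):
    """Rebuild HTML from parser events, keeping only whitelisted tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.open_tags = []  # stack of currently open, allowed tags

    def handle_starttag(self, tag, attrs):
        # Attributes (onclick, style, href, ...) are dropped entirely.
        if tag in ALLOWED_TAGS:
            self.out.append(f"<{tag}>")
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Stray end tags are ignored; closing an outer tag also closes inner ones.
        if tag in self.open_tags:
            while self.open_tags:
                inner = self.open_tags.pop()
                self.out.append(f"</{inner}>")
                if inner == tag:
                    break

    def handle_data(self, data):
        # All text, including the body of a <script>, is escaped, never emitted raw.
        self.out.append(escape(data))

    def result(self):
        # Close anything the user left open so the output is always balanced.
        closing = [f"</{t}>" for t in reversed(self.open_tags)]
        return "".join(self.out + closing)


def sanitize(html_input):
    parser = WhitelistSanitizer()
    parser.feed(html_input)
    parser.close()
    return parser.result()


print(sanitize('<b onclick="x()">hi</b><script>alert(1)</script>'))  # <b>hi</b>alert(1)
```

Because the output is rebuilt from parse events instead of patched in place, unknown tags and attributes simply never make it through.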

Wiki markup syntaxes were designed to be human-friendly. The main problem I have with wiki syntax is that there is no standard. It seems like every wiki has a different way to formulate a link, for example. I guess there is some progress with Wiki Creole, but I still have a bad taste in my mouth.

The other problem I have with wiki markup is that I find it to be non-deterministic. When I edit any given wiki and try to use more than basic formatting, I never know what I am going to get. Most of the markup processing engines for these wikis are impenetrable morasses of regular expressions, and it can be hard to gauge how the rules interact. Are you really sure they are secure?
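To make the interaction problem concrete, here is a toy Python sketch with two made-up wiki-style rules (not any particular wiki's syntax). The same input renders differently depending only on the order in which the substitutions run. It does no HTML escaping, so it illustrates ordering only and is not safe for real input.

```python
import re

# Two hypothetical rules, each a plain regex substitution.
BOLD = (re.compile(r"\*\*(.+?)\*\*"), r"<b>\1</b>")
ITALIC = (re.compile(r"\*(.+?)\*"), r"<i>\1</i>")


def render(text, rules):
    """Apply substitutions in order, like a regex-gauntlet markup engine."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


source = "**bold** and *italic*"
print(render(source, [BOLD, ITALIC]))  # <b>bold</b> and <i>italic</i>
print(render(source, [ITALIC, BOLD]))  # <i>*bold</i><i> and </i>italic*
```

With a few dozen rules instead of two, predicting the output for anything beyond basic formatting gets hard quickly.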

Speaking of impenetrable morasses of regular expressions, have you ever looked at WordPress's input path? I'm sure everyone with a WordPress blog who likes to blog about PHP code knows that it is a code eater. I've been particularly disappointed with WordPress in this area. Most of the "code formatting" plugins still have problems protecting code from WordPress's heavy hand.
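A common trick for shielding code from a chain of text filters is to pull code spans out before the filters run and put them back afterwards. The Python sketch below shows the idea with a made-up smart-quotes filter; it is not WordPress's actual filter chain, and the NUL-delimited placeholder format and function names are assumptions. It also hints at why the approach is fragile: any filter that alters a placeholder breaks the restore step.

```python
import re

CODE_BLOCK = re.compile(r"<code>.*?</code>", re.DOTALL)


def smart_quotes(text):
    """Naive typographic filter: straight double quotes become curly ones."""
    text = re.sub(r'(^|\s)"', "\\1\u201c", text)  # quote after whitespace opens
    return text.replace('"', "\u201d")  # every other quote closes


def protect_code(text, filters):
    """Run filters on everything except <code>...</code> spans."""
    text = text.replace("\x00", "")  # placeholders use NUL, so strip any from input
    stash = []

    def stash_block(match):
        stash.append(match.group(0))
        return f"\x00{len(stash) - 1}\x00"

    text = CODE_BLOCK.sub(stash_block, text)
    for f in filters:
        text = f(text)
    # Swap each placeholder back for the untouched original code span.
    return re.sub(r"\x00(\d+)\x00", lambda m: stash[int(m.group(1))], text)


post = 'He said "hi": <code>echo "hi";</code>'
print(smart_quotes(post))                  # the PHP string literal gets curly quotes
print(protect_code(post, [smart_quotes]))  # the code span keeps its straight quotes
```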

But the WordPress preg_replace gauntlet doesn't just mangle code. I have a post that has been sitting in draft mode for several weeks because I can't figure out how to give it the proper markup. WordPress is somehow taking my perfectly balanced input markup and producing "unbalanced" output markup. I haven't yet tracked down the problem well enough to either submit a fix or file a good bug report. Frankly, I'm not looking forward to trudging through all those regular expressions.
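Before trudging through a filter chain, it helps to have an objective test for "unbalanced." Here is a minimal Python sketch, assuming a strict XHTML-style rule that every non-void element is explicitly closed (plain HTML lets you omit some end tags, such as </p>, which this check would flag). The names BalanceChecker and find_imbalance are hypothetical. Running it on the input and then on the output shows exactly which tags the filters broke.

```python
from html.parser import HTMLParser

# Void elements never take a closing tag, so they are not tracked.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class BalanceChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.stack = []     # open, non-void elements
        self.problems = []  # human-readable descriptions of each mismatch

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        pass  # self-closing tags like <br/> have nothing to balance

    def handle_endtag(self, tag):
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        elif self.stack:
            self.problems.append(f"</{tag}> but expected </{self.stack[-1]}>")
        else:
            self.problems.append(f"stray </{tag}>")


def find_imbalance(markup):
    checker = BalanceChecker()
    checker.feed(markup)
    checker.close()
    return checker.problems + [f"unclosed <{t}>" for t in reversed(checker.stack)]


print(find_imbalance("<p><b>x</b></p>"))  # []
print(find_imbalance("<p><b>x</p></b>"))  # ['</p> but expected </b>', 'unclosed <p>']
```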

In Chris' post, he takes the regular expression approach. Folks in the comments have pointed out a few problems with his approach, including the problem of interleaved tags. If you can't tell by now, I am not a fan of the regular expression gauntlet approach to markup languages. I prefer a defined syntax and a traditional computer science style parser (which may use regular expressions).
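Here is a rough sketch of the difference, in Python, for a hypothetical BBCode subset of [b], [i] and [u]. A regular expression does the tokenizing, but a stack tracks nesting, so interleaved tags such as [b][i]...[/b][/i] still produce properly nested HTML. The tag set, the reopen-on-interleave policy, and the function names are assumptions, not Chris's code or any particular forum's behavior.

```python
import re
from html import escape

TAGS = {"b", "i", "u"}
TOKEN = re.compile(r"\[(/?)([a-z]+)\]")  # the lexer may use a regex


def naive_bbcode(source):
    # Regex-gauntlet version: each tag is swapped independently of the others.
    return re.sub(r"\[(/?)([biu])\]", r"<\1\2>", escape(source))


def bbcode_to_html(source):
    out, stack, pos = [], [], 0
    for m in TOKEN.finditer(source):
        out.append(escape(source[pos:m.start()]))  # text between tags
        pos = m.end()
        closing, name = m.group(1) == "/", m.group(2)
        if name not in TAGS:
            out.append(escape(m.group(0)))  # unknown tag: keep as literal text
        elif not closing:
            stack.append(name)
            out.append(f"<{name}>")
        elif name in stack:
            # Close inner tags first, then reopen them so nesting stays valid.
            reopen = []
            while stack[-1] != name:
                inner = stack.pop()
                out.append(f"</{inner}>")
                reopen.append(inner)
            stack.pop()
            out.append(f"</{name}>")
            for inner in reversed(reopen):
                stack.append(inner)
                out.append(f"<{inner}>")
        else:
            out.append(escape(m.group(0)))  # stray close tag: literal text
    out.append(escape(source[pos:]))
    while stack:  # close anything left open
        out.append(f"</{stack.pop()}>")
    return "".join(out)


src = "[b]bold [i]both[/b] italic[/i]"
print(naive_bbcode(src))    # <b>bold <i>both</b> italic</i>   (interleaved)
print(bbcode_to_html(src))  # <b>bold <i>both</i></b><i> italic</i>
```

Reopening the inner tag is one policy; rejecting the input or treating the stray close as text would be equally defensible, as long as the rule is defined rather than emergent.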

The other must-have is a preview option. With so much variation in markup languages, not having a preview leaves the user to play Russian roulette with their submitted content. I've talked about that before when discussing the usability of input filtering. This is another area where WordPress leaves the user high and dry.

The complex input path in WordPress, combined with its reliance on global variables, seems to leave it unable to do an in-page preview. The admin area preview uses an IFRAME, which launches a separate request. The various live preview plugins are JavaScript-based and don't work when JavaScript is disabled. They also don't pass the input through the same input path that WordPress uses, so they are not a true preview.
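What makes a preview "true" is that preview and publish share a single rendering path. A minimal server-side sketch in Python, where render, handle_submit and the SAVED list are hypothetical stand-ins (a real render would be the full filter chain). Because the preview is produced on the server, it also works with JavaScript disabled.

```python
from html import escape

SAVED = []  # stand-in for a database table


def render(raw):
    """Hypothetical single pipeline: blank-line paragraphs, everything escaped."""
    paragraphs = [p.strip() for p in raw.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{escape(p)}</p>" for p in paragraphs)


def handle_submit(raw, action):
    """Both branches call the same render(), so the preview is the real output."""
    html = render(raw)
    if action == "preview":
        return html  # shown in the same page, nothing stored
    SAVED.append(html)
    return html


text = "Hello <b>\n\nSecond para"
preview = handle_submit(text, "preview")
published = handle_submit(text, "save")
print(preview == published)  # True
```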

I don't mean for this to be a WordPress rant; on the whole, I like WordPress. Rather, I just wanted to point out how hard it can be to do input filtering that is safe, reliable, deterministic, and usable.

by admin | Wednesday 19 September 2007 12:16pm | Knowledge Base