Anuj Blog

Express your thoughts here

The Problem with Markup Languages


Chris Shiflett has a post today, Allowing HTML and Preventing XSS. The problem is how to allow users to format their contributed content without introducing security vulnerabilities. The answer is usually some sort of markup language or filtering and sanitization of HTML.

BBCode was designed for this purpose. There is no actual standard, but the core syntax seems fairly uniform. It's good for those used to forums, where it seems to be the norm.

HTML markup is nice because it is a standard, even if varying subsets are supported. Learning a little HTML isn't going to hurt anyone, at least for the next 20 years or so. The problem is that HTML was never intended to be hand edited. The syntax is not the most inviting, and different HTML-like markup languages handle whitespace differently than the HTML standard.

Wiki markup syntaxes were designed to be human friendly. The main problem I have with wiki syntax is that there is no standard. It seems like every wiki has a different way to formulate a link, for example. I guess there is some progress with Wiki Creole, but I still have a bad taste in my mouth.

The other problem I have with wiki markup is that I find it to be non-deterministic. When I edit any given wiki and try to use more than basic formatting, I never know what I am going to get. Most of the markup processing engines for these wikis are impenetrable morasses of regular expressions. It can be hard to gauge interactions. Are you really sure they are secure?

Speaking of impenetrable morasses of regular expressions, have you ever looked at WordPress's input path? I'm sure everyone with a WordPress blog who likes to blog about PHP code knows that it is a code eater. I've been particularly disappointed with WordPress in this area. Most of the "code formatting" plugins still have problems protecting code from WordPress's heavy hand.

But the WordPress preg_replace gauntlet doesn't just mangle code. I have a post which has been sitting in draft mode for several weeks because I can't figure out how to give it the proper markup. WordPress is somehow taking my perfectly balanced input markup and producing "unbalanced" output markup. I haven't yet tracked down the problem to either submit a fix or to do a good bug report. Frankly, I'm not looking forward to trudging through all those regular expressions.

In Chris' post, he takes the regular expression approach. Folks in the comments have pointed out a few problems with his approach, including the problem of interleaved tags. If you can't tell by now, I am not a fan of the regular expression gauntlet approach to markup languages. I prefer a defined syntax and a traditional computer science style parser (which may use regular expressions).
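To make the parser idea concrete, here is a minimal sketch in Python of a tokenizer plus a stack for a tiny BBCode-like subset. It is not Chris's code and not how any particular forum does it; the tag whitelist, the `render` name, and the recovery rule for interleaved tags (close everything opened after the tag being closed) are all assumptions made for illustration.

```python
import html
import re

# Hypothetical whitelist for illustration: BBCode tag -> HTML element.
ALLOWED = {"b": "strong", "i": "em", "u": "u"}

# The regular expression is only the tokenizer; nesting is tracked by a stack.
TOKEN = re.compile(r"\[(/?)([a-zA-Z]+)\]")


def render(source: str) -> str:
    """Convert a small BBCode-like subset to HTML with balanced tags."""
    out = []
    stack = []  # currently open tags, innermost last
    pos = 0
    for match in TOKEN.finditer(source):
        # Everything between tokens is plain text and is always escaped.
        out.append(html.escape(source[pos:match.start()]))
        pos = match.end()
        closing = match.group(1) == "/"
        name = match.group(2).lower()
        if name not in ALLOWED:
            # Unknown tag: emit it as escaped literal text.
            out.append(html.escape(match.group(0)))
        elif not closing:
            stack.append(name)
            out.append(f"<{ALLOWED[name]}>")
        elif name in stack:
            # Interleaved input: close inner tags first so output stays nested.
            while stack:
                top = stack.pop()
                out.append(f"</{ALLOWED[top]}>")
                if top == name:
                    break
        else:
            # Closing tag with no matching opener: treat it as literal text.
            out.append(html.escape(match.group(0)))
    out.append(html.escape(source[pos:]))
    # Close anything the user left open.
    while stack:
        out.append(f"</{ALLOWED[stack.pop()]}>")
    return "".join(out)
```

With this sketch, `render("[b]bold [i]both[/b] italic[/i]")` yields `<strong>bold <em>both</em></strong> italic[/i]`: the interleaved input still produces balanced output, and raw HTML such as `<script>` only ever comes out escaped.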

The other must-have is a preview option. With so much variation in markup languages, not having a preview leaves the user to play Russian roulette with their submitted content. I've talked about that before in the usability of input filtering. This is another area where WordPress leaves the user high and dry.

The complex input path in WordPress, combined with its reliance on global variables, seems to leave it unable to do an in-page preview. The admin area preview is an IFRAME so that it launches a separate request. The various live preview plugins are JavaScript based and don't work when JavaScript is disabled. They also don't pass the input through the same input path that WordPress uses, so they are not a true preview.
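What a true preview needs can be sketched in a few lines of Python. This is not WordPress code; `filter_input`, `preview`, and `save` are hypothetical names, and the filter itself (escape everything, then wrap blank-line-separated blocks in paragraphs) is only a stand-in for a real input path. The point is that the filter is a pure function with no global state, so preview and save can share it.

```python
import html


def filter_input(raw: str) -> str:
    """The single input path: escape all markup, then build paragraphs."""
    paragraphs = [p.strip() for p in raw.split("\n\n") if p.strip()]
    return "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


def preview(raw: str) -> str:
    """Show the user exactly what would be stored, without storing it."""
    return filter_input(raw)


def save(raw: str, store: list) -> str:
    """Run the same filter as preview, then keep the result."""
    rendered = filter_input(raw)
    store.append(rendered)
    return rendered
```

Because both entry points call the same function, what the user sees in the preview is what ends up saved.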

I don't mean for this to be a WordPress rant; on the whole, I like WordPress. Rather, I just wanted to point out how hard it can be to do good input filtering that is safe, reliable, deterministic, and usable.

by admin | Wednesday 19 September 2007 12:16pm | Knowledge Base | permalink | 0 comments

Delphi for PHP



I have to comment on this week's announcement of Delphi for PHP. I was a Delphi programmer for about 5 years before taking up PHP about 6 years ago. What a convergence.

I have a great fondness and respect for the old Object Pascal based Delphi. Delphi's VCL has been influential, inspiring the GUI components in Java. And, of course, Anders Hejlsberg went on to put a huge stamp on C# and .NET that would be familiar to any Delphi programmer.

I've always admired this approach of extending the language syntax to make common things easy, and the integration between the language and the tools. In Delphi, this was evidenced by the excellent properties support. Six years later, this is the feature I miss the most in PHP. This language extension approach has seen its culmination in C# and LINQ. It almost pains me to say it, but the cutting edge of commercial language design is at Microsoft now.

On the other hand, I've never had that much respect for Borland as a company. We were big enough to have Borland representatives come to our office and try to sell us their products. They were terrible at the mechanics of selling into big companies. I was in their beta programs. I went to their conferences. I've never had any sense that they know what they are doing business-wise. Inprise? What were they thinking? Now here they are, just having gotten their asses kicked by Eclipse in the Java IDE space, and what are they working on? They release an IDE for PHP, just as Zend is embracing Eclipse in the PHP space. Brilliant!

I don't quite know what Delphi means now. To me, it's always been an IDE plus Object Pascal. What is it now? I also don't quite know what Borland has become. Is it CodeGear now? I guess that the Delphi for PHP IDE comes from qadram and their now-discontinued QStudio product. And the VCL is their WCL (no linkage found). Anytime I've been touched by the corporate entity that was Borland, confusion ensued. I'm confused now.

It appears that the PHP version of the VCL will be released as open source. There is nothing at the SourceForge project yet, but I'll be interested to see what it looks like, if only for old times' sake.

The Delphi tool approach was to serialize an object-based representation of an application, then offer tools to create that serialized representation and to load that representation at run time. In Delphi, that serialization was done into the form files (.DFM). I'll be interested to see how Delphi for PHP does it. Perhaps this is an area where the Eclipse PHP Development Tools project can learn. I know that I definitely had Delphi in mind when I was writing my column on Object Serialization for this month's php | Architect.
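The serialize-then-load idea can be sketched in Python. This is neither the .DFM format nor anything Delphi for PHP is known to do; the `Component` class, the `dump_form` and `load_form` helpers, and the choice of JSON as the file format are all assumptions for illustration. A design tool would write the serialized text, and the application would rebuild the component tree from it at run time.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Component:
    """A hypothetical visual component: a kind, a name, properties, children."""
    kind: str
    name: str
    props: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def dump_form(root: Component) -> str:
    """What a design tool would write: the whole component tree as text."""
    return json.dumps(asdict(root), indent=2)


def load_form(text: str) -> Component:
    """What the application would do at run time: rebuild the tree."""
    def build(node: dict) -> Component:
        return Component(
            kind=node["kind"],
            name=node["name"],
            props=node["props"],
            children=[build(child) for child in node["children"]],
        )
    return build(json.loads(text))


form = Component("Form", "LoginForm", {"caption": "Login"}, [
    Component("Button", "OkButton", {"caption": "OK"}),
])
restored = load_form(dump_form(form))
```

Here `restored` compares equal to `form`, which is the property the tools rely on: the serialized file is the single source of truth for both the designer and the running application.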

Meanwhile, if you want to see the Delphi influence in PHP with code that you can download today, take a look at the Prado framework, which I imagine to be like the VCL for PHP, but without the supporting IDE.

by admin | Wednesday 19 September 2007 11:54am | Knowledge Base | permalink | 0 comments

VALUE OF LIFE

The value of life does not depend on the length of time on this Earth but rather on the amount of love given to and shared with the people we care about.

by admin | Wednesday 12 September 2007 12:15pm | Default | permalink | 54 comments

About

For more information about me, visit www.anuj-blog.co.nr.
