CommuterJoy » Logbook

Posted by mattc at Mar 16, 09 10:44 AM ... Comments (0)

Before bomber planes came into existence, WWI aircraft crews used to take a sack of bombs into their biplane cockpits and lob them over the side after flying into enemy territory [1]. I like practical actions in the face of technical imperfection.

[1] according to a Timewatch DVD I just watched.

Posted by mattc at May 14, 08 10:45 AM ... Comments (0)

Jeff's rant about the ubiquity of XML got me thinking about the things I like about XML, so here's some notes...

1. Standard APIs for parsing. DOM, SAX, E4X, StAX etc. These are all attempts to provide standard (i.e. language-agnostic, by consensus) ways of reading/writing the XML data.

I know JSON and YAML etc. all come with their own parsers, from which you can munge the YAML/JSON into some internal data structure or object and set about looping over it and extracting bits in whatever way you see fit, but the API-like approach XML takes of defining how you should access data makes more sense to me. I'd rather the developers I work with use standard ways of processing and accessing data than each do their own thing.

Having said that, DOM isn't perfect, so there are many libraries (like jQuery) that provide convenience access methods, and guess what? These differ per language. Argh!

2. Vocabularies. With XML I can define my own vocabulary and mix in parts of existing vocabularies. Ascribing meaning to the data you are working with forces you to think beyond how the information should be structured and more about what it represents. I've found JSON and YAML over-literal in their representations of data, so you end up designing formats that look like data structures, which *is* great for many situations but loses something of the semantics.

3. Type checking. By defining vocabularies (either in RelaxNG or XML Schema) you will probably end up using XML Schema data types, making it easy to tightly define (and enforce the integrity of) the data you are working with. There are some really helpful default types (like ID + IDREFS) that solve specific problems when working with XML, as well as the usual date, duration and URI types.

4. Same-language Schema. One useful byproduct of XML Schema (or RelaxNG) documents being defined in the same terms as the data themselves (i.e. XML, a common complaint) is that they can very naturally become part of the transformation process (or, say, unit testing process).

Say your Schema includes an optional attribute with a default value and your XML source documents sometimes include it, sometimes not. The knowledge of this particular attribute's behaviour and properties can be written into the XML processing language without having to be overly specific about the details.

# if the attribute doesn't exist in the source but is declared as optional in the schema,
# then go and fetch the schema's default value and output it.
IF not(foo/@bar) and doc('schema.xml')//element[@name = 'foo']/optional/attribute[@name = 'bar']
   PRINT doc('schema.xml')/...
END

I think JSON schemas (being valid JSON themselves) will probably benefit from the same approach.
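The same trick can be sketched for JSON. This is only a minimal illustration, assuming a JSON Schema style document whose `properties` entries may carry a `default` keyword; the schema, the `applyDefaults` helper and the sample data are all made up here and not taken from any particular library.

```javascript
// Hypothetical schema: 'bar' is optional and declares a default value.
var schema = {
  type: 'object',
  properties: {
    name: { type: 'string' },
    bar: { type: 'string', 'default': 'baz' }
  }
};

// Copy the document, then fill in any property that is missing from it
// but has a default declared in the schema.
function applyDefaults(doc, schema) {
  var out = {};
  var key;
  for (key in doc) {
    if (Object.prototype.hasOwnProperty.call(doc, key)) {
      out[key] = doc[key];
    }
  }
  for (key in schema.properties) {
    if (!Object.prototype.hasOwnProperty.call(schema.properties, key)) {
      continue;
    }
    var rule = schema.properties[key];
    if (!(key in out) && Object.prototype.hasOwnProperty.call(rule, 'default')) {
      out[key] = rule['default'];
    }
  }
  return out;
}

var withBar = applyDefaults({ name: 'foo', bar: 'qux' }, schema);
var withoutBar = applyDefaults({ name: 'foo' }, schema);
```

Because the schema is ordinary data, the processing code never has to hard-code the attribute's default; it just reads it from the schema when the source leaves it out.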

5. XPath 2.0. When using other formats I've never understood how to get the data I want from the JSON/YAML/CSV data structure other than having to write little subroutines with temporary data structures, loops etc. to extract, join, compare and transform the info into what I want. That's sometimes ok, but XPath (particularly XPath 2.0 used with Saxon 9) eliminates this problem for me by providing a hugely expressive set of statements for selecting, sorting, and querying parts of the document, combined with some more mundane things like regexp and a whole variety of string functions.
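To make the comparison concrete, here is a rough sketch of the sort of hand-written subroutine meant above, pulling the distinct categories of items over a price threshold out of a JSON-like structure. The data and the `expensiveCategories` function are invented for illustration; the comment shows roughly how the same query could read in XPath 2.0 against an equivalent XML document.

```javascript
// Made-up data, standing in for a parsed JSON document.
var catalogue = {
  items: [
    { name: 'tea',    category: 'drink', price: 4 },
    { name: 'whisky', category: 'drink', price: 30 },
    { name: 'cheese', category: 'food',  price: 12 },
    { name: 'wine',   category: 'drink', price: 15 }
  ]
};

// Roughly equivalent XPath 2.0, assuming a similar XML layout:
//   distinct-values(//item[price > 10]/category)
function expensiveCategories(data, minPrice) {
  var seen = {};     // temporary structure to track duplicates
  var result = [];
  for (var i = 0; i < data.items.length; i++) {
    var item = data.items[i];
    if (item.price > minPrice && !seen[item.category]) {
      seen[item.category] = true;
      result.push(item.category);
    }
  }
  return result;
}

var categories = expensiveCategories(catalogue, 10);  // ['drink', 'food']
```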

I know a lot of people were put off from using XML as a general container format by XSLT & XPath 1.0, but I find version 2.0 feels so much more natural to author, without having to jump off to using extension functions every other statement.

I'm not a complete XML zealot. The project I'm doing at the moment uses a variety of XML (for source data & communication from web services), JSON (for browser loaded data) and CSV (for producers to edit), whatever fits really.

Posted by mattc at Dec 13, 06 04:36 PM ... Comments (0)

I was just about to write a regular expression, when suddenly ...

I stumbled on the fact that upon feeding dates formatted as RFC 822 (as commonly found in RSS 2.0) into a newly instantiated JavaScript Date object, it just handles them. No ifs or buts, it just works. I didn't expect that.

 var foo = 'Fri, 04 Apr 2003 05:04:39 GMT';
 var bar = new Date( foo );
 // getFullYear() rather than getYear(), which returns the year minus 1900 in most browsers
 var woo = bar.getFullYear();  // woo holds 2003

How very helpful. This means I could combine some getElementsByTagName construct with Date to give me an array of feed items by date without too much fuss ...

 var foo = new Array();
 // where 'o' is the response from some xmlHTTP request
 var rss = o.responseXML.getElementsByTagName("item");
 for ( var j = 0; j < rss.length; j++ ) {
   // parse each item's pubDate text in to a Date and collect it
   foo.push( new Date( rss[j].getElementsByTagName("pubDate")[0].textContent ) );
 }
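Taking that one step further, the items could then be ordered by date. A small sketch, using plain objects in place of the DOM nodes so it can run outside a browser; the `items` array and its titles are invented, and it assumes the engine parses RFC 822 dates as described above.

```javascript
// Invented feed items; in the browser these would be built from the
// getElementsByTagName("item") loop.
var items = [
  { title: 'second', pubDate: 'Fri, 04 Apr 2003 05:04:39 GMT' },
  { title: 'first',  pubDate: 'Fri, 28 Mar 2003 05:18:59 -0800' },
  { title: 'third',  pubDate: 'Sat, 05 Apr 2003 09:00:00 GMT' }
];

// Sort newest first by comparing the parsed timestamps.
items.sort(function (a, b) {
  return new Date(b.pubDate).getTime() - new Date(a.pubDate).getTime();
});

// Collect the titles in their new order.
var titles = [];
for (var i = 0; i < items.length; i++) {
  titles.push(items[i].title);
}
```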

But wait. Simon and Mark point out that RSS has many dates and times.

So I wonder how JavaScript handles these?

 // load each date type in to foo
 var foo = new Array('2003-03-21T16:28:40', '2003-04-03T07:45:57-08:00', 'Fri, 04 Apr 2003 05:04:39 GMT', 'Fri, 28 Mar 2003 05:18:59 -0800', '1049379042.0', '2003-03-21T16:28:40', '2003-01-17T13:03:00+00:00', '2003-03-27T19:41:49-06:00' );
 // iterate foo and write the year to the screen
 for ( var i = 0; i < foo.length; i++ ) {
   var bar = new Date( foo[i] );
   // print output, attempt to call getFullYear
   document.write( foo[i] + " - " + bar.getFullYear() + "\n" );
 }

In Opera 9, almost perfectly ...

 2003-03-21T16:28:40 - 2003
 2003-04-03T07:45:57-08:00 - 2003
 Fri, 04 Apr 2003 05:04:39 GMT - 2003 
 Fri, 28 Mar 2003 05:18:59 -0800 - 2003
 1049379042.0 - NaN  // bah!
 2003-03-21T16:28:40 - 2003
 2003-01-17T13:03:00+00:00 - 2003
 2003-03-27T19:41:49-06:00 - 2003

The only error is the obscure '1049379042.0', which I assume is the number of seconds passed since midnight on 1 January 1970 (Unix time). I'm not sure who is using that in their pubDate fields!?
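If that guess is right, such a value is easy enough to handle by hand: given a number, the Date constructor takes milliseconds since 1970, so the string needs converting and multiplying by 1000. A sketch, assuming the value really is seconds since the epoch in UTC; `fromEpochSeconds` is a made-up helper name.

```javascript
// Convert a string of seconds since 1970-01-01T00:00:00Z in to a Date.
// Returns null if the text is not numeric.
function fromEpochSeconds(text) {
  var seconds = parseFloat(text);
  if (isNaN(seconds)) {
    return null;
  }
  return new Date(seconds * 1000);  // Date expects milliseconds
}

var d = fromEpochSeconds('1049379042.0');
var year = d.getUTCFullYear();  // 2003
```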

IE 6, Firefox 1.5 & Safari 2.0.4 do much worse, only managing to parse and return valid Date objects from 2 of the 7 distinct dates.

 2003-03-21T16:28:40 - NaN 
 2003-04-03T07:45:57-08:00 - NaN
 Fri, 04 Apr 2003 05:04:39 GMT - 2003
 Fri, 28 Mar 2003 05:18:59 -0800 - 2003
 1049379042.0 - NaN
 2003-03-21T16:28:40 - NaN
 2003-01-17T13:03:00+00:00 - NaN
 2003-03-27T19:41:49-06:00 - NaN

So, to recap, Opera's Date object supports ISO 8601 date parsing upon construction; the other browsers tested don't.
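So the regular expression I was about to write may be needed after all, at least as a fallback. The following is a sketch only: `parseIsoDate` and `parseFeedDate` are made-up names, it only covers the ISO 8601 shapes listed above, and it assumes a value with no offset should be read as UTC.

```javascript
// Parse YYYY-MM-DDThh:mm:ss with an optional 'Z' or +hh:mm / -hh:mm
// offset. Returns null if the text doesn't match.
// Assumption: a value with no offset is treated as UTC.
function parseIsoDate(text) {
  var m = /^(\d{4})-(\d{2})-(\d{2})T(\d{2}):(\d{2}):(\d{2})(Z|([+-])(\d{2}):(\d{2}))?$/.exec(text);
  if (!m) {
    return null;
  }
  var utc = Date.UTC(+m[1], +m[2] - 1, +m[3], +m[4], +m[5], +m[6]);
  if (m[8]) {
    var offsetMinutes = (+m[9]) * 60 + (+m[10]);
    var sign = m[8] === '-' ? -1 : 1;
    // Local time = UTC + offset, so subtract the offset to get back to UTC.
    utc -= sign * offsetMinutes * 60 * 1000;
  }
  return new Date(utc);
}

// Try the built-in parser first, then fall back to the regular expression.
function parseFeedDate(text) {
  var d = new Date(text);
  if (!isNaN(d.getTime())) {
    return d;
  }
  return parseIsoDate(text);
}
```

Trying the native constructor first keeps the RFC 822 dates on the path that already works everywhere, and only the ISO 8601 ones fall through to the hand-rolled parser.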

I find the ECMA standard terse at the best of times, and it's unclear whether the Date object is meant to be doing this or not.

I also see MochiKit provides extensions for this sort of thing via its DateTime library.

Posted by mattc at Nov 26, 06 06:47 PM ... Comments (0)

Here are my notes from the very useful Testing Computer Software by Cem Kaner.

There are some quite concisely expressed profundities in the early chapters, my two favourites being ...

A great programmer is less likely than an incompetent tester. (chapter 2)

... and ...

The 'quality' of software is fundamentally measured in human terms. Therefore, in testing for bugs we are looking to determine the *degree* of usefulness of the system to the user. (chapter 4)
