"A real tech company like Amazon ... isn't going to hire you, because they can find plenty of people who can both design things and implement them" - I think this is salient for those of us straddling management and development. People should do both.
Hmmm, I worked a bit on this project towards its end (i.e. I helped diagnose a few bugs and got it running on the internal network), but I'll refrain from offering my personal perspective for the world to read - ask me in the pub ;)
Posted by mattc at May 28, 08 01:13 PM
If you want to make a tarball but have a bunch of .svn directories stuffing up the place, this will make the tar.gz but exclude hidden directories,
Not the first project to generate text-as-images on the fly, but possibly the first open, collaborative effort. I'm not sure why they bother to have the JavaScript layer; why not just embed calls to text via <img> elements?
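The same idea can be sketched in Python with the standard tarfile module. This is only an illustration, not the original command: skip_hidden_dirs and make_tarball are made-up names, and "hidden" is assumed to mean any directory whose name starts with a dot (.svn included). Skipping a directory also skips everything inside it.

```python
import os
import posixpath
import tarfile


def skip_hidden_dirs(member):
    """Filter for TarFile.add: drop directories whose name starts with a dot."""
    name = posixpath.basename(member.name)
    if member.isdir() and name.startswith(".") and name not in (".", ".."):
        return None  # returning None leaves out the directory and its contents
    return member


def make_tarball(source_dir, archive_path):
    """Write source_dir to a gzip-compressed tarball without hidden directories."""
    top_level = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, "w:gz") as tar:
        tar.add(source_dir, arcname=top_level, filter=skip_hidden_dirs)
```

Calling make_tarball("myproject", "myproject.tar.gz") (both paths hypothetical) would archive the working copy minus its .svn directories.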
Posted by mattc at May 21, 08 09:37 AM
I feel a bit dumb for only having just discovered the patch command.
My local file system is full of copies of bits of code that have gradually morphed from what I set out to do into what I ended up with, a path strewn with fruitless (but brave, nonetheless!) diversions. Using Subversion does solve the problem of managing ever-changing files over time, but it's pretty bothersome to keep jumping around the revision history in your local working copy, or to compare and run multiple versions of the same file at the same time. Even remembering which revision does what is a bit of a struggle unless you've got a good commit message convention.
I think patch can make this easier by allowing you to store your experiments against the main trunk code as a little library of diff snippets.
Eg.
Let's say you have a JavaScript file called 'original',
-- original --
// returns a charArray of a string
String.prototype.toCharArray = function(){
var a = this.split("");
return a;
}
If you copied the above file and added an experiment to it you might end up with this,
-- new --
// returns a charArray of a string
String.prototype.toCharArray = function(){
if ( this.length == 0 ) // don't want empty arrays
return false;
var a = this.split("");
return a;
}
You can now diff the two files and store the result as a patch file ...
diff original new > lengthcheck.patch
... the contents of which look something like this,
2a3,4
> if ( this.length == 0 ) // don't want empty arrays
> return false;
At some later date you can patch your code with the following command.
patch -b original lengthcheck.patch
The -b switch makes a backup of your code. Patch will prompt you if it finds a problem and store any rejected hunks in a separate file for you to inspect.
The idea being that in the course of, say, a two-day hacking session, you can keep the core code in your SVN trunk directory while the deviations, for better or worse, can be stored in a library of diffs that you can periodically merge in and out of your mainline development.
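If you'd rather build that library of diffs from a script, Python's standard difflib module can produce the patch text. A minimal sketch, assuming the two versions are held as strings: make_patch, ORIGINAL and EXPERIMENT are illustrative names only, and the output is a unified diff rather than the default format the plain diff command emits (patch accepts both).

```python
import difflib

ORIGINAL = """// returns a charArray of a string
String.prototype.toCharArray = function(){
var a = this.split("");
return a;
}
"""

EXPERIMENT = """// returns a charArray of a string
String.prototype.toCharArray = function(){
if ( this.length == 0 ) // don't want empty arrays
return false;
var a = this.split("");
return a;
}
"""


def make_patch(old_text, new_text, old_name="original", new_name="new"):
    """Return a unified diff that turns old_text into new_text."""
    old_lines = old_text.splitlines(keepends=True)
    new_lines = new_text.splitlines(keepends=True)
    diff = difflib.unified_diff(old_lines, new_lines,
                                fromfile=old_name, tofile=new_name)
    return "".join(diff)


patch_text = make_patch(ORIGINAL, EXPERIMENT)
print(patch_text)
```

Writing patch_text to a file such as lengthcheck.patch gives you something the patch command can apply later.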
This project adds several well known techniques for finding strings-that-almost-match-other-strings to the native JavaScript String object. Includes Soundex, Metaphone, Caverphone, Levenshtein distance ...
I did something like this for my parents last year, namely embedded oddmuse (the wiki) into a pretty html/css 'frame', wrote a few plugins for missing features, then let them get on with updating the site. Wikis are great for small CMS-a-likes.
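Levenshtein distance is the easiest of these to show in a few lines. The project itself extends the JavaScript String object; what follows is an independent Python sketch of the textbook dynamic-programming version, not that project's code.

```python
def levenshtein(a, b):
    """Return the minimum number of single-character edits turning a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            insert_cost = current[j - 1] + 1
            delete_cost = previous[j] + 1
            substitute_cost = previous[j - 1] + (char_a != char_b)
            current.append(min(insert_cost, delete_cost, substitute_cost))
        previous = current
    return previous[-1]
```

Two strings "almost match" when the distance is small relative to their length; where to draw that line is a judgment call for each application.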
Posted by mattc at May 15, 08 12:48 PM
I think I might learn how to do things in Python: sysadmin tasks, mini web apps and the like.
I've come to know a healthy amount about Perl in the past few years, mainly due to it being the only language officially supported at work, but it has some things I've not really got on with.
Its error handling is a bit rubbish if you are used to the try/catch/throw style of some other languages. Errors in Perl are mostly handled by adding conditionals around (or on the end of) a bunch of statements.
# if there's a problem opening 'foo.txt' then exit with the error
open( my $fh, '<', 'foo.txt' ) or die $!;
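For comparison, the same check in Python's try/except style might look something like this. It's only a sketch: read_first_line and main are made-up names, and the path passed in stands in for the foo.txt of the Perl snippet.

```python
import sys


def read_first_line(path):
    """Return the first line of a file; raises OSError if it cannot be opened."""
    with open(path, encoding="utf-8") as handle:
        return handle.readline().rstrip("\n")


def main(path):
    """Print the first line of path, or report the error and return 1."""
    try:
        print(read_first_line(path))
    except FileNotFoundError as err:
        # the failure is handled in one place rather than after each statement
        print(f"cannot open {path}: {err.strerror}", file=sys.stderr)
        return 1
    return 0
```

The point is that the error travels up from read_first_line as an exception, so the caller decides what to do about it.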
The OO stuff in Perl feels a bit contrived and it's easy to cheat or pick up bad habits. Some of Perl's basic functions remain resolute in their non-OOness...
# adding an item to an array, passing the array as an argument to the push function
my @foo = ("a", "b", "c");
push(@foo, "d");
The feeling of tacked-on OO also manifests itself in calling a class's methods. You have to remember to pluck the object out from an implied argument before operating on it. Normally you'd expect to be able to use a 'this'-like reference without having to manage this sort of low-level stuff yourself.
# if foo was a method of some class, $class would hold a reference to the calling object.
sub foo{
# list assignment takes the first argument; a bare 'my $class = @_' would store the argument count
my ($class) = @_;
}
Perhaps the main reason I don't want to keep using Perl is that it hasn't seemed to introduce much of interest to the language in the 4 or 5 years I've known it. Most other languages I know have had pretty significant upgrades and improvements in that time (XSLT, JavaScript ...). In that time Perl has had a few minor version number patches, but I can't see anything to motivate a casual user like myself to upgrade, so I just stick with whatever is on the box I'm using.
Maybe Python won't do these things any better, but I won't know until I try.
Updated
I forgot one other thing. Because I don't write Perl every day I find it a real struggle to remember the specifics of the often dense syntax. For example, to get the length of an array you have to evaluate it in scalar context (and not confuse that with the $# convention, which gives the index of the last element, one less than the length). You eventually remember after the first few times, but something like '[array].length' would be more obvious. There are lots of little tics like this, such as $_ (implied variable), $! (error message) and @_ (arguments to a subroutine), that you don't use so often as a casual developer and have to scout around for to jog your memory...
# assign the length of array 'foo' to $length ($#foo would give the last index instead)
my $length = scalar(@foo);
Heard this referenced in the latest IT Conversation talk - 13/5/2008. They chat about the trend towards virtualized infrastructures and the importance of expressing configuration in code rather than hardware.
1. Standard APIs for parsing. - DOM, SAX, E4X, StAX etc. These are all attempts to provide standard (i.e. language agnostic, by consensus) ways of reading/writing the XML data.
I know JSON, YAML etc. all come with their own parsers, from which you can munge the data into some internal data structure or object and set about looping over it and extracting bits in whatever way you see fit. But the API-like approach of defining how you should access XML data makes more sense to me. I'd rather the developers I work with use standard ways of processing and accessing data than each do their own thing.
Having said that, DOM isn't perfect, so there are many libraries (like jQuery) that provide convenience access methods, and guess what? These differ per language. Argh!
2. Vocabularies. With XML I can define my own vocabulary and mix in parts of existing vocabularies. Ascribing meaning to the data you are working with forces you to think less about how the information should be structured and more about what it represents. I've found JSON and YAML over-literal in their representations of data, so you end up designing formats that look like data structures, which *is* great for many situations but loses something of the semantics.
3. Type checking. By defining vocabularies (either in RelaxNG or XML Schema) you will probably end up using XML Schema data types, making it easy to tightly define (and enforce the integrity of) the data you are working with. There are some really helpful default types (like ID + IDREFS) that solve specific problems when working with XML, as well as the usual date, duration, uri types.
4. Same-language Schema. One useful byproduct of XML Schema (or RelaxNG) documents being defined in the same terms as the data themselves (i.e. XML - a common complaint) is that they can very naturally become part of the transformation process (or, say, unit testing process).
Say your Schema includes an implicit attribute with a default value and your XML source documents sometimes include it, sometimes not. The knowledge of this particular attribute's behaviour and properties can be written into the XML processing language without having to be overly specific about the details.
# if the attribute is missing from the source but declared as optional in the schema,
# then go and fetch the schema value and output it.
IF not(foo/@bar) and doc('schema.xml')//element[@name = 'foo']/optional/attribute[@name = 'bar']
PRINT doc('schema.xml')/...
END
I think JSON schemas (being valid JSON themselves) will probably benefit from the same approach.
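To make the schema-as-data idea concrete, here is a rough Python sketch using only the standard ElementTree module. It assumes an XML Schema (XSD) that declares the default directly on the attribute; the SCHEMA text, the value "baz" and the function names are all invented for illustration, and a RelaxNG schema would need different paths.

```python
import xml.etree.ElementTree as ET

XS = {"xs": "http://www.w3.org/2001/XMLSchema"}

# hypothetical schema: element 'foo' has an optional attribute 'bar' defaulting to 'baz'
SCHEMA = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="foo">
    <xs:complexType>
      <xs:attribute name="bar" type="xs:string" default="baz"/>
    </xs:complexType>
  </xs:element>
</xs:schema>"""


def schema_default(schema_root, element_name, attribute_name):
    """Look up the default value the schema declares for an attribute, if any."""
    path = (f"xs:element[@name='{element_name}']/xs:complexType/"
            f"xs:attribute[@name='{attribute_name}']")
    declaration = schema_root.find(path, XS)
    return None if declaration is None else declaration.get("default")


def attribute_or_default(element, attribute_name, schema_root):
    """Use the source document's value, falling back to the schema's default."""
    value = element.get(attribute_name)
    if value is not None:
        return value
    return schema_default(schema_root, element.tag, attribute_name)
```

With schema_root = ET.fromstring(SCHEMA), a source element that omits bar picks up the schema's default, and one that sets bar keeps its own value.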
5. XPath 2.0. When using other formats I've never understood how to get the data I want from the JSON/YAML/CSV data structure other than having to write little subroutines with temporary data structures, loops etc. to extract, join, compare and transform the info into what I want. That's sometimes ok, but XPath (particularly XPath 2.0 used with Saxon 9) eliminates this problem for me by providing a hugely expressive set of statements for selecting, sorting, and querying parts of the document, combined with some more mundane things like regexp and a whole variety of string functions.
I know a lot of people were put off from using XML as a general container format by XSLT & XPath 1.0, but I found version 2.0 feels so much more natural to author without having to jump off to using extension functions every other statement.
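A small taste of path-based selection, sketched in Python. Note the caveat: the standard ElementTree module only understands a limited subset of XPath 1.0, so this is not XPath 2.0 or Saxon, and the sorting still has to happen in Python. The CATALOGUE document and titles_for_year are made-up sample data and names.

```python
import xml.etree.ElementTree as ET

# invented sample document
CATALOGUE = """<catalogue>
  <book year="2008"><title>Gamma</title></book>
  <book year="2005"><title>Beta</title></book>
  <book year="2008"><title>Alpha</title></book>
</catalogue>"""


def titles_for_year(root, year):
    """Select the titles of books from one year with a single path expression."""
    matches = root.findall(f"book[@year='{year}']/title")
    return sorted(title.text for title in matches)


root = ET.fromstring(CATALOGUE)
print(titles_for_year(root, "2008"))
```

The selection is a single expression rather than a hand-written loop over nested structures, which is the appeal being described; a full XPath 2.0 processor could also do the sorting and string handling inside the expression.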
I'm not a complete XML zealot. The project I'm doing at the moment uses a variety of XML (for source data & communication from web services), JSON (for browser loaded data) and CSV (for producers to edit), whatever fits really.
I don't disagree with Jeff much but his XML bashing does agitate me. Ever since I started poking XML with Saxon 9 - i.e. with XPath & XSLT 2 coupled with XSD support - my interest has been reborn.
The trunk/tags/branches convention works for me when you give each developer a directory under branches (e.g. branches/mattc) where they can do what they like, rather than it being a general free for all.
A working copy made from multiple parts of the repository. It possibly makes more sense to do this than to force developers to know which parts of the repository they need to have to build the project.
Posted by mattc at May 1, 08 03:36 PM
"Cannot write an implicit result document if an explicit result document has been written to the same URI: file:/path/to/my/file.xml" at net.sf.saxon.Controller.checkImplicitResultTree
Odd error of the week. Ant (or Saxon) seems to run over everything in the basedir twice, and at the second pass creates the above error message. I'm using the following to run the transforms over a directory of xml documents,