Reality Bites

I’m a software developer. I like abstractions; I enjoy mentally putting things in little boxes and structuring my mental model of the world, or at least the model of the problem I’m currently struggling with. I don’t know if it’s a requirement, but if you’re a developer, it certainly helps to be bordering on obsessive about knowing how stuff works, in excruciating detail.

Having said that, I find it fascinating how problems in programming constantly make you hit your head against the wall of reality - the wall located where pretty, simple and linear abstractions meet the real world. You know, the real world which refuses to be dumbed down into simple rules, and which, even when you think you’ve trapped it with your rules, always breaks free with new twists and turns that you hadn’t thought of. This is sort of the essence of this post: problems that I’ve been working with more or less since I started programming, problems that appear very benign from a casual viewpoint but keep coming back and simply refuse to be solved in a way that lets you put them behind you.

Time is the first and most obvious case. That something as fundamental and trivial (you learn to read a clock when you’re what? Five? Six?) can be so complicated and easy to get wrong is truly fascinating. And I haven’t even mentioned dates yet. Just starting to think about dates will get any programmer into trouble. Before you know it, you’re knee-deep in Gregorian calendars and leap years - and even then, you won’t get it right. In fact, I’d go so far as to say that any non-trivial date/time calculation in software will contain at least one bug, or one special case that you hadn’t really thought of.
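To make the leap year trap concrete, here is a minimal Python sketch. The two function names are made up for illustration; the first encodes the rule most of us remember, the second the actual Gregorian rule, and the standard library's calendar.isleap is used only as a cross-check.

```python
import calendar


def is_leap_naive(year: int) -> bool:
    """The rule most of us remember: every fourth year."""
    return year % 4 == 0


def is_leap_gregorian(year: int) -> bool:
    """Gregorian rule: every fourth year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# Years where the remembered rule and the real rule disagree.
disagreements = [
    y for y in range(1800, 2201) if is_leap_naive(y) != is_leap_gregorian(y)
]
print(disagreements)  # [1800, 1900, 2100, 2200]

# Cross-check the hand-written rule against the standard library.
matches_stdlib = all(
    is_leap_gregorian(y) == calendar.isleap(y) for y in range(1600, 2401)
)
print(matches_stdlib)  # True
```

The naive rule happens to be right for every year between 1901 and 2099, which is exactly why this kind of bug can sit unnoticed for a very long time.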

And once again, I haven’t even mentioned time zones yet. Time zones, combined with daylight saving time, are the real killer, if by pure miracle you got the first date calculations right. At the company where I worked before, we had a standing joke twice a year - was this going to be the first daylight saving switch in the company’s history where we didn’t encounter a bug related to it? From what I can recall, we never had a bug-free switch (in five years!).
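A small Python sketch of the kind of bug a daylight saving switch produces. It assumes the European switch on 31 March 2024, when clocks in Sweden jumped from 02:00 to 03:00. It also models the two offsets by hand as fixed zones named CET and CEST instead of using a real time zone database, so it only illustrates the arithmetic.

```python
from datetime import datetime, timedelta, timezone

# Hand-made fixed offsets (an assumption for this sketch, not a tz database).
CET = timezone(timedelta(hours=1), "CET")    # winter time, UTC+1
CEST = timezone(timedelta(hours=2), "CEST")  # summer time, UTC+2

before = datetime(2024, 3, 31, 1, 30, tzinfo=CET)   # half an hour before the switch
after = datetime(2024, 3, 31, 3, 30, tzinfo=CEST)   # half an hour after the switch

# What you get if you throw away the offsets and subtract wall-clock readings.
wall_clock_gap = after.replace(tzinfo=None) - before.replace(tzinfo=None)

# What actually elapsed: aware datetimes are compared as instants in time.
elapsed = after - before

print(wall_clock_gap)  # 2:00:00
print(elapsed)         # 1:00:00
```

Any code that stores local times without their offset and later subtracts them will be an hour off, but only twice a year.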

Maps, or perhaps rather geographic locations, are the second thing that springs to mind as something seemingly straightforward that you just never, ever get right.

From the day we learned how to use a two-dimensional map, I actually think we’ve been doomed to live in the wrong paradigm. In grade school we learn that north is up on the map, and that using a ruler to measure distances on it is the way to go. Sure, both work. Sometimes. But equally often, it just almost works; it’s almost true, kind of. “Almost true”, deeply ingrained in your mind, happens to be a perfect and never-ending source of interesting and subtle software bugs.

Add to that the fact that the earth isn’t really a sphere, but almost, and while we have attempted to measure its non-sphericity, we came up with several conflicting measurements, all giving slightly different results when you try to express where you are on this not-really-a-sphere thing. Sometimes, different measurements are mixed, and you will have to use completely non-trivial transformations to convert from one to the other, but those transformations only work under very special circumstances.

Even if you get it right, someone will probably want to calculate the distance from where you think you are to some other point, which isn’t at all trivial on a sphere. But it wasn’t a sphere, right? Yep, right - not a sphere, and that makes it even worse. Luckily, I (and probably you too) don’t need that last kind of precision very often, but that doesn’t help, because we will still try to use a ruler (or its digital equivalent) to measure the distance on a map - which won’t work, even if the teacher in fourth grade said it would.
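As a rough illustration of the ruler problem, here is a Python sketch comparing a "digital ruler" on raw latitude/longitude values with the great-circle distance from the haversine formula. The coordinates for Stockholm and Gothenburg are approximate, the function names are invented for the example, and the earth is assumed to be a perfect sphere with a 6371 km radius, which, as noted above, it is not.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; the spherical model is itself an approximation


def ruler_distance_km(lat1, lon1, lat2, lon2):
    """Treat degrees as flat x/y coordinates, like a ruler on a lat/lon grid."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    return math.hypot(lat2 - lat1, lon2 - lon1) * km_per_degree


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance on a sphere (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


stockholm = (59.33, 18.07)   # approximate latitude, longitude in degrees
gothenburg = (57.71, 11.97)  # approximate latitude, longitude in degrees

flat = ruler_distance_km(*stockholm, *gothenburg)
sphere = haversine_km(*stockholm, *gothenburg)
print(f"ruler on the grid: {flat:.0f} km, great circle: {sphere:.0f} km")
```

At Swedish latitudes a degree of longitude is only about half as long as a degree of latitude, so the ruler figure comes out far too large. A properly projected map shrinks the error but never removes it entirely.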

Character encoding is my last example. Yes, getting characters to show up on your screen, more or less. Again, something very mundane and something a non-developer would take for granted. And yet, this is something that has been a problem in just about any application I’ve written, or seen being written, or used, for I don’t know how long. I guess being a Swede, using the funny å, ä, ö characters a lot (or “Ã¥, Ã¤ and Ã¶”, as I’ve come to know them from years of UTF-8/ISO-8859 mangling), doesn’t help, since you’re so much more exposed to the problem.
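The mangling is easy to reproduce. A minimal Python sketch, assuming the text was written as UTF-8 and then read back as ISO-8859-1:

```python
text = "å, ä and ö"

# Write as UTF-8, then (wrongly) read the same bytes as ISO-8859-1.
mangled = text.encode("utf-8").decode("iso-8859-1")
print(mangled)  # Ã¥, Ã¤ and Ã¶

# As long as no bytes were lost, the mistake can be undone by reversing it.
restored = mangled.encode("iso-8859-1").decode("utf-8")
print(restored == text)  # True
```

Each Swedish character becomes two bytes in UTF-8, and ISO-8859-1 happily shows every byte as a character of its own, so nothing fails; the text just quietly turns into garbage.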

Closing up, I want to attempt to find something that these three problems have in common, something that makes seemingly simple things so very complicated. My first guess would be that it’s the perceived simplicity that is at the core of all three - all are things that we take for granted in our daily lives and talk about without paying much attention to the details: we look at our watches, make appointments and write them in our calendars, decide to meet at certain locations and talk about how far it is to places. We never, ever think about the underlying details and complexities while doing this; it’s all very intuitive to us. On the other hand, I think it’s this very intuition that trips us up when developing software; the fact that we think we know this very well gets in the way of the details (that most of us actually don’t know).

Also, all three share that they in some way involve problems regarding frame of reference: time zone and daylight saving time problems are hard since you very easily get confused about what is fixed and what is relative. The map problems are very much the same - what is absolute in one frame of reference (“up”, “north”, “distance”, etc.) does not necessarily translate to some absolute in another, and what you’re doing is switching frame of reference, whether it’s going from a projected map to WGS84 or transforming between two geodetic references. Character encoding, and translation from one encoding to another, is also only meaningful if you keep track of your frame of reference (“current encoding”) through every step of the process - something which is very obviously much easier said than done.

At first, I also thought about adding the theory of relativity to the problems above. I decided against it, since it’s really not something you deal with every day, and it’s also far beyond my field of expertise. Interestingly enough, though, it also takes things we perceive as intuitive (time, length and weight being more or less constant) and turns them on their head, although more profoundly than the examples above. There, too, the solution very much lies in keeping track of your frame of reference.

This is also the conclusion: a perceived intuitiveness about a problem, combined with transitions between different frames of reference, makes for a problem that you will come back to again and again.

This entry will be in English, since it’s more likely to be helpful to others googling for solutions to the same problem.

A colleague of mine has used a D-Link DNS-323 as his RAID/backup solution at home. Apparently it had been working great for ages, until he recently updated its firmware, which also caused all files and directories containing non-ASCII characters (mostly å, ä and ö for us Swedes) to become completely inaccessible; the Windows file sharing (Samba/SMB) wouldn’t show the directories at all, and although they showed up in FTP, you couldn’t really access them. Downgrading the firmware did not help.

The fun thing about the DNS-323, and the reason the colleague asked me for help, is that it runs Linux internally. Although it looks like a USB disk with a web interface, it actually has a full-featured operating system underneath. Well, close to full featured.

Googling for solutions, I found another Swede, Martin Bergek, who had at least similar problems (by chance, I also happen to know who Martin is). It seems that older firmwares used CP850 for filename encoding, while newer versions use UTF-8. Probably some upgrade didn’t go as planned, leaving the filenames in CP850, while interpreting them as UTF-8. Decoding Swedish CP850 characters as UTF-8 results in invalid multibyte characters, causing the programs to refuse to handle files and directories containing them.
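A short Python sketch of why this fails. The filename is a made-up example; the point is only that the single bytes CP850 uses for å, ä and ö are not valid on their own in UTF-8.

```python
name = "Räksmörgås"          # hypothetical directory name
raw = name.encode("cp850")   # the bytes an old CP850 firmware would have stored
print(raw)                   # b'R\x84ksm\x94rg\x86s'

try:
    raw.decode("utf-8")      # what a UTF-8 firmware tries to do with them
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8)       # False

# Interpreted with the right encoding, the name is perfectly fine.
recovered = raw.decode("cp850")
print(recovered == name)     # True
```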

Now for the fun part. There seems to be quite a community developing hacks for the DNS-323. Using the so-called fun_plug, you can very easily enable an SSH server, and get access to lots of useful commands. In my case, SSH access in combination with the rsync command turned out to be the key.

My solution was to get a backup disk large enough to copy all the material from the DNS-323 (actually, my colleague had already thought of this and provided me with a disk for this purpose). Once all the files were copied from the DNS-323, it could be wiped and the files copied back, but this time with the correct filename encoding.

As mentioned above, most of the problem results from applications interpreting the characters as invalid UTF-8 codes, refusing to work with them altogether. Even basic stuff, like doing recursive file lists with rsync, fails if a directory contains the invalid characters, probably since the runtime’s string library can’t work with the resulting strings. Shell commands like cd handle the directories, though, even if the characters are displayed as garbage.

Fortunately, rsync includes a command line parameter called --iconv, which lets you override which encoding is used when interpreting the filenames. This way, you can interpret the names correctly, and they can be converted into proper UTF-8. The trick is that you have to do this on the DNS-323 side, otherwise the conversion will be done on the backup unit’s side, still causing errors attempting to do recursive file lists. (In case you connect the USB disk directly to the DNS-323, you won’t have to think about this - in my case, the backup disk was connected to my Ubuntu desktop.)

So, to sum things up, this was the killer command line that made it possible to copy all the files out of the DNS-323:

rsync -azv --iconv=cp850,utf8 /mnt/HD_a2/ per@asta:/media/usbdrive/dns323

(Obviously, you’ll have to replace the paths with whatever directories you’re using on the DNS-323 as well as for the backup disk.)
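For the curious, the per-filename conversion can be pictured with a few lines of Python. This is not what I ran and not how rsync implements --iconv; it is just a sketch of the idea, with a made-up function name, that works on a list of raw byte names and leaves anything that is already valid UTF-8 (including plain ASCII) alone.

```python
def plan_renames(names: list[bytes], legacy: str = "cp850") -> dict[bytes, bytes]:
    """Map each name that is not valid UTF-8 to its UTF-8 re-encoding."""
    plan = {}
    for raw in names:
        try:
            raw.decode("utf-8")  # already valid UTF-8 (or pure ASCII): keep as is
        except UnicodeDecodeError:
            # Interpret the bytes in the legacy encoding, then store them as UTF-8.
            plan[raw] = raw.decode(legacy).encode("utf-8")
    return plan


names = [
    b"foto.jpg",                # pure ASCII, identical in both encodings
    b"R\x84ksm\x94rg\x86s",     # "Räksmörgås" stored as CP850
    "Ålder".encode("utf-8"),    # already UTF-8
]
plan = plan_renames(names)
for old, new in plan.items():
    print(old, "->", new.decode("utf-8"))
```

Only the CP850 name ends up in the plan. Note that the "is it valid UTF-8?" test is a heuristic: a CP850 name could in theory happen to be valid UTF-8 as well, which is one more reason to let rsync do the job with an explicit source encoding.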

Grotesque

As I wrote in an earlier post, I have written a program for cataloguing IF games. To make things a bit more official, I have now registered the project on SourceForge, which now hosts Grotesque’s home page as well as the project’s source code.

Spielen!

Big news. I have managed to put together my first GTK+ program and created my first Ubuntu package (which contains the program in question). But let’s start from the beginning.

For the last few months I have been hooked on text adventures - the ancient, or perhaps just hopelessly outdated, type of computer game where you play by reading text and then typing commands to tell the computer what you want to do; interactive fiction (IF) is the English term, which I like better, since the games are not necessarily adventures of the Dungeons & Dragons variety.

This adventuring is where my problems began. The otherwise excellent Ubuntu has no programs for playing IF in any comfortable way; a few programs exist (frotz, tads and qtads), but they don’t cover all the game formats out there, and although I like terminal windows, they are not ideal for IF. My favourite soon became Gargoyle instead, which supports all the formats I’m interested in and looks exceptionally elegant.

What Gargoyle lacks, however, is a library, or some convenient way of keeping track of the games you collect. This is where my program comes in: a kind of cataloguing program for IF games that looks like a budget iTunes. The working title is Grotesque (since a gargoyle is a form of grotesque ornamentation - yes, I had to look up that last word).

I do intend to make something a bit more serious of this later on, but for now I’m putting up a screenshot, the source code and finally the Ubuntu package for Grotesque-0.1 (Updated 2008-07-03: see Grotesque’s home page for the latest version of these files). Since I have only built it on my own machine and this is the first time I’ve packaged anything for Ubuntu, I have surely done something wrong, but please do give it a try.

My plan next is to package Gargoyle as an Ubuntu package as well, since it is a bit too messy to build by hand at the moment, to add some polish to Grotesque, and perhaps to come up with a proper name. We’ll see. In the meantime you should try playing some IF - some good suggestions for games to try can be found, for example, here.

Chirping

Not much seems to be happening here, but if you are very interested in what I’m up to, you can now try following me on Twitter.
