XML versus Binary Document Format Debates

Responding to Sean McGrath on his recent post about XML-vs-binary document formats:

Look, it's just like I was trying to say in #python@irc.freenode.net the other day: there is nothing wrong with a format being binary. There is no virtue in every byte of a file being interpreted as a textual character (or part of one) that in turn represents your real data. "There ain't no such thing as plain text," says Joel Spolsky! Interpreting the bytes as text, and then the text as your data, is no different from just interpreting the bytes directly as your data.
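To make that concrete, here is a minimal sketch that decodes the same record from a packed binary layout and from a text layout. The three-field record and the names `RECORD`, `decode_binary`, and `decode_text` are made up for illustration; the point is only that both paths are conventions for turning bytes into data.

```python
import struct

# Hypothetical record layout: 32-bit unsigned id, 64-bit float, 16-bit flags.
# "<" means little-endian with no padding between fields.
RECORD = struct.Struct("<IdH")


def decode_binary(raw: bytes):
    """Interpret the bytes directly as the data."""
    return RECORD.unpack(raw)


def decode_text(raw: bytes):
    """Interpret the bytes as ASCII text first, then parse that text into data."""
    ident, value, flags = raw.decode("ascii").split(",")
    return int(ident), float(value), int(flags)


binary_form = RECORD.pack(42, 3.5, 7)
text_form = b"42,3.5,7"

# Both conventions recover the same values.
assert decode_binary(binary_form) == decode_text(text_form) == (42, 3.5, 7)
```

Either way a reader of the file needs to know the convention in advance: the field order and types for the binary form, or the encoding, delimiter, and number syntax for the text form.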

You can easily have XML formats as undocumented and inconsistent between versions as any binary format, but you get the added "benefit" of extra processing overhead, bloated file sizes, and limitations on structure and performance (try keeping efficient on-disk indexes into an XML file up to date).
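As a rough sketch of the indexing point, assume a hypothetical store of fixed-width binary records. Because every record has the same size, record N always lives at byte offset N times the record size, so it can be read or overwritten in place. The layout and the helper names `read_record` and `write_record` are illustrative only, and an in-memory `io.BytesIO` stands in for a real file.

```python
import io
import struct

# Hypothetical fixed-width record: 32-bit unsigned id plus a 64-bit float.
RECORD = struct.Struct("<Id")


def read_record(f, index):
    """Seek straight to record `index` and decode it."""
    f.seek(index * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))


def write_record(f, index, ident, value):
    """Overwrite record `index` in place; no other record moves."""
    f.seek(index * RECORD.size)
    f.write(RECORD.pack(ident, value))


store = io.BytesIO()  # stands in for a file opened in "r+b" mode
for i in range(1000):
    write_record(store, i, i, i * 0.5)

# Update one record in the middle without touching its neighbours.
write_record(store, 500, 500, 99.0)
```

In an XML file the same update can change the textual length of the value, which shifts the byte offset of everything after it and invalidates any stored index into the file.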

I do believe text-based formats and XML have their place, but those places are limited. I would have much preferred the opening and standardization of a relational-based format, the way Word documents worked internally before Microsoft was bullied into an XML-based format.

I'm wondering if I actually know Sean and didn't realize it, because his points are exactly those I was arguing against in #python the other day, and so I wonder if someone I was arguing with was McGrath under a different name. Knowing your strangers is a great gem of the internet age, isn't it?

Comments

Anonymous said…
*Documents and/or Markup*

Step 1) Learn stream compression in Python via built-ins "zlib" or "bz2"
Step 2) Learn the subset of XML that is ElementTree, use "cElementTree"
Step 3) There is no step 3
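The two steps above might look roughly like this. This is a sketch, not the commenter's code: the document structure is invented for illustration, and it uses `xml.etree.ElementTree` because current Python versions use the C accelerator automatically and no longer ship a separate `cElementTree` module.

```python
import zlib
import xml.etree.ElementTree as ET

# Hypothetical document structure, for illustration only.
doc = ET.Element("document")
for i in range(100):
    para = ET.SubElement(doc, "para", id=str(i))
    para.text = "Paragraph number %d" % i

# Step 2: serialize the tree to XML bytes.
xml_bytes = ET.tostring(doc, encoding="utf-8")

# Step 1: compress the XML stream before it goes to disk or over the wire.
packed = zlib.compress(xml_bytes)

# Reading it back is the reverse: decompress, then parse.
restored = ET.fromstring(zlib.decompress(packed))
```

Note that the compressed result is itself a binary file, and the whole stream has to be decompressed and parsed before any single element can be reached.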

*Asynchronous Messaging*

Step 1) Learn json, use "simplejson"
Step 2) There is no step 2
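For the messaging case, a minimal sketch using the standard library `json` module, which grew out of `simplejson` and exposes the same `dumps` and `loads` calls. The message fields are hypothetical.

```python
import json

# Hypothetical message, for illustration only.
message = {"type": "job.finished", "job_id": 17, "ok": True}

# Sender: serialize to text, then encode to bytes for the wire.
wire = json.dumps(message).encode("utf-8")

# Receiver: decode the bytes to text, then parse back into Python objects.
received = json.loads(wire.decode("utf-8"))
```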

*Configuration and Customization*

Step 1) Use a Python module
Step 2) There is no step 2
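A sketch of the configuration idea, assuming a hypothetical `settings.py`. In a real project the file would sit on the import path and be loaded with a plain `import settings`; here `runpy.run_path` is swapped in so the example is self-contained and can write its own throwaway file.

```python
import os
import runpy
import tempfile

# Contents of a hypothetical settings.py. Configuration is ordinary Python,
# so values can be computed rather than only declared.
CONFIG_SOURCE = (
    "DEBUG = True\n"
    "WORKERS = 2 * 4\n"
    "LOG_PATH = '/tmp/app-%d.log' % WORKERS\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "settings.py")
    with open(path, "w") as f:
        f.write(CONFIG_SOURCE)
    # Execute the file and collect its top-level names as a dict.
    settings = runpy.run_path(path)
```

The trade-off is that loading the configuration executes arbitrary code, so this only suits files the application's owner controls.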

I started programming in 1982, when I was 11 years old. I wrote my last binary file in 1988, when I was 17 years old. The tools have greatly improved since then.

There are people who think there is more to say about these topics. I had my last debate about binary files when I was 17.5 years old.

Cheers, and all the best with your current and future projects.

moe@manuelmgarcia.com
