Skip to main content

How to Understand AppEngine Datastore Under the Hood: Part 1 - An Overview of the Underview

There are a lot of wrong perceptions about the datastore in Google AppEngine. People both familiar and foreign with AppEngine don't really understand what the datastore is. There is a deeper system underneath the nice API we are given. Understanding the guts can help us understand the skin. We may also find there are times when we must shed the skin for new clothing.

The biggest misconception about the datastore is the assumption that "kinds" are anything like "tables". You could use a set of entity kinds similar to the way you would use a set of tables, but they simply are different beasts, entirely. A table controls a strict requirement on the structure of its rows. Every entity, on the other hand, is free to hold any properties of allowed types. The published Model API is all an abstraction provided to give us a nice interface on top of an otherwise much looser foundation.

Many people would be very surprised to learn that a given kind doesn't actually require anything of its entities, but from the right angle it makes perfect sense. Meeting the kind of scalability requirements the datastore is designed for places interesting limitations. Schema changes can't get in the way when you could have such a large dataset that no operation can ever effectively operate on the entire set at once. This means what was a simple matter of ALTER TABLE in SQL is practically impossible in this new world, as the logistics behind updating and migrating potentially millions of entities to a new schema grows beyond the acceptable resources to give to a schema change. However, if we allow flexibility, we simply start creating new entities in the updated form and be sure that when we load one of the previous versions, we're prepared to use or upgrade it on the spot. For this and other reasons, allowing all entities to be free-form is the simplest direction to provide the foundation we need.

With a better understanding of our foundation we can better understand the abstractions in google.api.ext.db, with the Model subclasses most AppEngine developers know. I've seen quite a few people asking about migrating to changes in their db.Model subclasses, not understanding why or how their existing entities will change to match the newly defined properties. The behavior and how to work with it is a lot easier to understand when you view the individual entities are independent property bags, and not rows following a defined column schematic. We can also come to understand db.Expando as closer to the wire, so to speak, than its stricter Model cousin.

Perhaps a more exciting gain from this different view of the datastore is that we aren't bound by the published Model-centric API at all. In fact, we can access the underlying Entity class directly, providing us with a simple, persisted mapping object, without anything building on top of it. If we need some structure to our persistence, but the provided API simply isn't to taste, then an understanding of this layer gives us what we need to build our own variant datastore API. We may even use this understand to provide implementations compatible with previous ORM solutions, but powered by the entities and BigTable, rather than traditional SQL databases. The possibilities open up with our deeper understanding.

The more variation we have in what everyone is doing on AppEngine, the more value it has to all of us. Take this information and do some exciting. Share it and we're all reap the benefits.

Look for Part 2: The Raw Datastore API

Please vote on Reddit and/or Digg this article.

Comments

Popular posts from this blog

CARDIAC: The Cardboard Computer

I am just so excited about this. CARDIAC. The Cardboard Computer. How cool is that? This piece of history is amazing and better than that: it is extremely accessible. This fantastic design was built in 1969 by David Hagelbarger at Bell Labs to explain what computers were to those who would otherwise have no exposure to them. Miraculously, the CARDIAC (CARDboard Interactive Aid to Computation) was able to actually function as a slow and rudimentary computer.  One of the most fascinating aspects of this gem is that at the time of its publication the scope it was able to demonstrate was actually useful in explaining what a computer was. Could you imagine trying to explain computers today with anything close to the CARDIAC? It had 100 memory locations and only ten instructions. The memory held signed 3-digit numbers (-999 through 999) and instructions could be encoded such that the first digit was the instruction and the second two digits were the address of memory to operat...

Statement Functions

At a small suggestion in #python, I wrote up a simple module that allows the use of many python statements in places requiring statements. This post serves as the announcement and documentation. You can find the release here . The pattern is the statement's keyword appended with a single underscore, so the first, of course, is print_. The example writes 'some+text' to an IOString for a URL query string. This mostly follows what it seems the print function will be in py3k. print_("some", "text", outfile=query_iostring, sep="+", end="") An obvious second choice was to wrap if statements. They take a condition value, and expect a truth value or callback an an optional else value or callback. Values and callbacks are named if_true, cb_true, if_false, and cb_false. if_(raw_input("Continue?")=="Y", cb_true=play_game, cb_false=quit) Of course, often your else might be an error case, so raising an exception could be useful...

Announcing Feet, a Python Runner

I've been working on a problem that's bugged me for about as long as I've used Python and I want to announce my stab at a solution, finally! I've been working on the problem of "How do i get this little thing I made to my friend so they can try it out?" Python is great. Python is especially a great language to get started in, when you don't know a lot about software development, and probably don't even know a lot about computers in general. Yes, Python has a lot of options for tackling some of these distribution problems for games and apps. Py2EXE was an early option, PyInstaller is very popular now, and PyOxide is an interesting recent entry. These can be great options, but they didn't fit the kind of use case and experience that made sense to me. I'd never really been about to put my finger on it, until earlier this year: Python needs LÖVE . LÖVE, also known as "Love 2D", is a game engine that makes it super easy to build ...