Wednesday, April 15, 2009
The Figaro XML database library is a data layer built for software developers.
For starters, your database runs as a collection of classes in your application – unless you’ve built your own layer somewhere else, there are no round trips anywhere outside your application domain; instead, your database layer is tightly bound to the software your software, and the hardware your application is running on. As you can imagine, this gives you, the software developer, the opportunity to create very high-performance applications. What is high performance? The Oracle website has a quote of transactions performed in the tens of thousands per second as their high-water benchmark; while we haven’t performed any official benchmarks as of yet, we feel comfortable enough to entertain the notion that the Figaro library can do at least half of what the raw C++ Oracle Berkeley DB API can do, or better. In its current implementation, Figaro is a thin layer between the .NET Framework and the Oracle Berkeley DB XML library; the C++ API is used underneath the covers to ensure we get the most out of the performance and functionality of the Oracle product.
Figaro is thin and small; over half of the whopping 12 MB install package size of is for documentation and utility applications. When Figaro loads into your application, you end up with about 5 megabytes of additional program size, hence the categorization of Oracle’s Berkeley DB product as an embedded database. I’d like to emphasize a point with this term; many people associate embedded with a product that runs (often exclusively) on a mobile or embedded platform, therefore concluding that it’s somehow not the same. If you’re more accustomed to relational databases, you may be familiar with watching these products shed a lot of weight as they try and squeeze themselves into your cell phones and other small devices. But what embedded really means to software developers when it comes to Figaro (and Berkeley DB) is that it’s consistently the same product on any platform it runs on. To borrow a phrase, it’s small enough for a mobile/embedded platform, but strong and powerful enough for a server platform. The Oracle Berkeley DB family of products can be found on everything from mobile phones to networking hardware to enterprise server solutions; remember, the Oracle Berkeley DB product family runs Amazon’s cloud data offering as well as several of Google’s online functionality (such as its user authentication store).
The key difference between Figaro as a data source and, say, your preferred relational database offering, is that most of the tasks that were automated or tucked away into the data access objects are now important pieces you have to consider when you develop your solution. It’s the price you pay as a software developer to get the specialization you require out of your data store. In other words, you still get all of the features you take for granted in your relational database – caching, logging, ACID transactions, in-memory databases – but to get all of these things, you have to build and maintain the code that makes your data access layer do your bidding.
Figaro follows the Oracle Berkeley DB family of products and offers licensing under the same available feature sets:
Data Store, or DS, encompasses the basics of data storage. The API is simple and uses direct access. This minimizes overhead when adding, removing, and updating data. Frequently accessed data is cached in memory. DS is very fast; it avoids storing data to disk until it is necessary. It performs no locking or other concurrency control, and it does not provide the ACID features of transactional systems. DS is simple and efficient data storage for non-concurrent applications that favor raw throughput over guaranteed data integrity.
Concurrent Data Store, or CDS, adds locking and concurrency management services to DS. With the CDS features enabled, CDS uses locks to manage multiple readers, and a single writer orchestrating their concurrent access to data within the databases. CDS is simple and efficient data storage for concurrent systems. ACID transactions are not supported.
Transactional Data Store, or TDS, adds support for fully ACID transactions to CDS. Changes to databases or their contents are atomic, consistent, isolated, and durable when using TDS. Transactions can be short lived or long running, lasting days if necessary. Transactions can improve system throughput in concurrent applications by grouping actions into a single commit to disk. Applications that use secondary indices to manage relationships require transactions to keep those relationships consistent. The transactional system offers a great deal of flexibility to accommodate your desired performance and durability requirements. For example, TDS supports the ability to trade off durability for speed, or to allow readers to see uncommitted data to reduce locking overhead. TDS utilizes log files to contain information about transactional data that can be used to recover from application or systems failure.
High Availability, or HA, adds support for replicated systems to TDS. There are the two basic reasons to build a replicated highly available system: scale and reliability. Scale your transactional application beyond the processing constraints of a single system. Deliver a highly reliable system where failure of one node does not cause system wide failure.
If you consider software complexity on a scale of 1 (okay, 0 in my hasty chart) to 10, I’ve taken the liberty of building a chart to help visualize the tradeoffs of using an API-based data access layer.
The key difference between each product feature is really a matter of specialization. When you add the .NET Framework, you get additional tools and Windows features to help you stretch the value of the feature set you choose. Currently our documentation more closely reflects the Berkeley DB product documentation, which does not make a clear distinction between the features with regard to Berkeley DB XML. As we solidify the API differences between the feature sets, we will provide more specific guidelines and documentation in order for developers to better understand the differences, and get the most out of each.
For more information, you can read the help documentation online at https://help.bdbxml.net.