DevPinoy.org
A Filipino Developers Community
   
Introduction to Object Databases (using db4o)

For the longest time I have been intrigued by object database management systems (ODBMS) and have been wanting to try one out. Of course there are the usual excuses -- I'm busy, I've got no time, I've got so many other projects, etc. Recently, my wife decided to restart her business, this time in Singapore, and I volunteered to make her some custom software (and earn "pogi" points in the process). I also thought it would be a good opportunity to learn WPF and other "new" (to me) stuff. So finally I had the motivation and opportunity to break through those excuses and take a long, hard look at using an object database.

I chose the first object database package that turned up in a Google search -- db4objects' db4o. It has versions for both Java and .NET. Being a .NET guy, I quickly downloaded and installed the latest version for .NET 3.5. It comes with a straightforward tutorial on how to use the db4o API, and I was pleasantly surprised by how easy it is to use. It is dual-licenced: the open source GPL for open source and "in-house" projects, and a commercial licence if you use db4o in a non-GPL commercial product.

Adjusting to ODBMS Concepts

There are a few adjustments one has to make, moving from the more common relational database management systems (RDBMS) to ODBMS. In my case, I guess I've been so brainwashed (for the past decade or so) to think in terms of tables and rows that it was mind-blowing for me to realise that with an ODBMS my entities do not need to have an ID field! That's right, say goodbye to the primary key! To illustrate this, let's cite an example:

  public class Student {
      public int Id { get; set; }
      public string LastName { get; set; }
      public string FirstName { get; set; }
      public Class[] Classes { get; set; }
      public decimal Gpa { get; set; }
  }

Here is a Student class describing a student. A Student contains a collection (array) of Classes, in addition to its other fields. I would like to map this class neatly to a Student database table, so note the presence of the Id property here. The Id property serves a useful purpose for me -- it tells me whether I have saved the object instance to the database or not. If Id is zero (I usually use numeric primary keys, auto-generated and incremented by the underlying database system), I know the instance hasn't been saved to the database yet. A save operation on that instance would therefore result in a SQL INSERT statement; for instances whose Id is non-zero, it would result in a SQL UPDATE statement. The following shows the class declaration for a Class:

  public class Class {
      public int Id { get; set; }
      public Subject Subject { get; set; }
      public DateTime Time { get; set; }
      public TimeSpan Duration { get; set; }
      public Teacher Professor { get; set; }
  }
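
To make the Id-based save rule described above concrete, here is a minimal sketch in plain C#. It does not touch any database; SaveRule and StatementFor are hypothetical names I made up for illustration, and the Student class is trimmed down to the fields the rule needs.

```csharp
using System;

// Trimmed copy of the article's Student class, keeping only what the rule needs.
public class Student
{
    public int Id { get; set; }
    public string FirstName { get; set; }
}

public static class SaveRule
{
    // Hypothetical helper: picks the kind of SQL statement from the Id value.
    // Id == 0 means the database has not assigned a primary key yet.
    public static string StatementFor(Student s)
    {
        return s.Id == 0 ? "INSERT" : "UPDATE";
    }

    public static void Main()
    {
        var unsaved = new Student { FirstName = "John" };        // Id defaults to 0
        var saved = new Student { Id = 2, FirstName = "Mary" };  // key already assigned

        Console.WriteLine("{0}: {1}", unsaved.FirstName, StatementFor(unsaved)); // John: INSERT
        Console.WriteLine("{0}: {1}", saved.FirstName, StatementFor(saved));     // Mary: UPDATE
    }
}
```

The whole decision hangs on a field that exists only for the benefit of the table mapping, which is exactly the kind of bookkeeping an object database lets you drop.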

With an RDBMS, we might have the following relationship diagram:

Relating the Student and Class tables is a "StudentClass" table, with foreign keys to the related tables. Therefore if I have a Student named "John" with Id = 1 and I want to retrieve the Classes he is currently taking up, I can get that by doing a SELECT joining Class and StudentClass, where StudentClass.StudentId = 1. Similarly if I have a Student named "Mary" with Id = 2 and I want to retrieve her Classes, I can do a SELECT joining Class and StudentClass, where StudentClass.StudentId = 2. If I have a Class for an "Accounting" subject (for simplicity we do not show the "Subject" and "Professor" tables) and that particular class has an Id = 1001, I can retrieve all students in that class by executing a SELECT joining Student and StudentClass, where StudentClass.ClassId = 1001. The primary and foreign keys are essential to the relational database concept.
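
The joins described above can be imitated in memory with LINQ, which may help if you want to see the key matching at work without setting up a database. This is only a sketch: the row classes, the JoinDemo helper and the class Ids 1002 and 1003 are my own assumptions for illustration (only Id = 1001 for "Accounting" and the student Ids come from the text above).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Row shapes standing in for the Student, Class and StudentClass tables.
public class StudentRow { public int Id; public string Name; }
public class ClassRow { public int Id; public string Subject; }
public class StudentClassRow { public int StudentId; public int ClassId; }

public static class JoinDemo
{
    static readonly List<StudentRow> Students = new List<StudentRow>
    {
        new StudentRow { Id = 1, Name = "John" },
        new StudentRow { Id = 2, Name = "Mary" }
    };

    static readonly List<ClassRow> Classes = new List<ClassRow>
    {
        new ClassRow { Id = 1001, Subject = "Accounting" },
        new ClassRow { Id = 1002, Subject = "Typing" },    // assumed Id
        new ClassRow { Id = 1003, Subject = "Swimming" }   // assumed Id
    };

    // The link table: one row per (student, class) pair.
    static readonly List<StudentClassRow> Links = new List<StudentClassRow>
    {
        new StudentClassRow { StudentId = 1, ClassId = 1001 },
        new StudentClassRow { StudentId = 1, ClassId = 1003 },
        new StudentClassRow { StudentId = 2, ClassId = 1001 },
        new StudentClassRow { StudentId = 2, ClassId = 1002 }
    };

    // SELECT ... FROM StudentClass JOIN Class WHERE StudentClass.StudentId = @studentId
    public static string[] SubjectsFor(int studentId)
    {
        return (from sc in Links
                where sc.StudentId == studentId
                join c in Classes on sc.ClassId equals c.Id
                select c.Subject).ToArray();
    }

    // SELECT ... FROM StudentClass JOIN Student WHERE StudentClass.ClassId = @classId
    public static string[] StudentsIn(int classId)
    {
        return (from sc in Links
                where sc.ClassId == classId
                join s in Students on sc.StudentId equals s.Id
                select s.Name).ToArray();
    }

    public static void Main()
    {
        Console.WriteLine("John takes: " + string.Join(", ", SubjectsFor(1)));
        Console.WriteLine("Class 1001 has: " + string.Join(", ", StudentsIn(1001)));
    }
}
```

Notice that every relationship has to be rebuilt by matching numbers in the link table; nothing points directly from a student to a class.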

So how does this scenario work with an object database? First off we just eliminate the Id fields in the Student and Class classes. Let us pretend that we have three Class instances: one for an "Accounting" subject, one for "Typing" and one for "Swimming". (Again for simplicity we do not show the Subject and Teacher classes anymore.) If student "John" has two Classes -- one for "Accounting" and another for "Swimming" -- the object graph would look like this:

Similarly, if student "Mary" also has two Classes -- "Accounting" and "Typing" -- her object graph would look like this:

So far so good. You would notice that "Mary" and "John" share one thing in common: a Class for "Accounting." In an object database, there will only be one instance of this "Accounting" Class and both "John" and "Mary" will be referring to it. If that is still not clear, let us do the talking in code. To simplify things, let's change the Class declaration to:

  public class Class {
      public string Subject { get; set; }
      public DateTime Time { get; set; }
      public TimeSpan Duration { get; set; }
      public string Professor { get; set; }
  }

This does away with the need for us to have Subject and Teacher classes, as well as the Id field. Let us create three Class instances, one per subject:

  Class accounting = new Class()
  {
      Duration = new TimeSpan(1, 0, 0),
      Professor = "Professor X",
      Subject = "Accounting",
      Time = DateTime.MinValue.Add(new TimeSpan(9, 0, 0))
  };
  Class swimming = new Class()
  {
      Duration = new TimeSpan(2, 0, 0),
      Professor = "Michael Phelps",
      Subject = "Swimming",
      Time = DateTime.MinValue.Add(new TimeSpan(12, 0, 0))
  };
  Class typing = new Class()
  {
      Duration = new TimeSpan(1, 0, 0),
      Professor = "Prof. Dvorak",
      Subject = "Typing",
      Time = DateTime.MinValue.Add(new TimeSpan(8, 0, 0))
  };

And the two Students:

  Student john = new Student()
  {
      LastName = "John",
      FirstName = "John",
      Gpa = 3.5M,
      Classes = new[] {accounting, swimming}
  };
  Student mary = new Student()
  {
      LastName = "Mary",
      FirstName = "Mary",
      Gpa = 3.9M,
      Classes = new[] {accounting, typing}
  };

Note that we also removed the Id field in the Student class. Now we save them off to the database:

  IObjectContainer db = Db4oFactory.OpenFile(Proj.DatabaseFile);
  db.Store(accounting);
  db.Store(swimming);
  db.Store(typing);
  db.Store(john);
  db.Store(mary);
  db.Close();

Notice how easy it is -- look ma, no SQL! By the way, the IObjectContainer and Db4oFactory are declared in the Db4objects.Db4o namespace, and you need to add a reference to the Db4objects.Db4o.dll that comes with the MSI package. Proj.DatabaseFile is just a string I put in my project, which contains the path to the db4o database file that will be used (and created, if it doesn't exist yet), e.g. "C:\TestData\testdb.db4o". Run this code so we can have a database to play with, then do the following in order to query our Class objects inside the object database:

  IObjectContainer db = Db4oFactory.OpenFile(Proj.DatabaseFile);
  IList<Class> classes = db.Query<Class>();
  foreach (Class item in classes)
  {
      Console.WriteLine("Got class {0}, taught by {1}", item.Subject, item.Professor);
  }
  db.Close();

You should get the following output:

Got class Accounting, taught by Professor X
Got class Swimming, taught by Michael Phelps
Got class Typing, taught by Prof. Dvorak

If you want to restrict your query using criteria (similar to the WHERE clause in SQL), you can specify them using lambdas:

IList<Class> matchingClasses = db.Query<Class>(c => c.Subject.Equals("Accounting"));

This way you only get Class instances whose Subject equals the string "Accounting". Obviously there's only one such instance in our current database. You can combine the output of the Query() method with LINQ's FirstOrDefault() extension method (defined for IEnumerable<T>; it needs a using System.Linq directive) to get the first matching instance, or null if there is no result:

Class found = db.Query<Class>(c => c.Subject.Equals("Accounting")).FirstOrDefault();
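
If you want to see how the filter-then-FirstOrDefault() pattern behaves without a database, here is a small sketch that runs it against a plain in-memory list instead of db4o. FirstOrDefaultDemo and FindBySubject are hypothetical names, and the Class here is a trimmed copy with just two properties.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Trimmed copy of the article's simplified Class.
public class Class
{
    public string Subject { get; set; }
    public string Professor { get; set; }
}

public static class FirstOrDefaultDemo
{
    // Returns the first Class with the given subject, or null if none matches.
    public static Class FindBySubject(IList<Class> classes, string subject)
    {
        return classes.Where(c => c.Subject.Equals(subject)).FirstOrDefault();
    }

    public static void Main()
    {
        IList<Class> classes = new List<Class>
        {
            new Class { Subject = "Accounting", Professor = "Professor X" },
            new Class { Subject = "Swimming", Professor = "Michael Phelps" }
        };

        Class found = FindBySubject(classes, "Accounting");
        Class missing = FindBySubject(classes, "Chemistry");

        Console.WriteLine(found.Professor);   // Professor X
        Console.WriteLine(missing == null);   // True
    }
}
```

Because the result can be null for a reference type, it is worth checking before you dereference it.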

You might have noticed that we always have this reference to IObjectContainer -- the db variable -- in the code above. It is important that you keep the IObjectContainer reference open, otherwise the database will not be able to correctly track the identity of the objects. For example, let us retrieve both John and Mary from the database:

  IObjectContainer db = Db4oFactory.OpenFile(Proj.DatabaseFile);
  Student john = db.Query<Student>(s => s.FirstName.Equals("John")).FirstOrDefault();
  Student mary = db.Query<Student>(s => s.FirstName.Equals("Mary")).FirstOrDefault();
  Console.WriteLine("John has:");
  foreach (Class c in john.Classes)
  {
      Console.WriteLine("...{0} taught by {1}", c.Subject, c.Professor);
  }
  Console.WriteLine("while Mary has:");
  foreach (Class c in mary.Classes)
  {
      Console.WriteLine("...{0} taught by {1}", c.Subject, c.Professor);
  }
  Class johnAccounting = john.Classes.Where(c => c.Subject.Equals("Accounting")).FirstOrDefault();
  Class maryAccounting = mary.Classes.Where(c => c.Subject.Equals("Accounting")).FirstOrDefault();
  Console.WriteLine("John and Mary's accounting classes are the same instance: {0}", johnAccounting == maryAccounting);
  db.Close();

The query automatically returns the object graph for the Student instances -- which contain references to Class instances. If you run the above example, the program will tell you that John and Mary actually share only one instance of the accounting class:

John has:
...Accounting taught by Professor X
...Swimming taught by Michael Phelps
while Mary has:
...Accounting taught by Professor X
...Typing taught by Prof. Dvorak
John and Mary's accounting classes are the same instance: True
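
What "the same instance" means here is ordinary .NET reference identity, and you can see the distinction without db4o at all. The sketch below uses trimmed copies of the Student and Class types and a hypothetical ShareAccounting helper; it compares a genuinely shared Class object against a look-alike copy with the same Subject.

```csharp
using System;
using System.Linq;

// Trimmed copies of the article's types.
public class Class { public string Subject { get; set; } }
public class Student
{
    public string FirstName { get; set; }
    public Class[] Classes { get; set; }
}

public static class SharedInstanceDemo
{
    // True only when both students point at the very same Accounting object.
    public static bool ShareAccounting(Student a, Student b)
    {
        Class first = a.Classes.FirstOrDefault(c => c.Subject == "Accounting");
        Class second = b.Classes.FirstOrDefault(c => c.Subject == "Accounting");
        return first != null && object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        var accounting = new Class { Subject = "Accounting" };
        var lookAlike = new Class { Subject = "Accounting" }; // same data, different object

        var john = new Student { FirstName = "John", Classes = new[] { accounting } };
        var mary = new Student { FirstName = "Mary", Classes = new[] { accounting } };
        var copycat = new Student { FirstName = "Copy", Classes = new[] { lookAlike } };

        Console.WriteLine(ShareAccounting(john, mary));    // True
        Console.WriteLine(ShareAccounting(john, copycat)); // False
    }
}
```

Two objects with identical field values are still two objects, and that is the difference the == comparison in the listing above is reporting.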

Now what happens if you close the IObjectContainer and try to re-open it? Let's try the following:

  IObjectContainer db = Db4oFactory.OpenFile(Proj.DatabaseFile);
  Student john = db.Query<Student>(s => s.FirstName.Equals("John")).FirstOrDefault();
  db.Close();
  db = Db4oFactory.OpenFile(Proj.DatabaseFile);
  Student mary = db.Query<Student>(s => s.FirstName.Equals("Mary")).FirstOrDefault();
  db.Close();
  Console.WriteLine("John has:");
  foreach (Class c in john.Classes)
  {
      Console.WriteLine("...{0} taught by {1}", c.Subject, c.Professor);
  }
  Console.WriteLine("while Mary has:");
  foreach (Class c in mary.Classes)
  {
      Console.WriteLine("...{0} taught by {1}", c.Subject, c.Professor);
  }
  Class johnAccounting = john.Classes.Where(c => c.Subject.Equals("Accounting")).FirstOrDefault();
  Class maryAccounting = mary.Classes.Where(c => c.Subject.Equals("Accounting")).FirstOrDefault();
  Console.WriteLine("John and Mary's accounting classes are the same instance: {0}", johnAccounting == maryAccounting);

Running the code, the output this time will be:

John has:
...Accounting taught by Professor X
...Swimming taught by Michael Phelps
while Mary has:
...Accounting taught by Professor X
...Typing taught by Prof. Dvorak
John and Mary's accounting classes are the same instance: False

As you can see, the moment you close the IObjectContainer, it loses track of objects that have been retrieved from the database. If you are not careful, this might cause you to save duplicate objects, and you'll have great difficulty tracking their object graphs.
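
One way to picture this behaviour is a toy "session" that remembers which in-memory object it has already handed out for each stored item. To be clear, this is not how db4o is implemented; ToySession, Load and the dictionary keyed by subject are all assumptions of mine, meant only to show why two separate sessions give you two separate objects.

```csharp
using System;
using System.Collections.Generic;

public class Class { public string Subject { get; set; } }

// Hypothetical stand-in for an open database session; NOT db4o's implementation.
public class ToySession
{
    // One in-memory instance per stored item for as long as this session lives.
    private readonly Dictionary<string, Class> loaded = new Dictionary<string, Class>();

    public Class Load(string subject)
    {
        Class instance;
        if (!loaded.TryGetValue(subject, out instance))
        {
            instance = new Class { Subject = subject }; // pretend this was read from disk
            loaded[subject] = instance;
        }
        return instance;
    }
}

public static class SessionDemo
{
    // Two loads through one session return the same object.
    public static bool SameSession()
    {
        var session = new ToySession();
        return object.ReferenceEquals(session.Load("Accounting"), session.Load("Accounting"));
    }

    // Loads through two different sessions return two different objects.
    public static bool AcrossSessions()
    {
        Class first = new ToySession().Load("Accounting");
        Class second = new ToySession().Load("Accounting");
        return object.ReferenceEquals(first, second);
    }

    public static void Main()
    {
        Console.WriteLine("Same session: {0}", SameSession());        // True
        Console.WriteLine("Across sessions: {0}", AcrossSessions());  // False
    }
}
```

In this toy model the second session knows nothing about the object the first one handed out, so it would treat that object as brand new, which is how duplicates can creep in.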

More To Come!

We've barely touched the tip of the iceberg when it comes to object databases. In the meantime, I would encourage you to try out db4o, or some object database platform of your choice. I'm sure you'll find that it's a welcome relief to be set free from SQL. In the coming blog posts I'll be discussing how you can do client-server database connections with db4o, transactions and controlling the depth of traversal of object graphs when retrieving and saving objects. We will also discuss the relative advantages and disadvantages of ODBMS vis-a-vis the traditional RDBMS. Stay tuned!


Posted 04-22-2009 11:12 PM by cruizer

Comments

jakelite wrote re: Introduction to Object Databases (using db4o)
on 04-23-2009 7:02 AM

another great post! keep it coming...

jokiz wrote re: Introduction to Object Databases (using db4o)
on 04-23-2009 7:32 AM

so this is db4o, while reading, i can't get NH out of my mind (some similarities i guess since it was with NH that i first bumped into such constructs in c# code like IQuery), i guess i missed using it.  good post, can't wait to see all the issues that you will encounter, ;p

cruizer wrote re: Introduction to Object Databases (using db4o)
on 04-23-2009 5:05 PM

well NHibernate is an O/RM so it tries to appear like an object database to its client (the programmer), but underneath it really is not. db4o is a true object database, no SQL at all :)

jakelite wrote re: Introduction to Object Databases (using db4o)
on 04-23-2009 7:36 PM

db4o supports both java and .net right? im just curious if they have compatible storage formats. that is one can open a db created by .net from java and the reverse.

cruizer wrote re: Introduction to Object Databases (using db4o)
on 04-23-2009 8:51 PM

I think it should work...


Copyright DevPinoy 2005-2008