Wednesday, February 28, 2007

Software system longevity paradigms

The functionality of software systems often needs to survive events like hardware and software failures, version upgrades, and system reimplementation.

There are two basic paradigms for providing this property: intentional and orthogonal. The differences between the two might look insignificant at first sight, but they lead to different properties of the products. I have an obvious bias in favor of the intentional paradigm.

Saving State - Orthogonal Paradigm

In this paradigm the application state is saved. The saved artifacts are considered to be derived from the application. The saved state might be the entire application state or just a part of it. This functionality is often treated as orthogonal to the rest of the application's functionality. A programmer working in this paradigm generally does not want to care whether the code is working with persistent objects or not. Features of the persistence layer that leak into the application layer are often considered sad facts of life and framework clutter.

The basic principle is that the entire software system survives because each application (or at least an important part of it) survives.

There are a lot of systems that use this paradigm:
  1. MS Office file formats
  2. OS hibernate
  3. Java object serialization
  4. CapROS
  5. JDO 1.0
  6. Most object databases
Some of these technologies can also be used in the context of the intentional paradigm. But they were designed in the context of the orthogonal one.

This paradigm has many features that make it attractive in the eyes of a developer. It is very simple to start using: an existing application object is just marked as persistent, so the developer does not have to learn much new technology.

The fundamental "feature" of this paradigm is that it assumes the application does not change significantly during its life cycle. Problems surface when this assumption is invalidated. When the application evolves, it is hard to migrate data to a new version of the application. It is even harder to use the data from other applications, particularly if another programming language is used.

The problems of Java object serialization are relatively well documented. For example, the JavaDoc of almost every Swing component states that successful deserialization in another version of Java is unlikely.
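The fragility is easy to sketch in plain Java (the class here is made up for illustration). Without an explicit serialVersionUID, the serialized stream is tied to the exact class structure that wrote it, so the round trip below works only as long as the class does not change:

```java
import java.io.*;

// Hypothetical class: no explicit serialVersionUID, so the JVM derives one
// from the class structure. Adding or removing any field changes that id,
// and old serialized streams then fail with InvalidClassException on read.
class Session implements Serializable {
    String user = "alice";
}

public class SerializationDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Session());
        }
        // Reading back works here only because the very same class
        // definition is still loaded.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Session s = (Session) in.readObject();
            System.out.println(s.user);
        }
    }
}
```

The saved bytes encode the application's internal layout, which is exactly why they rarely survive a version upgrade.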

JDO 1.0 changes the semantics of Java objects using an enhancer tool. This causes a number of interesting problems. At the source level, objects look the same as they did before they were made persistent. Theoretically this could save a programmer from changing code that had previously worked with these objects. However, the behavior of the objects changes in a number of ways. Among other things, the tool replaces field access instructions with method calls; if the fields are public, this requires changes to both the persistent classes and their client classes. Places that previously could not throw an exception now can. Reflection stops working because the fields no longer exist in the class.
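No JDO is needed to see the kind of breakage; a plain-Java sketch (both classes are hypothetical) shows why clients and reflection depend on the field actually being declared:

```java
import java.lang.reflect.Field;

// "Before enhancement": a public field that clients and reflection see.
class PlainPoint {
    public int x = 1;
}

// "After enhancement" (simulated by hand): the field is gone, access
// goes through a method instead.
class EnhancedPoint {
    private int hidden = 1;
    public int getX() { return hidden; }
}

public class EnhancerEffectDemo {
    public static void main(String[] args) throws Exception {
        // Reflection resolves the field by name at runtime...
        Field f = PlainPoint.class.getDeclaredField("x");
        System.out.println("plain: " + f.getInt(new PlainPoint()));
        // ...so once the class no longer declares "x", the same lookup
        // fails, even though the source-level "idea" of the object is intact.
        try {
            EnhancedPoint.class.getDeclaredField("x");
            System.out.println("enhanced: found");
        } catch (NoSuchFieldException e) {
            System.out.println("enhanced: NoSuchFieldException");
        }
    }
}
```

The same shift from field access to method call is also what lets previously exception-free code start throwing.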

The problems of other products created in the orthogonal persistence paradigm are also well documented. The hibernate functionality of operating systems is possibly one of the few places where the paradigm is applicable and quite natural. Even there it sometimes fails, mostly because hardware and drivers are not quite ready for it.

Accessing Data - Intentional Paradigm

In this paradigm the application works with data that has a lifetime longer than the application's own. The application is considered to be dependent on the data, rather than the reverse as in the orthogonal paradigm. This paradigm looks more natural to me, as the thing with the shorter lifetime depends on the thing with the longer lifetime. Working with persistent data is considered part of the application's functionality, and constructs specific to persistence layers (like transactions) are considered an essential part of the application logic rather than unnecessary clutter.

The basic principle is that we ensure the data survives; an application is a transient thing anyway. It could die at any time, and upon restart it will be able to work with the data again. Some data could be lost, but this is a known, calculated risk.

There are a lot of systems that use this paradigm:
  1. Relational and SQL databases
  2. File systems
  3. OASIS Open Document Format
  4. Java XML Binding
Products created in the context of the intentional paradigm are usually a bit harder to use because the code that works with persistent state is aware of this fact. There is no attempt to create an illusion that these objects are just local to the application.

However, because of this, there are no problems arising when an illusion breaks. The decisions related to persistence are explicit, and they are part of the application logic.

If data is considered a separate thing from the application from the start, features like upgradeability, support for multiple applications, and support for legacy versions of the application come relatively naturally.
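A minimal sketch of the intentional style in plain Java (the key=value file format and all names are made up, standing in for SQL or XML). The on-disk format is defined on its own terms, so the reader does not depend on the writer's class layout, and any version of the application, or a different program entirely, can parse it:

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.*;

public class ExplicitStore {
    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("settings", ".txt");
        // Writing is an explicit, deliberate act in a documented format,
        // not a transparent dump of in-memory object state.
        Files.write(file, List.of("user=alice", "theme=dark"));
        // Reading parses the format; no class from the writing side
        // needs to exist here.
        Map<String, String> data = new LinkedHashMap<>();
        for (String line : Files.readAllLines(file)) {
            String[] kv = line.split("=", 2);
            data.put(kv[0], kv[1]);
        }
        System.out.println(data.get("user") + "/" + data.get("theme"));
        Files.delete(file);
    }
}
```

Because the format, not the object graph, is the contract, schema evolution becomes an explicit design task instead of a broken illusion.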

I think that the orthogonal paradigm is suitable only for short-lived data that is written and read by a single version of a single application. There are quite a lot of such situations. But the paradigm quickly fails in the case of a modern EIS, where applications are created, replaced (often by rewriting them in another programming language), changed, and retired. This is possibly one of the most significant reasons why OODBMSs (which are mostly developed in the context of the orthogonal paradigm) failed to overtake SQL databases (which are developed in the context of the intentional paradigm) in EIS.

Monday, February 26, 2007

Java 5 vs. .NET 2.0 generics: past vs. future

There is a thing that struck me about Java 5 vs. .NET 2.0 generics.

The major design constraint for generics in Java 5 was backward compatibility. Much of the statically available information related to generics is not retained in the byte code; it is erased during compilation. Generics thus become sort of consistent with the rest of the language. There is even the ability to retrofit old APIs to the new language features, as can be seen in the collections framework.

As a result, generics in Java 5 have a kind of "fake" feeling. There are a lot of desirable things that cannot be done with Java 5 generics. For example, an instance of a generic class cannot learn the type arguments with which it was created. And awkward constructs like "((ArrayList<String>)(Object)(new ArrayList<Integer>())).add("Test")" do not cause a runtime exception (though the compiler honestly warns that there might be a problem).
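The awkward construct above can be made runnable to show where the failure actually surfaces (class and variable names here are mine). Because of erasure there is nothing left at runtime to stop the mismatched add; the exception appears only later, at the compiler-inserted cast on retrieval:

```java
import java.util.ArrayList;

public class ErasureDemo {
    public static void main(String[] args) {
        ArrayList<Integer> ints = new ArrayList<>();
        // The double cast defeats the compiler's static check (it only
        // warns); at runtime both lists are just raw ArrayList.
        @SuppressWarnings("unchecked")
        ArrayList<String> strings = (ArrayList<String>) (Object) ints;
        strings.add("Test");   // succeeds: no runtime exception here
        try {
            Integer i = ints.get(0);   // compiler-inserted cast fails here
            System.out.println(i);
        } catch (ClassCastException e) {
            System.out.println("ClassCastException on read");
        }
    }
}
```

In .NET 2.0, where type arguments are reified, the equivalent cast fails immediately instead of poisoning the list.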

However, by doing so, Java was positioned as a "legacy" language, where the burden of the past outweighs the needs of future development.

In .NET 2.0, generics were added as a new feature. It looks like backward compatibility was not a major constraint during the development. It is facilitated a bit, since the new collections also implement the untyped collection interfaces. However, unlike their Java 5 counterparts, .NET 2.0 generics do not need to invoke the backward compatibility argument to explain weird features. They are much cleaner. So generics in .NET 2.0 look mostly motivated by their future use.

On the other hand, it is a puzzle to me why generics did not make it into .NET 1.0. The lack of generics was a well-documented problem of Java. And the .NET framework IS primarily inspired by Java (it was quite funny to read some MS documents that discuss C# and very carefully avoid mentioning Java while mentioning such remote languages as C). Many fundamental Java language problems were fixed in .NET; this one belongs to the exquisite group of problems that the .NET designers chose to reproduce. I kind of understand the time-to-market argument. But I do not believe that a few additional months would have made a difference here. Many Java proposals for generics had been on the table for a long time, and it was possible to select a good one from the start. At least it would have saved them from the shame of the CodeDom API.

So it looks like Java positions itself as a legacy language and .NET as an actively developed language runtime. This might be explained by the fact that Java has a much longer past than .NET. However, it does not bode well for Java's future. It is a possible reason why Sun started development of a new language called Fortress instead of adding the needed features to Java. Open-sourcing Java might break this self-image problem and lead to implementing and trying more daring features in the Java language, but I would not bet on it.