Abstract

I personally think that all forms of RPC are way over-hyped, and not actually terribly useful in the majority of situations in which you'd be tempted to use them. I know this is an unusual point of view, and needs justification.

Background

As a broad outline, you can plot different forms of IPC on a graph with one axis being speed, and the other coupling or, its inverse, portability. Speed and coupling, like space and time complexities in algorithms, can be related, but, IMHO, are largely orthogonal.

How the axes of the graph relate to real-world protocols

For example, one naive form of IPC I've commonly seen is raw memory dumped onto a socket or a pipe. This is great for speed, but very bad for coupling (i.e. very non-portable). It might not even work for two different compilers on the same platform, and if you're trying to communicate between two different languages, each program needs to know intimate details about how the other lays things out in memory. Also, naively implemented, with little thought given to minimizing round-trip requests, it often isn't as fast as you'd think (more on that later).
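To make the coupling concrete, here's a rough sketch of the raw-dump style, assuming a POSIX socket (the Message struct and send_message function are made up for the example):

    #include <unistd.h>  // write()

    struct Message {
        int  id;        // size and byte order vary by platform
        long length;    // 4 or 8 bytes depending on compiler and ABI
        char name[10];  // the padding after this differs by compiler
    };

    void send_message(int sock_fd, const Message &m)
    {
        // Dump the in-memory representation straight onto the wire. The
        // receiver must match this compiler's padding, alignment, and
        // byte order exactly, or it reads garbage.
        write(sock_fd, &m, sizeof(m));
    }

Nothing about Message's layout is promised by the language, which is exactly why this style couples the two ends so tightly.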

At the opposite corner of the graph are protocols like SMTP or, even worse, XML over HTTP. SMTP communicates using text messages that have to be parsed. SMTP parsing is pretty simple, although, looking at sendmail, it can get pretty hairy. XML parsing is even more complex. These parsing steps slow things way down. On the other hand, the well-defined nature and robust structure of these protocols makes them extremely portable (i.e. low coupling).

So, where does RPC fit into all of this?

In my opinion, CORBA and other RPC solutions are medium to highly coupled, and medium to low speed. Not as bad on speed as, perhaps, XML, but not all that great either.

Justification

Medium to high coupling

Well, if you think about it, all forms of RPC, including CORBA, require a fairly explicit detailing of the messages to be sent back and forth. You have to specify your function signatures exactly, and the other side has to agree with you. Also, the way RPC encourages you to design protocols tends to create protocols in which the messages have a very specific meaning. In contrast, the messages in SMTP or an XML-based protocol have a fairly general meaning, and it's up to the program to interpret what they mean for it. This bumps RPC protocols up the coupling scale a fair amount, despite the claims of the theorists and marketers.
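For a sense of what this coupling looks like in code, here's a rough sketch of the kind of C++ stub an RPC system generates from an interface description. The AccountService interface is invented for the example; this isn't any particular vendor's mapping:

    // Invented example; every RPC system generates something of this shape.
    class AccountService {
    public:
        // Client and server must agree on these exact signatures. Add a
        // parameter or change a type, and every peer has to be
        // regenerated and redeployed in lockstep.
        virtual double get_balance(long account_id) = 0;
        virtual void   transfer(long from, long to, double amount) = 0;

        virtual ~AccountService() {}
    };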

OTOH, the messages are platform and language independent. It's relatively easy to find a binding for any given language. Usually the code to decode the message from a bytestream and call your function is generated for you. And, the bytestream will be the same if you get your message from a perl program or a C++ program. This bumps it down on the coupling scale from the memory dumping protocols.

Medium to low speed
Marshalling

This cost is minor, in that almost any protocol that's supposed to be language and platform independent is going to pay it. The problem is marshalling: you have to get your data from the format your language keeps it in into your wire format and back again. While this is not as computationally intensive as parsing, the cost still exists.
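Here's a sketch of the kind of conversion every marshalling layer performs, done by hand for a single 32-bit integer:

    #include <cstdint>
    #include <cstring>
    #include <arpa/inet.h>  // htonl() / ntohl()

    // Marshal a 32-bit value into a portable, big-endian wire format.
    void put_uint32(unsigned char *buf, std::uint32_t value)
    {
        std::uint32_t wire = htonl(value);  // host order -> network order
        std::memcpy(buf, &wire, sizeof(wire));
    }

    // Unmarshal it back into the host's native format.
    std::uint32_t get_uint32(const unsigned char *buf)
    {
        std::uint32_t wire;
        std::memcpy(&wire, buf, sizeof(wire));
        return ntohl(wire);  // network order -> host order
    }

Every field of every message pays this small toll in both directions. It's cheap compared to parsing text, but it isn't free.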

Overly synchronous

The other problem is major. The one thing you always want to avoid in any networking protocol is round-trip requests. Round trips will always be inherently slow, if for no other reason than the speed of light. You never want to wait for your message to be processed and a reply sent before you send another message. The X protocol works largely because it avoids round-trip requests like the plague. There is a noticeable lag when an X program starts up because the majority of the unavoidable round trips (for things like font information and a bunch of window ids) are made then. Even when the programs are on the same computer, round-trip requests force context switches, which eat CPU time. In short, they are BAD.

Any RPC-oriented protocol encourages you to think of the messages you are sending as function calls. In every widely used programming language, function calls are synchronous: your program waits for the result of a call before continuing. This encourages thinking about your RPC interface in entirely the wrong way. It doesn't make you focus on the messages you're sending, or on the inherently asynchronous nature of such things. It makes you treat the interface as you would a normal class interface, or a subsystem interface. This leads to lots of round-trip requests, which is a major problem.
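Here's a sketch of the difference. Server, Result, and the fetch calls are all hypothetical; only the calling pattern matters:

    #include <cstddef>
    #include <vector>

    struct Result { /* message payload; details don't matter here */ };

    // Hypothetical connection object, declared only to show the two styles.
    struct Server {
        Result fetch(long id);              // blocking request/reply
        void   send_fetch_request(long id); // fire-and-forget send
        Result next_reply();                // read the next queued reply
    };

    // RPC style: one blocking call, and one full network round trip, per
    // item. 100 items at 50 ms of latency is 5 seconds spent waiting.
    std::vector<Result> fetch_all_rpc(Server &server,
                                      const std::vector<long> &ids)
    {
        std::vector<Result> results;
        for (long id : ids)
            results.push_back(server.fetch(id));
        return results;
    }

    // Message style: fire off every request, then drain the replies. The
    // requests pipeline behind one another, so the total wait is roughly
    // one round trip, not one per item.
    std::vector<Result> fetch_all_pipelined(Server &server,
                                            const std::vector<long> &ids)
    {
        std::vector<Result> results;
        for (long id : ids)
            server.send_fetch_request(id);
        for (std::size_t i = 0; i < ids.size(); ++i)
            results.push_back(server.next_reply());
        return results;
    }

The first version is what an RPC stub naturally pushes you toward; the second is what the network actually wants.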

CORBA generally advocates solving this problem by making heavy use of threading. But multiple threads have a lot of problems of their own. They just move the decisions about handling asynchronous behavior into the most difficult part of your program to deal with them in: your internal data structures. Not only that, but multiple threads are a big robustness problem. It's difficult to deal with the failure of a single thread of a multi-threaded program in an intelligent way, especially if the job of that thread is intimately intertwined with what another thread is doing.

Also, threads end up taking up a lot of resources: obvious ones, like context switches and space for processor contexts, and less obvious ones, like stack space. One program I saw had a thread for every communications stream, and it needed to deal with over 300 streams at a time. It also needed a 1 MB stack for each thread because some function calls ended up being deeply nested. That's at least 300 MB just for stack space. The program might have had no difficulty dealing with 300 communications streams at once had it not used threads. As it was, it constantly ran out of memory.

Conclusions

In short, CORBA and other RPC solutions look like a quick and easy answer to a difficult problem, but, measured on the coupling and speed scales, they are medium to highly coupled, and medium to low speed. Not as bad on speed as, perhaps, XML, but not all that great either.

Grand dreams of utopia

I would like to see (and am in the process of creating) a very message-oriented tool for building communications protocols. It would concentrate heavily on what data the messages contained, not on what the expected behavior of a program receiving the message should be. It would be supported by a communications subsystem that emphasized the inherently asynchronous nature of IPC and made it easier to build systems that used that to provide efficient communications. It would provide auto-generated marshalling and unmarshalling of data, and even a provision for fetching metadata describing unknown messages, much like XML. It would also allow you to easily override the generated functions with your own functions that are tuned to the data you're sending. And it would make it easy to build and layer new protocols on top of existing ones, or to transparently extend protocols in such a way that old programs would still work.
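To give the flavor of what I mean, here's a purely illustrative sketch. None of these names come from the StreamModule system; they're invented for the example:

    #include <string>

    // A message is described purely by the data it carries, not by what
    // the receiver is expected to do with it. Marshalling code for these
    // descriptions would be auto-generated (and overridable).
    struct FontInfoRequest {
        std::string font_name;
    };
    struct FontInfoReply {
        int ascent;
        int descent;
        int max_width;
    };

    class FontClient {
    public:
        // Sending is fire-and-forget; nothing blocks waiting for a reply.
        void request_font_info(const std::string &name);

        // The reply shows up later as an event, like a UI callback.
        virtual void on_font_info(const FontInfoReply &reply) = 0;

        virtual ~FontClient() {}
    };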

As a last note, I would like to say that we've known for years how to handle the inherently asynchronous nature of user interaction. All of our UI libraries are heavily event oriented. Processing is either split up into small discrete chunks that can be handled quickly inside an event handler, or split off into a separate thread that communicates with the main program mainly through events. We need the same kind of architecture for programs that do IPC.
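A sketch of the shape of loop I have in mind, using select(). The Connection wrapper is invented, and error handling is elided:

    #include <sys/select.h>

    // Hypothetical connection wrapper; only the loop's shape matters.
    struct Connection {
        int  fd() const;              // the underlying socket
        void dispatch_next_message(); // read one message, fire its handler
    };

    // The same shape as a UI main loop: wait for an event, run a small,
    // quick handler, and go back to waiting. Nothing ever blocks in the
    // middle of a conversation.
    void event_loop(Connection &conn)
    {
        for (;;) {
            fd_set readable;
            FD_ZERO(&readable);
            FD_SET(conn.fd(), &readable);

            // Sleep until a message arrives on the socket.
            select(conn.fd() + 1, &readable, nullptr, nullptr, nullptr);

            if (FD_ISSET(conn.fd(), &readable))
                conn.dispatch_next_message();
        }
    }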

If anybody is interested in the beginnings of such a system, I have an open source project that I think is still in the 'cathedral' stage. I would like help and input on its development, though, and I need help making it easy to port and compile in random environments. If you would like to help, please e-mail me. I call the system the StreamModule system, and it's architecturally related to the idea of UNIX pipes and the System V STREAMS drivers.


Eric Hopper <eric-job@omnifarious.org>