Random Thoughts

Why Session State Should Not Be Stored In A Distributed Cache

30. September 2009 sunny Random Thoughts (2)

Web developers often refer to session state stores and caches interchangeably, while in actuality they serve different purposes.

A cache serves as a caching layer between a web application and an external data source. Caches exist mainly to lighten the load on the external data source, thus improving the performance of the application.

The purpose of a session state store is to store a user’s workspace. Session state stores enable client activity to be persisted consistently across several HTTP requests.

Applications that utilize a cache write to the external data source, typically a database, and read from the cache. This technique can significantly improve the performance of applications that mostly read from the data source.
Caches are designed to be read speedily, simultaneously by many clients and threads. Some cache implementations can link the cached object to the data source, such that if the data source is updated, the cache is invalidated.

Session state stores are not linked to external data sources by design, although it can be consumed by an application to do so. While it may be necessary to store certain parts of a client’s session state to a database, there is usually no need to store all of the session state to a database. In fact, many database-less web applications rely solely on session state to operate.
Session state implementations are designed such that each client has exclusive access to its session data. Even though caches can be consumed by an application to simulate this, exclusivity is not enforced and there is always a chance that a client will be able to access another client’s session data due to either poor design or security flaws.

A distributed cache spreads out an application’s caching layer across many machines, which allow high-traffic web applications to scale out by adding more machines as demand increases. Performance can also be improved by distributing session state data across many machines; therefore, it is worthwhile to examine the requirements and nuances of cached data and session state before deciding on a distributed solution to apply.

Distributed caches, like local caches, are most effective when used to cache data that changes infrequently. They also support simultaneous fast reads of a cached object by many threads. This is where session state sharply differs from cached data.

Session state has little need for speedy multiple-thread access to a single stored resource because a client can only exclusively read or update its session. In addition, the usage pattern of session state is unpredictable. Some applications update session data very frequently while some do not. Session state storage designers normally safely assume that session state is write-heavy.

Caches are configured by default to use either an optimistic concurrency mechanism or no concurrency control at all, to access cached data. This design is driven by the strong requirement to eliminate blocking by any means possible, and works superbly due to the lower proportion of writes to reads.
Session state stores, on the other hand, utilize a pessimistic concurrency mechanism to access stored data. This works effectively because of the exclusive nature of resource access.

The number of concurrent session state accesses to a stored resource can increase if a user opens up several web browser instances of the same application or if the application makes use of numerous AJAX calls. Notwithstanding multiple instances and AJAX-intensive applications, a user’s session cannot have more than a handful of concurrent access attempts.
A pessimistic concurrency mechanism, as used by session state stores, can gracefully handle a few concurrent accesses on a write-heavy resource, and more importantly, provide consistent data to all operations. Inconsistencies in served data can arise, if a cache with no concurrency control is employed to store write-heavy session state. This problem becomes more apparent if the application is AJAX-intensive.

Critical applications that rely on session state require failover and redundancy support. These features are usually built into commercial session state storage solutions.
Caches have no need for failover or redundancy because caches are simply a caching layer: if the requested data cannot be retrieved from the cache, it can always be fetched from the primary source. Therefore, most distributed cache implementations do not support failover or redundancy; issues solution architects seldom remember when moving session storage to a distributed cache.

The conundrum of where to store session state arises when an application needs to scale to accommodate more users.
While there are a few commercial distributed session state storage solutions, there are no free robust alternatives, and the usual consensus is to store session state in freely available distributed cache solutions, or eliminate session state entirely from the application.

Moreover, even when session state is manageably stored in a distributed cache, most often, the same servers that are caching infrequently changing data are used to store session state. Sharing the cache this way leads to performance degradation.
This occurs because whenever the cache server needs to store a new cached object or remove an expired one, it has to momentarily suspend all read operations internally on all other cached objects until the object is added or removed. The overall outcome is sub-optimal reads for cached infrequently changing data.

Developers and architects should carefully weigh the aforementioned issues before moving locally stored session state to a distributed storage and should, whenever possible, opt for a solution that was specifically built for distributed session state storage.

When Documentation Does Not Match Implementation

19. September 2009 sunny Random Thoughts (1)

Update: Microsoft has addressed these concerns. The state service documentation is now quite accurate.

Not too long ago, I needed to piece out the communication protocol between the ASP.NET state server and the web server, because I was working on a peer to peer version of the ASP.NET state server.

I made some effort to obtain the protocol as described in this post. Along the way, I found the protocol documentation by Microsoft, and was delighted. I could now design my implementation of the state server based on this information.

While going through the documentation, I noticed that the format of some of the messages did not match what I had earlier observed, so I decided to verify the protocol to be extra sure – Boy, was I in for a surprise.

First, there are bold, wrong statements in the documentation:

From section 3.1.5.3: “Because a client sends a lock-cookie value along with the session state data, the state server MUST store the lock-cookie value. Internally, the state server MUST also store the date and time when the state server received the lock-cookie value. This information is necessary if the state server ever has to send response-locked messages, as specified in sections 2.2.5.2 and 2.2.5.4.”

This is a wrong, misleading statement. A LockCookie value is used by a Set Request message to unlock a locked session entry (if it is locked) before storing the new data.
The state server has no use for storing the client LockCookie value, in fact, the state service MUST never store any LockCookie value sent by the client or in any way let the client influence LockCookie values as they are exclusively generated by the server.

From section 3.1.5.5: “A client can acquire an exclusive lock on session state by using either a successful GetExclusive_Request or Set_Request message.”

This is another off the mark statement. A client can only acquire an exclusive lock with the GetExclusive Request.
Even a cursory look at the SessionStateStoreProviderBase class is sufficient to confirm that this statement is wrong.

Then, there is important information that is left unstated:

The document does not mention anywhere that the ActionFlags header actually indicates that the server should only store the presented data if the unique session id does not already exist. If the ActionFlags header value is set to 1, and the server already has the presented session id, the existing session data will not be updated with the new one, however the state server will still reply with an OK response (as if it stored the data). This behavior is not easily noticeable to the casual observer, but is important to implement a 100% compatible state server.

There are other inaccuracies and misleading statements in the documentation that makes it virtually impossible for anyone to develop a state server implementation using the Microsoft documentation.
I had to painstakingly piece out the protocol from scratch. I’ve published the correct version of the protocol in PDF format and HTML format for reference purposes.

What's more, if you take a look at the history of the Microsoft document, you’ll notice that it has been edited more than twenty times over the course of almost three years. You’d think that after twenty edits, it will be somewhat accurate, but after almost three years of editing the documentation for a major ASP.NET server, Microsoft still manages to get it wrong. It’s safe to say that either Microsoft is doing this intentionally or they do not know how their own technology works.

What’s even more disturbing is that this documentation is for a protocol, not for a piece of software. Do protocols change every other month? Imagine the chaos that would ensue if every developer had to second guess each statement in RFC-2821 when writing an email client and then also make sure that the protocol hasn’t changed every other month.

It seems the only reason Microsoft publishes these specs at all is to pay lip service to the European Union because there is no point publishing specs that are innacurate and can't save developers’ time when implementing a technology.

I Want a Native C# Compiler

26. January 2009 sunny C#, Random Thoughts (11)

Wouldn’t it be nice if C# code could be compiled directly to machine code? Having such a compiler would position C# as a serious system programming language.
Developers would be able to write system software for routers, for instance, in C#.

I don’t see any reason why such a compiler should not exist. In fact, the creation of a native C# compiler will be well justified.

C# is a well designed programming language. It would be a shame if the language is stuck forever with the .NET/Mono frameworks, especially if you consider that there is no reason why the language has to be inextricable tied to these frameworks.
There is no requirement that C# code must compile to IL and there are no language-level assumptions that the compiled code has to be machine-independent.
High-level features routinely used by developers such as threading and reflection, are .NET library calls and have no connection to the language itself. The only high-level feature that the language implies is a garbage collection system. In fact there are language-level hints that C# can be a system-level programming language -- how often do you use the volatile, stackalloc and fixed keywords?

The C# developer base is huge, so a native C# compiler will push the language even further to new platforms and projects that are currently unsuitable for development with C#. It will enable developers to write ALL their code; high-level and low-level in C#. Higher-level code will be compiled to IL, whereas lower-level code will be compiled to machine code.

A native C# compiler would be great for coding libraries, for instance, a proprietary encryption or compression algorithm. With a native C# compiler, an algorithm can be coded in C# and compiled as a native code library. This library can be linked to and used in systems without any .NET (or alternative) frameworks.
The best part is that if this library is needed for a .NET/Mono project -- all that is needed is recompilation and the algorithm will scale to managed code without having to port the library or use unmanaged calls. It will work great in both managed and unmanaged worlds.

The language is an open standard, so anyone with the time, expertise and resources can create a native C# compiler. Also, to be successful, this compiler only needs to support C# version 2.0.
Language features introduced in versions 3.0 and 4.0 are not that important and can be considered “Microsoft extensions” to the C# language. Indeed, Anders Hejlsberg admitted that features added in C# 2.0 were features they didn’t have time or didn’t know how to properly implement in C# 1.0.

If you are still not convinced about the viability of such a compiler, take a look at the compilers available for the D language. They are living proof that such a compiler is feasible and will compile a modern language directly to machine code, complete with a (small) runtime, memory management and type system.

Sunny Ahuwanya's Blog

Mostly notes on .NET and C#

Why Session State Should Not Be Stored In A Distributed Cache

When Documentation Does Not Match Implementation

I Want a Native C# Compiler