Archive
Considering a new Code Camp talk
So I’m considering getting back into the ring at code camps. For the past several years I’ve stayed out of them, mostly because more qualified people were handling a lot of the topics, and because wonderful personal things like my kids and family take up quite a bit of time (seriously!). But lately I’ve felt the urge to get back into the swing of it.
One of the things I think is underrepresented, and will likely become the basis of my next talk, is Website Operations, and how you can support that concept via architecture and developer philosophy. This would cover things like email delivery and storage, cache management, web farm management, logging and debugging, change management and tracking, GA versus local web analytics, and so forth. Wondering if people would consider that interesting or boring?
Be careful interacting with any objects retrieved from cache
Good evening all,
Just wanted to offer some background on a recent problem we’ve run into in several different web applications. When it occurs it can be difficult to diagnose, and difficult to track down even once it is diagnosed, but the fix generally ends up being fairly easy. The problem revolves around storing List- or IEnumerable-based objects in an in-memory cache in C#.
Our web applications generally have a robust caching mechanism built in to minimize reads to databases as well as reads to disk. You can store file contents or information about users in cache under the expectation that those won’t change very often. Many times things like lists of countries, or states, or property attributes would end up in cache as well; when is the last time the list of U.S. states changed? So instead of going to the database to get states when you need them, they become an obvious candidate for storing a List in some sort of cache.
The problem, then, is two-fold. Assuming that your List is in memory, your caching mechanism is likely to hand back the list itself when it is asked for out of cache. Typically that’s mistake number one, because you don’t want to hand back the list itself, you want to hand back a new copy of the list. Now I realize that sounds like wasteful processing, creating a new copy every time something is requested from cache, but here’s a scenario and what can happen.
Let’s say that you have two web applications, one a standard desktop web site, and the other a mobile web site. Both of them use a cached List of states to populate search forms. You decide that you’d like to sort the state list on the mobile site based on distance from the mobile phone’s location, which is different behavior than the desktop site.
Within minutes of starting up your desktop site, you notice that the state list in your search form appears to be in a random order.
What just happened?
Well, the mobile version of your website needed a list of states, sorted to meet its needs. Something like this:
List<State> states = CacheManager.Get<List<State>>(stateCacheCategory, stateCacheKey);
states.Sort((c, d) =>
{
    return GetDistance(c.Latitude, c.Longitude, localLat, localLong)
        .CompareTo(GetDistance(d.Latitude, d.Longitude, localLat, localLong));
});
If you can imagine, with a few supporting functions, this would get the states from cache and then sort them by distance from the phone’s latitude and longitude.
Seems harmless enough, right?
Except when you haven’t created a new object from the list in cache. If you don’t do that, you hand back a reference to the cached item, and this sorting function executes directly against the object in memory, reordering the cached list in place.
And suddenly the cached copy feeding your desktop website is handing back a list sorted by some random mobile visitor. This has happened to us a couple of times between our mobile and desktop web applications.
The way to resolve this is to create a copy of the item coming back from cache, so that you are not operating directly on the in-memory object:
List<State> states = CacheManager.Get<List<State>>(stateCacheCategory, stateCacheKey).ToList();
states.Sort((c, d) =>
{
    return GetDistance(c.Latitude, c.Longitude, localLat, localLong)
        .CompareTo(GetDistance(d.Latitude, d.Longitude, localLat, localLong));
});
The second scenario revolves around updating or adding items to the cache, and the locking semantics that must be applied in a multi-threaded web world. If you don’t properly lock your cached objects, you run the risk of thread contention and deadlock during an update or add, sending your web application into a spiraling dance of death as threads block and new ones are created only to block as well. This recently happened to us: a web server would suddenly create hundreds of threads and eventually crash, or create no new threads at all while the request count on the server went through the roof waiting for threads to unblock.
Let’s take this code here, assuming we’re storing in-memory items in a dictionary of dictionaries.
public void AddToCache(string category, string key, object item)
{
    if (cacheDictionary == null)
        cacheDictionary = new Dictionary<string, Dictionary<string, object>>();
    if (!cacheDictionary.ContainsKey(category))
        cacheDictionary[category] = new Dictionary<string, object>();
    cacheDictionary[category].Add(key, item);
}
Now we can assume there are also methods to get something from the cache and to delete something from it. Under heavy load, while a large item is being added to the cache dictionary, other requests might also try to add the same item, or try to read it while the add is occurring. Without the appropriate locking semantics to help the threads know when to wait, this results in thread collisions and contention as requests read from an object that is in the middle of being written.
There are two ways to get around this. The first is to wrap your code with a generic object lock, like this:
object _addLock = new object();
public void AddToCache(string category, string key, object item)
{
    lock (_addLock)
    {
        if (cacheDictionary == null)
            cacheDictionary = new Dictionary<string, Dictionary<string, object>>();
        if (!cacheDictionary.ContainsKey(category))
            cacheDictionary[category] = new Dictionary<string, object>();
        cacheDictionary[category].Add(key, item);
    }
}
The second would be to use an actual ReaderWriterLock. These locks have an advantage over the plain lock above, which blocks all threads whenever the lock object is held: a ReaderWriterLock allows data to be read by multiple threads at the same time, only blocking when a write is going to occur. That would look like this:
private ReaderWriterLock _readerWriterLock = new ReaderWriterLock();
public void AddToCache(string category, string key, object item)
{
    _readerWriterLock.AcquireWriterLock(Timeout.Infinite);
    try
    {
        if (cacheDictionary == null)
            cacheDictionary = new Dictionary<string, Dictionary<string, object>>();
        if (!cacheDictionary.ContainsKey(category))
            cacheDictionary[category] = new Dictionary<string, object>();
        cacheDictionary[category].Add(key, item);
    }
    finally { _readerWriterLock.ReleaseWriterLock(); }
}
public object GetFromCache(string category, string key)
{
    _readerWriterLock.AcquireReaderLock(Timeout.Infinite);
    try
    {
        // Never initialize the dictionary here: that would be a write
        // performed under a reader lock. Treat an empty cache as a miss.
        if (cacheDictionary == null || !cacheDictionary.ContainsKey(category))
            return null;
        object item;
        cacheDictionary[category].TryGetValue(key, out item);
        return item;
    }
    finally { _readerWriterLock.ReleaseReaderLock(); }
}
This would allow for multiple threads to read the data while the add would lock for writing, causing the read threads to appropriately wait for the write to end. Without these locking semantics in place, you can put yourself in a poor position if your server ends up deadlocking on shared cached resources.
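On more recent versions of .NET, ReaderWriterLockSlim is the recommended replacement for ReaderWriterLock; it has lower overhead and pairs naturally with try/finally. A sketch of the same add/get pair using it (the SlimCache class name is just for illustration):

```csharp
using System.Collections.Generic;
using System.Threading;

public class SlimCache
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly Dictionary<string, Dictionary<string, object>> cacheDictionary =
        new Dictionary<string, Dictionary<string, object>>();

    public void AddToCache(string category, string key, object item)
    {
        _lock.EnterWriteLock();
        try
        {
            if (!cacheDictionary.ContainsKey(category))
                cacheDictionary[category] = new Dictionary<string, object>();
            cacheDictionary[category][key] = item;
        }
        finally
        {
            // Always release, even if the add throws.
            _lock.ExitWriteLock();
        }
    }

    public object GetFromCache(string category, string key)
    {
        _lock.EnterReadLock();
        try
        {
            Dictionary<string, object> categoryItems;
            if (cacheDictionary.TryGetValue(category, out categoryItems))
            {
                object item;
                if (categoryItems.TryGetValue(key, out item))
                    return item;
            }
            return null; // cache miss
        }
        finally
        {
            _lock.ExitReadLock();
        }
    }
}
```

The try/finally blocks matter: if an exception escapes while the lock is held and never released, every subsequent request blocks forever, which is exactly the thread pile-up described above.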
Support Architecture for your Web Application
Whenever I start a new project, in particular a web project of some kind, there are several steps that I take in preparation for scaling the project or supporting different business functions. Let’s face it, not everything that the website might expect to do should be done by the website; a classic example of this might be notification that a subscription is about to expire. That sort of occasionally scheduled business process is certainly doable in the context of web development, but in general it’s not recommended.
A less apparent example might be a task such as general email delivery, say a forgot-password email, where under high traffic you might want to off-load the delivery to a secondary process rather than hold up rendering of a web page to the browser while the email is built, formatted, and ultimately delivered.
To support a wide variety of potential offline processes, I generally set up what I refer to as “harnesses” for three different types of processing. These harnesses are fairly generic, interacting with an interface implementation and typically based on the standard .NET configuration model for instantiating the class that implements the interface. In many cases, the interface implementation is common to all three harnesses, such that the processes are interchangeable.
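As a sketch, the shared contract can be as simple as a name and a single entry point; the interface and member names below are illustrative, not from an actual framework. Each harness only needs to know how to construct the object (say, from a type name in app.config) and call Execute():

```csharp
using System;

// Hypothetical contract shared by all three harnesses.
public interface IOfflineTask
{
    string Name { get; }
    void Execute();
}

// Example implementation: the kind of small job any harness could host,
// such as queuing notices for subscriptions that are about to expire.
public class ExpirationNoticeTask : IOfflineTask
{
    public string Name { get { return "ExpirationNotice"; } }

    public void Execute()
    {
        // Find soon-to-expire subscriptions and queue notification emails.
        Console.WriteLine("Checking for expiring subscriptions...");
    }
}
```

Because every task looks the same to the harnesses, which harness runs a given task becomes purely a configuration decision.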
There are three basic timing elements for things you might want to accomplish offline from your website. The first timing element I would refer to as “do something repeatedly every so often”. The second timing element I would refer to as “do something on a scheduled basis”, whether that schedule is every few hours or once a day or even once a month. The last timing element I generally prepare for is a “one off” or “one time” execution.
It should be noted that this sort of architecture presupposes that you have full control of your computing environment either through ownership of the servers or access via some sort of cloud computing or virtual hosting service, such that you can install and run items from the console or command line. Obviously this would not be possible if all you had was a web host for your website.
Under those assumptions, I will create three things. First, I will create a Windows Service. This service’s sole purpose would be to take a configured set of objects that implement my interface and run them repeatedly at a specified interval, say once every three minutes. A good example of this might be a process that monitors an email inbox for new messages and processes them in some fashion. Because this is a Windows Service, it might be wise to give each object its own thread, or if you are on the latest .NET platform, its own Task.
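A minimal sketch of that service’s repeat loop, assuming a hypothetical IOfflineTask contract with a single Execute() method (all names here are illustrative, not an actual framework):

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public interface IOfflineTask
{
    void Execute();
}

public class RepeatingHarness
{
    private readonly List<IOfflineTask> _tasks;
    private readonly TimeSpan _interval;

    public RepeatingHarness(List<IOfflineTask> tasks, TimeSpan interval)
    {
        _tasks = tasks;
        _interval = interval;
    }

    // Called from the Windows Service's OnStart; runs until cancelled.
    public async Task RunAsync(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            // Give each configured task its own Task so one slow
            // task doesn't hold up the others.
            var running = new List<Task>();
            foreach (var task in _tasks)
                running.Add(Task.Run(() => task.Execute()));
            await Task.WhenAll(running);

            try { await Task.Delay(_interval, token); }
            catch (OperationCanceledException) { break; }
        }
    }
}
```

In a real service you would also want per-task exception handling and logging, so one misbehaving task can’t take down the whole loop.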
Second, I will create a standalone console application that I intend to schedule to run regularly. This console application will also load up a configured set of interfaced objects and run them a single time when the scheduled task executes. A good example of this might be some sort of nightly statistical analysis that needs to be done for reporting. In the same sense as the Windows Service, if you have a lot of objects, it might be wise to allow for sequencing some of them in order while noting which ones are truly independent, and then multi-threading them or assigning them Tasks in the proper order.
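For the sequencing point, one hedged sketch of what the scheduled console application’s single pass might look like: run the ordered tasks one at a time, then fan the truly independent ones out in parallel with Task.WhenAll (again assuming the illustrative single-method IOfflineTask contract):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public interface IOfflineTask
{
    void Execute();
}

public static class ScheduledRun
{
    // Ordered tasks run sequentially, in list order; independent
    // tasks run in parallel once the ordered ones have finished.
    public static void RunOnce(List<IOfflineTask> ordered, List<IOfflineTask> independent)
    {
        foreach (var task in ordered)
            task.Execute();

        var running = new List<Task>();
        foreach (var task in independent)
            running.Add(Task.Run(() => task.Execute()));
        Task.WhenAll(running).Wait();
    }
}
```

The console application’s Main would just load both lists from configuration, call RunOnce, and exit, leaving the scheduling itself to the operating system’s task scheduler.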
Last, I will create a near-replica of the standalone console application above, but likely without the multi-threading in place, as this is intended for one-time execution. The application above and this one might even go so far as to be exact copies deployed separately, with one scheduled and one not, if the requirements for each don’t diverge. Common uses of this would be one-time data conversions; say, for example, you had inherited a poor database structure that stored a person’s name in a single field and you wanted to split it into first and last names.
Once I have these three harnesses built, it becomes very easy to generate a plug-and-play approach to any tasks that would need to be scheduled or executed in a predictable fashion without constantly creating new services or new executables to handle the work. This would then allow me to very easily move tasks that my web project might eventually find overwhelming or detrimental to performance off into background tasks without constantly reinventing the wheel. It also allows me, if I so choose, to install or write some code to monitor the execution of these harnesses without having to rewrite the monitoring code every time as well.
This approach has been helpful on several projects and saved me a lot of work down the line.