Sunday, November 14, 2010

Programming : Cached Interface Requests

Problem
When writing interfaces for tools I commonly need to display a POD (Piece of Data) that is complex and needs a decent bit of preprocessing time.  
One example is a pull down constraint list containing every string in a game.  This could be thousands of data points which must be sorted for quick browsing.  In addition, users invariably expect a slower custom numeric sorting (where text10 comes after text9).
 
(Rule: Once you show users something is possible, they come to expect it.)
I should point out my language of choice for tools is C#.  This may differ elsewhere, but it lets my company prototype and develop internal applications quickly.  Sadly, Microsoft's GUI tools don't seem to have been updated in the past ten years, forcing you to purchase or write your own, but I'll save my whining on that topic.
We don't want to presort every possible query but we can't have the interface take seconds or minutes to draw every time.  How to solve this?

Ignorance
Ignore the problem and wait for it to occur.  It's easy to make elaborate systems to solve problems that don't (yet) exist.  This is actually a good idea if you don't anticipate users will have large enough data sets, although you should be aware of the potential issue.
You may need to take into account the availability of people to update the software, however.  We have long periods of time where our programmers are shuffled onto other things, making it unlikely tools will be updated in a timely manner.
The idea of ripping away queries and making it generalized didn't come until very late in the process and I feel silly for not realizing it then.  Ignorance is bliss, after all, until it bites you.


Quick and Dirty
Make enough assumptions about the query and it's pretty easy to deal with.  Your data container (presumably a list of elements) could contain a PresortedList.  This would be null and populated when the first request came in.  This gets complicated if you can have queries with parameters as you have to store the parameters, toss out old results, etc.  Not very forward thinking but that was my first solution.

Memoization
Create a struct for your search query.  Here's a sample:
QueryStruct { sorted : b; numericsort : b; regexPattern : string; ... }
Structs (or tuples) are a great way of making things more maintainable.   Functions can pass structs instead of long parameter sets.  Classes can later be wrapped around those functions and own the struct.  And so on.
My query cache is a List with records that look like:
QueryCacheRecord { QueryStruct query; List<T> dataList; }
I include the template for ambiguity of your data type, but this is obviously pseudocode.  The idea is to scan through the cache and find a matching QueryStruct.  If one exists, return the dataList.  If one is not found, perform the Query and add it to the cache.
Externally, you have the interface:
List<T> DoQuery( QueryStruct query );
DoQuery would automatically perform queries as necessary and return caches when possible.
Memory is cheap, but not free, so when your cache grows to a predetermined size, old queries will need to be removed.  The easiest way would be to sort the queries as you go along, moving hits on cached queries to the bottom of the list.  Then, you remove the oldest ones first.  A more intelligent system would weight this based on query size or other metrics.
This approach makes the caching invisible to the calling code.  You could further generalize the caching implementation into a reusable component.

Sideline: Lazy Data
If you have a set of pre-defined queries you could construct a class like:
LazyClass {
    LazyClass(QueryStruct query) {
    T mData;
    QueryStruct mQuery;
    public T get( return mData == NULL ? generate(query) | mData );
}
LazyClass wraps a data object T that potentially exists.  It is only constructed when you access it through get() and then cached for the future.
Note: The generate function could be implemented as a delegate (or similar) and QueryStruct could be a second templated value, making it a completely generic lazy pattern.

No comments: