Saturday, March 17, 2012

Sizing of RESTful Paged Collection Resources

There's been a debate lately in my IT department about how to implement paged collections. To provide some context, we've agreed to use the REST architectural style for all of our web services. We have more than a handful of teams building such services, and almost every one of them at one point or another has to provide resources for some kind of collection. Specifically, the question is whether it's appropriate to offer the client control over the page size via a parameter such as "limit" in the resource's URI. I feel rather strongly that this is bad design, for a variety of reasons to follow.

First, let's review why we want to page our collections and what we are trying to achieve via paging. Foremost, we absolutely don't allow unbounded size representations because we view them as a potential denial of service vulnerability for both clients and servers. Both clients and servers have finite resources, such as memory. Our persistence stores hold potentially big data sets that very easily could overwhelm our clients or servers, especially when we marshall and unmarshall a big representation. This consumes at least O(N) resources for N records, and is often worse. If we don't put some sanity on the server side, a naughty or unwitting client might ask for a large data set. The server might bog down trying to construct the response. If the client sees its response time exceeded, you can rest assured that our friendly user will respond to the timeout error message by resubmitting the request. Iterate until your sysadmin gets the 3AM phone call. Or maybe the server manages to construct the monster data set and so the client gets what he asked for. After 7 minutes of parsing a bazillion records it crashes and again the dreaded 3AM phone call happens to a different sysadmin.

As an architect, I have a solution. I butt my nosy nose in and say "I don't care what clients say they need or what servers feel like implementing, THOU SHALT have an O(1) bound on representation sizes via a maximum number of records in a page". The developers hate me and the sleepless sysadmins love me and take me out drinking until 3AM and get no sleep anyway, but at least it was their choice. Oh, and the developers showed up too and now they like me again. Hooray, beer.

Fast forward a few months. Now all the teams accept that there will be page sizes. What do developers do when they are asked to pick a parameter like page size? One of two things: either they configure it with their favorite splendid, bitchin' config file DSL language **OR** they figure somebody else should pick so they invent ways to have their clients pick by taking in the page size from the URI. Some are clever and do both: they set the default in the config file and they put the parameter in the URI so that it's optional. QA still tests it. They don't have enough to do, so who cares. Inevitably the devs will have meeting notes where somebody unnamed said "sure, I guess we could supply the page size" in response to a carefully crafted leading question that translates to "is there any scenario you can possibly imagine where you would want to control the value that I might arbitrarily select and shove down your throat if you say no?" And of course, there is an undertone of "what kind of developer are you, passing on a chance to set a value that SOMEONE has to set?".

So now you have the context of the discussion. We all figured out pretty quickly that the fancy pants link relations "next", "prev", "first", "last" were our friends. For a while we argued endlessly over how to structure the URIs to name which page. Some people use offset, some use a marked record, others use page number. Then one day we stopped arguing because IT DOESN'T MATTER. We don't need a standard other than that we follow rel="next" to get to the NEXT page. And as it turns out, my friend Mark Nottingham has in fact formally standardized this in RFC 5005 and RFC 5988. We did still find a way to argue about whether next should take you to younger or older records. "Next should not mean earlier!" Then we read Mark's stuff more carefully and realized that again either way is fine because it's the next RESOURCE in a server-defined ordering of resources. So we had to find something else to argue about.
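To make the "it doesn't matter" point concrete, here is a minimal Python sketch of a client that only follows rel="next". Everything in it is an assumption for illustration: the in-memory PAGES table stands in for a server, the /orders URIs are made up (and deliberately inconsistent), and the Link header parser is a simplified regex that only handles a quoted rel parameter, not the full RFC 5988 grammar.

```python
import re

# Hypothetical in-memory "server": each URI maps to (records, Link header value).
# The paging schemes differ from page to page on purpose; the client never looks.
PAGES = {
    "/orders": (["a", "b"], '</orders?offset=2>; rel="next"'),
    "/orders?offset=2": (["c", "d"], '</orders/after/d>; rel="next", </orders>; rel="first"'),
    "/orders/after/d": (["e"], '</orders>; rel="first"'),
}

# Simplified Link header pattern: <uri>; rel="relation names"
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_links(header):
    """Map each link relation name to its target URI."""
    links = {}
    for uri, rels in LINK_RE.findall(header):
        for rel in rels.split():
            links[rel] = uri
    return links


def fetch(uri):
    """Stand-in for an HTTP GET; returns (records, Link header value)."""
    return PAGES[uri]


def iter_records(start_uri):
    """Yield every record by following rel="next" until the server stops offering it."""
    uri = start_uri
    while uri is not None:
        records, link_header = fetch(uri)
        yield from records
        uri = parse_links(link_header).get("next")


all_records = list(iter_records("/orders"))
```

The client treats every URI as an opaque string, so the server is free to switch between offsets, markers, or page numbers without breaking anybody.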

So we landed on limit. Yes, there would be a maximum on limit, but could a client ask for a smaller page? That is a pressing and important question. I have an answer: NO. Damn it, No. I would forbid all services from EVER having a limit parameter in any URI and I would force them to actually decide a FIXED page size for each collection type. I'd even build it right into the media type definition and tell the services they don't even own specifying it. In the spirit of good will, I'm willing to compromise on this last point. Maybe I'll let them configure it in their config file, instead of documenting it in the media type, but under no circumstances should they offer their clients control of the page size. Damn it.

Why? Am I a crusty architect bent on making proclamations that confound developers and make them question the color of my soul!? Well, maybe, but that's not the point today. There are four good reasons:
1) YAGNI
2) Fixed page sizes support better HTTP caching
3) Homogeneous page sizes simplify response time SLAs and rate limiting
4) RESTful APIs should eschew RPC in favor of HATEOAS

Let's go through these in turn:

1) YAGNI - Clients don't need to configure it. You don't need to let them configure it. Get rid of all that code solving non-existent problems for nobody. Your clients will be just fine and happy. You don't have to add a stupid method to configure it and they don't have to add a stupid parameter to their calls to give you the stupid value for the stupid parameter. Somebody always says "what about this hint of a shadow of a client that has to code for a big list in browsers and a small list on a mobile device? SHOULDN'T we help them by offering them a variable they can set to size our pages to their display needs?" Ummm, let me think.... NO. Keep your UI concerns out of my web service API. Your visible record set is not relevant to my page size. Go to dzone.com and look at how you should do it. Fill up your screen from your buffer and call "next" lazily to add to the buffer when the user gets close to displaying the last record. That way there is no bottom, scroll forever. Sweet.
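The buffer-and-lazy-"next" idea above can be sketched roughly like this in Python. All names here (ScrollBuffer, fake_fetch, prefetch_margin) are hypothetical, the server is faked with a list, and the page size of 25 is just an assumed server-side constant that the client never sees or sets.

```python
PAGE_SIZE = 25            # assumed fixed server-side page size; the client never sets it
DATA = list(range(60))    # stand-in for the server's collection


def fake_fetch(uri):
    """Stand-in for GET: returns (records, next URI or None). The URI is opaque to the client."""
    offset = int(uri)
    page = DATA[offset:offset + PAGE_SIZE]
    nxt = offset + PAGE_SIZE
    return page, (str(nxt) if nxt < len(DATA) else None)


class ScrollBuffer:
    """Client-side buffer that decouples what is on screen from the server's page size."""

    def __init__(self, fetch_page, first_uri, prefetch_margin=3):
        self._fetch_page = fetch_page
        self._next_uri = first_uri
        self._margin = prefetch_margin
        self.records = []
        self.fetches = 0

    def visible(self, start, count):
        """Return the records to display, following "next" only when the view nears the buffer's end."""
        while self._next_uri is not None and start + count + self._margin > len(self.records):
            page, self._next_uri = self._fetch_page(self._next_uri)
            self.records.extend(page)
            self.fetches += 1
        return self.records[start:start + count]


desktop = ScrollBuffer(fake_fetch, "0")
first_screen = desktop.visible(0, 10)     # a big browser list: 10 rows on screen
second_screen = desktop.visible(20, 10)   # scrolling near the end pulls one more page

phone = ScrollBuffer(fake_fetch, "0")
phone_screen = phone.visible(0, 5)        # a small mobile list: 5 rows, same server pages
```

Both the browser and the phone pull the same 25-record pages; only the slice they show differs, which is a UI decision that never reaches the service.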

2) Better HTTP caching - If your collection is a nice append-only kind of thing, guess what? People start from where they left off and scroll forward (which as we established earlier is a synonym for "in some consistent direction"). This means fixed page sizes take every client to the same set of pages. Cache hit rate, baby. Love it. Stick it in Varnish or Squid and yawn when 500 users poll you every second. Boring. Play Quake with the extra server juice while you compile the latest Linux kernel while you do some video editing. Give them configurable page sizes and just imagine the spindles of your storage physically spinning as the jackass who wants a page size of 37 traverses the same old data in new and exciting ways. Varnish? FAST. Spindles grinding for pagesize 37? SLOW.
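A back-of-the-envelope Python simulation of that cache argument, under loud assumptions: the cache is modeled as a plain set keyed by URI, the /events URI layout with offset and limit is invented for the example, and each client walks the whole 1000-record collection exactly once.

```python
def page_uris(total_records, page_size):
    """URIs a client touches when walking the whole collection at a given page size."""
    return [f"/events?offset={o}&limit={page_size}" for o in range(0, total_records, page_size)]


def simulate(client_page_sizes, total_records=1000):
    """Return (hits, misses) for a shared cache keyed by URI, one full walk per client."""
    cache = set()
    hits = misses = 0
    for size in client_page_sizes:
        for uri in page_uris(total_records, size):
            if uri in cache:
                hits += 1
            else:
                misses += 1
                cache.add(uri)
    return hits, misses


fixed = simulate([25, 25, 25, 25])      # every client gets the server's one page size
variable = simulate([25, 37, 50, 10])   # every client picks its own "limit"
```

With one page size, only the first client misses (40 misses, then 120 hits). With four different limits, no two clients ever share a URI, so all 188 requests go to the origin.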

3) SLAs and Rate Limiting. If every box of chocolates has 25 chocolates, then it takes about the same time for Forrest Gump to eat one box of chocolates as another. So you can say "Forrest, eat that box of chocolates in 40 seconds". Similarly, you can say "Forrest, don't eat more than 15 boxes in an hour". If you don't assure that all boxes have the same number of chocolates, Forrest will find the ones that have 2 chocolates each and complain that you cut him off needlessly. And he'll be right. Homogeneous page sizes mean that the resource consumption and response time of one "hit" do not vary from one hit to the next. To make reasonable statements about your capabilities you don't have to do statistics to factor out confounding variation. You just count and say things like "all pages come back in 500 ms". Well, almost, somebody will fire up JMeter and put 1000 threads of whoop ass on your server and go "hmm, you can't scale that forever". But then you go "hmmm, I thought we said our servers scale to handle 500 concurrent connections, why did you do 1000?" You get your O(1) bound in and you win.
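The chocolate arithmetic, spelled out as a tiny Python sketch. The numbers come straight from the Forrest example (25 per box, 15 boxes an hour, the 2-chocolate boxes); the function name is made up, and the limiter is assumed to count requests rather than records.

```python
PAGE_SIZE = 25               # chocolates per box, fixed by the server
MAX_REQUESTS_PER_HOUR = 15   # boxes Forrest may eat per hour


def records_allowed(requests_per_hour, page_size):
    """Upper bound on records a client can pull in an hour under a request-count quota."""
    return requests_per_hour * page_size


fixed_quota = records_allowed(MAX_REQUESTS_PER_HOUR, PAGE_SIZE)   # every client gets the same deal
tiny_pages_quota = records_allowed(MAX_REQUESTS_PER_HOUR, 2)      # the client who asked for limit=2
```

With a fixed page size, a quota of 15 requests an hour is also a quota of 375 records an hour, for everybody. Let clients pick the size and the same quota cuts the limit=2 client off after 30 records, which is exactly Forrest's complaint.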

4) HATEOAS > RPC. Ahem. Repeat after me: Dr. Fielding says all application state transitions must be driven by client selection of server-provided choices in hypertext. Dr. Fielding says that the descriptive effort in defining a REST API should be spent on media types and link relations and not on how clients can treat your URIs like some kind of remote procedure call. Really. He did say so. If Roy said it, you must obey. Ok, appeal to authority is a fallacy. But, in fact, there are good reasons why HATEOAS is the best that aren't traceable to the cult of Roy.


We didn't switch to REST from SOAP because we like four letter acronyms. The irony of ironies is that your fancy pants parameter, designed to give the client control, sits there, laughing at you, because if you do REST right, you, Mr. Web Service Developer, STILL end up making the choice of what value to put in it. The client goes to some resource where he can see a link to the collection. He told you nothing about page size to get there. You HAVE to populate the link with no input from him. What value do YOU put in it? Play the evil architect dirge: Bwa-ha-hahahaha. Now you realize that you've been insane for all these years. All your efforts to give others control failed. Run for Congress, those guys suck at what you do.
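Here is roughly what that looks like from the server's side, as a hedged Python sketch. The CONFIG dict, the "orders" collection, the offset-style URIs, and the items/links representation shape are all assumptions for illustration; the only point is that the server mints every link and the page size comes from its own config, never from the client.

```python
# Hypothetical service configuration: the one place the page size is decided.
CONFIG = {"orders": {"page_size": 25}}


def render_page(collection, records, offset):
    """Build one page of a collection, with server-generated links and a server-chosen size."""
    size = CONFIG[collection]["page_size"]
    page = records[offset:offset + size]
    links = [{"rel": "first", "href": f"/{collection}"}]
    if offset + size < len(records):
        # More records remain, so offer "next"; the client just follows it.
        links.append({"rel": "next", "href": f"/{collection}?offset={offset + size}"})
    if offset > 0:
        links.append({"rel": "prev", "href": f"/{collection}?offset={max(offset - size, 0)}"})
    return {"items": page, "links": links}


orders = list(range(60))
first_page = render_page("orders", orders, 0)
last_page = render_page("orders", orders, 50)
```

Notice there is no spot in any of these links where a client-supplied limit could have come from; the server had to pick the number either way.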


Here though, clients don't want to understand what your stupid parameters mean. They don't want to read the documentation to figure out how to handle the error if they specify a number larger than you accept (you do return an error, right?). They want to follow links like "next", and ponder deep questions like "does that go forward or backward in time". Or maybe they do want to call our beautiful REST API like it's a COBOL procedure. Some client devs just don't get it, even in 2012, and they want to couple to everything bad and be derisive of "theory" and stuff that interferes with their agile ability to create brittle code laced with tribal knowledge. What should we do with these guys? We should give them no levers to lift, no knobs to turn. RPC is so SOAP old school lameo. Send them back to it. Or better yet make them write access objects and transfer objects for our NoSQL persistence stores. Haha.


To conclude, set the page size uniformly in either the media type definition or, gasp, in your service's config file. Having trouble picking the actual value!? Go with 15. No, 100. No, 25. Whatever. Nobody actually cares what it is anyway.