12 April 2013

The Lowly Do/While

So, semi-officially, I'm exploring some options regarding a Pub/Sub Architecture at work.

One of the requirements (for this to be worth exploring further instead of just grabbing something off the shelf) is that we want to remove the middleman that so many other frameworks use.

Before I go on, let me explain that:
Most pub/sub architectures, buses, or solutions are actually broker solutions.  Whether we're talking BizTalk (which does, indeed, use pub/sub internally), MassTransit (which looks like an awesome product, though I've never used it), or whatever, there is something in the middle that is maintaining the subscriptions.  A subscriber does not request messages; messages are pushed to the subscriber.

We want to explore the possibility of getting away from that paradigm.  We want the publisher simply to be able to lay their message wherever, and the subscribers to say "Hey, I wonder if I have mail?" and go check.

With that out of the way, what I'm currently exploring is the ability to lay messages on a DB (I'm using RavenDB for development because a) it's easy and b) it's what I'd like to use), and then have the subscribers be responsible for picking up those messages.

That's an interesting problem right there.  This is especially true with Raven's limit of 128 documents per request, but even with SQL or Oracle, you run into issues of "how often do I poll, and how do I know when to stop?"  Pulling back "all" the messages is no big deal when you have, say, 1,000 or fewer.  It becomes a much bigger deal once you start approaching the 100,000+ range.

I started with the standard "while" loop.  You know; you've used it more times than you can count.  And you're already thinking (many of you): "Sure.  You'll go fetch records, check to see if you need more, and then use a while loop to finish fetching your documents."

You're even thinking of something along these lines:

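// Assumes: using System.Collections.Generic; using System.Linq;
int count = 0;
int cursor = 0;
List<Document> documents = new List<Document>();
using(var session = DocStore.OpenSession())
{
  // First fetch: no Take(), so RavenDB's own page size applies
  var results = session.Query<Document>().ToList();
  count = results.Count;
  if(count > 0) documents.AddRange(results);
  while(count > 0)
  {
    cursor = documents.Count; // skip everything fetched so far
    // ToList() runs the query once, so count is the size of this page
    results = session.Query<Document>()
              .Skip(cursor)
              .Take(128)
              .ToList();
    count = results.Count;
    if(count == 0) break;
    documents.AddRange(results);
  }
}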

Many of you are looking at that and (except for how the paging is being handled; hey, it's an example, not production code) see nothing particularly wrong.

And that's where I started.  But it looked messy to me.  It was hard to say "Here's what I'm doing."  So, I considered, and then I remembered something from long ago.  It was a construct that I hadn't used since my computer classes.  In fact, if you look around the Net, you'll see some people calling it the worst of the loop constructs.

I'm referring to the lowly Do/While loop.

Where a For loop says "Hey, do this X times," and a While loop says "Hey, as long as this is true, keep doing this," Do/While says "Here, do this.  Then if this condition is true, keep doing it."  That is, the action inside the loop will always run at least once.
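To make the "at least once" part concrete, here's a tiny sketch.  The class and method names (LoopDemo, WhileRuns, DoWhileRuns) are made up for illustration; each method just counts how many times its loop body runs.

```csharp
using System;

public static class LoopDemo
{
    // Counts how many times a while loop body runs.
    // The condition is checked BEFORE the first pass.
    public static int WhileRuns(int start, int limit)
    {
        int runs = 0;
        int i = start;
        while (i < limit)
        {
            runs++;
            i++;
        }
        return runs;
    }

    // Counts how many times a do/while loop body runs.
    // The condition is checked AFTER each pass, so the body runs at least once.
    public static int DoWhileRuns(int start, int limit)
    {
        int runs = 0;
        int i = start;
        do
        {
            runs++;
            i++;
        }
        while (i < limit);
        return runs;
    }

    public static void Main()
    {
        Console.WriteLine(WhileRuns(5, 5));   // 0: condition is false up front
        Console.WriteLine(DoWhileRuns(5, 5)); // 1: body runs before the check
        Console.WriteLine(WhileRuns(0, 3));   // 3
        Console.WriteLine(DoWhileRuns(0, 3)); // 3
    }
}
```

When the condition is true going in, the two behave the same.  They only differ when it is false from the start.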

When fetching records that may or may not exist from a database, it is incredibly helpful.

Here's the cleaner (IMO) code:

int count = 0;
int cursor = 0;
int page_size = 100; //I like 100 docs at a time.  Use as many as you want up to 128
List<Document> documents = new List<Document>();
using(var session = DocStore.OpenSession())
{
  do
  {
    // ToList() runs the query once, so count is the size of this page
    var results = session.Query<Document>()
          .Skip(cursor)
          .Take(page_size)
          .ToList();
    count = results.Count;
    if(count == 0) break;
    documents.AddRange(results);
    cursor = documents.Count; // advance past everything fetched so far
  }
  while(count > 0);
}

I added that "page_size" variable, just for clarity.  It's better to state explicitly "We'll be returning up to 100 records at a time" than just leave it up to the RavenDB configuration.  Easier for someone later to read and know what's happening.
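If you want to play with this pattern without a RavenDB instance handy, here's a rough, self-contained sketch.  FetchAll and its fetchPage delegate are hypothetical names of mine, not anything from RavenDB; the delegate just stands in for a Skip/Take query, and a plain in-memory list plays the part of the database.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PagingDemo
{
    // Hypothetical helper: drains a paged source one page at a time.
    // fetchPage(skip, take) stands in for a Skip(skip).Take(take) query.
    public static List<T> FetchAll<T>(Func<int, int, IEnumerable<T>> fetchPage, int pageSize)
    {
        if (pageSize <= 0) throw new ArgumentOutOfRangeException(nameof(pageSize));

        var all = new List<T>();
        int count;
        do
        {
            // Materialize the page so it is enumerated exactly once.
            var page = fetchPage(all.Count, pageSize).ToList();
            count = page.Count;
            all.AddRange(page); // adding an empty page is a no-op
        }
        while (count > 0); // stop after the first empty page
        return all;
    }

    public static void Main()
    {
        // In-memory stand-in for the document store: 250 "documents".
        var store = Enumerable.Range(1, 250).ToList();
        var docs = FetchAll((skip, take) => store.Skip(skip).Take(take), 100);
        Console.WriteLine(docs.Count); // 250
    }
}
```

With 250 items and a page size of 100, this makes four fetches (100, 100, 50, then an empty page that ends the loop).  Against an empty store it still makes exactly one fetch, which is the whole point of the do/while.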

See how much easier it flows?  The original, if read for understanding, would say:
"First, we go ahead and initialize some variables, including a list to store our results and two ints... I guess I'll see what those do.  Okay, then we go fetch the records from the database (Oh, yeah, this is RavenDB, so it's probably 128).  Then we check to see how many results we got back.  If we have any, we'll add those to the list, and then go look for more.  Oh, that's what we're using those two ints for, one is how many records we got back this time, the other is how many records we have total, so we know how many to skip.  Got it.  We'll keep doing that until a fetch doesn't grab any."

The second reads this way:
"Okay, we initialize some variables: a list to store our results, and three ints... I guess I'll see what those do.  Then, we go fetch results.  Oh, there's the first int: cursor must be the total records we have grabbed, and page size means how many records I'm bringing back.  Then count is how many records were brought back.  If there weren't any, we stop.  Otherwise we add the current result set to our list, and keep going."

Both versions are functionally the same; the only difference is in readability.

Over on StackOverflow, the question about the difference between Do/While and While has been asked a couple of times, and the consensus seems to be "I've never needed it" or "I've seldom needed it."  But how often do we do exactly this: "Hey, do this until it's done"?

If you need to make sure it's done at least once, then Do/While is your man.
