17 July 2010

Testing and Debugging

In this, the second follow-up to my earlier post about a small project at work, I'm discussing issues that came up in testing and debugging.

First, click back on that older post (here).  That’s okay; I’ll wait.

Okay, done? 

Now, did you catch my problem?  I didn't, until I ran against a specific scenario: what happens when there's a blank line?  In the case of my code, I got caught being too clever.  I tried to edit out blank lines, but I added a "break;" when a blank line was found.  That made my "while" loop think it was done, and so it left off everything after the blank line in that file.

Oops.

Then, I discovered something else.  A requirement I had interpreted as "sometimes there will be blank lines" was actually, "Sometimes, in an otherwise populated line, you'll receive UNIX nulls."  That'll break a StreamReader.  If someone knows a way to handle those in a StreamReader, I'd appreciate an email.  Since I don't know such a way, I decided the best thing to do was just copy the bytes over.  Working with bytes, it doesn't matter if one of them is the byte that says "null"; it's still a byte, and it gets copied over.

Looking for a way to do this, I turned to the ultimate in developer help: StackOverflow.com.  Once there, I discovered, once again, that Jon Skeet is awesome.  I'm not posting his code here, since all I really did was add exception handling and change some names to fit my solution; it's basically his code.
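For what it's worth, the idea boils down to copying raw bytes between streams instead of reading lines.  This isn't the code I shipped, and it isn't Skeet's verbatim; it's just a minimal sketch of the pattern, with names I made up:

```csharp
using System;
using System.IO;

public static class StreamCopier
{
    // Copy input to output byte-for-byte. A null byte is just a byte here,
    // so it passes through instead of tripping up a line-oriented reader.
    public static void CopyStream(Stream input, Stream output)
    {
        byte[] buffer = new byte[8 * 1024];
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            output.Write(buffer, 0, bytesRead);
        }
    }

    static void Main()
    {
        // "A\0B" -- the embedded null survives the copy.
        MemoryStream input = new MemoryStream(new byte[] { 65, 0, 66 });
        MemoryStream output = new MemoryStream();
        CopyStream(input, output);
        Console.WriteLine(output.Length); // 3
    }
}
```

Wrap the same loop around two input FileStreams and one output FileStream and you have the byte-level merge.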

So, how did I discover these problems?  Testing and debugging.  I know it's a long way to have gotten to what is supposed to be the point of the post, but I wanted to show how much pain it saved me, since I would have pushed to production knowing my code "functioned as designed."  The problem is that it was designed wrong.

Using the VS testing functionality (my client is on VS2005, by the way) wasn't enough on its own, however.  You see, my test passed when I ran it.  But I believe wholeheartedly in having actual output that I can verify.  Again, my code functioned as designed.  It wasn't until I opened up the test output that I found my problems with those blank lines.

So, three quick things about Testing your code:

1) Always unit test your code.  Once I started making changes, I did start failing my unit test.  If I'd only checked my output, I might never have known there were additional problems.

2) Set up unit tests which test your assumptions as well as the code itself.  If I had used only files with complete data in them to test, I would never have found my issues at all.

3) Always provide yourself actual, verifiable output whenever possible.  If I hadn't done just that, I would never have known about these problems.
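To make point 2 concrete, here's a sketch of the kind of assumption-testing test I mean.  MergeLogic.FilterLines is a hypothetical stand-in for the merge loop, not my actual client code:

```csharp
using System;
using System.Collections.Generic;

public static class MergeLogic
{
    // Hypothetical stand-in for the merge loop's blank-line handling:
    // skip blanks, but keep going so lines after a blank are not lost.
    public static List<string> FilterLines(IEnumerable<string> lines)
    {
        List<string> kept = new List<string>();
        foreach (string line in lines)
        {
            if (!string.IsNullOrEmpty(line.Trim())) kept.Add(line);
        }
        return kept;
    }

    static void Main()
    {
        // The "can't happen" input: a blank line in the middle of the file.
        List<string> kept = FilterLines(new string[] { "STATIC-1", "", "DAILY-1" });
        Console.WriteLine(kept.Count); // 2 -- "DAILY-1" survives the blank line
    }
}
```

The assumption-testing part is the input: a blank line in the middle of the data.  With my original "break;", a test asserting that "DAILY-1" survives would have failed immediately.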

An Explanation

I posted a couple of days ago about a process we’re writing as a console application.  Since I want this to be the highest possible quality code, I believe I should defend/explain the reasons I did things the way I did. 

After that, there will be a couple of additional posts explaining some issues we encountered and how we corrected them.

So, on with the explanation…

As I see it, there are two basic questions (feel free to send others, though):  1) Why a console app instead of a service?  2) Why chained streams?

Why A Console Application?

Because this was specifically requested as a temporary solution, and we were given an outer limit of 45 days that it might be needed, it didn't make sense to do anything more complex than a Windows Service or console application.  This already feeds into a BizTalk solution, so we could have gone that route, or we could have created a WCF service that would get called somewhere, but those seemed overly complex for what is, in essence, an automated version of "CTRL+C", "CTRL+V."

So, between a Windows Service and a console app, both have their advantages and disadvantages.

A service hooked to a FileSystemWatcher could merge the files the moment the daily file comes in.  Since the process that uses the merged file runs on a schedule, that could be a good thing: we'd know it was always ready to go when needed.  On the other hand, if it failed, we'd have to specifically check the file location to find out, and if we had to re-run the process, we'd either have to re-drop the file or find some other way to get the service to kick off.
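For the curious, the service approach would have hung off a FileSystemWatcher, something like this sketch (the drop folder and file pattern here are made up):

```csharp
using System;
using System.IO;

class WatcherSketch
{
    static void Main()
    {
        // Hypothetical drop folder and file pattern.
        FileSystemWatcher watcher = new FileSystemWatcher(@"C:\Drops", "daily_*.txt");
        watcher.Created += new FileSystemEventHandler(OnDailyFileCreated);
        watcher.EnableRaisingEvents = true;
        Console.ReadLine(); // stand-in for the service's run loop
    }

    static void OnDailyFileCreated(object sender, FileSystemEventArgs e)
    {
        // The merge would kick off here, once the file finishes writing.
        Console.WriteLine("Daily file arrived: " + e.FullPath);
    }
}
```

Note the re-run problem in miniature: nothing happens unless a file actually lands in the folder, which is exactly why a failed run would mean re-dropping the file.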

A console application, on the other hand, has to be scheduled through the Windows Task Scheduler if it's going to run as an automated process, and that means balancing running the merge early enough to be ready for the secondary process against running late enough to be sure we've got the file.  But if it fails, it's easy enough to pull it up and run it manually or, even, pull up the solution in TFS and run it in debug mode to catch what the error is.  And, since it's scheduled, we know exactly when to check that it ran properly.

In the end, we decided it was "six of one, half a dozen of the other," and my programmer's virtue of laziness said that since I was going to have to write a console app during the debug phase anyway, I might as well just keep it as a console app.

Why Chained Streams?

Really, was there another choice?  I could have converted the streams to byte arrays, but that seemed overly complex for what we were trying to accomplish (it turns out that was wrong, but I didn't know it at the time).  We're trying to do the equivalent of opening one file, hitting "CTRL+A" then "CTRL+C", opening a second file, hitting "CTRL+END" then "CTRL+V", and then saving the merged file to a folder for an automated FTP process to pick up.

Reading the files and writing directly to a new, merged file seemed to fit that need quite nicely (and would have, too, except for something I found out later).

So, there you go, there was our reasoning.  Fairly simple and straightforward.  If you have other questions, please post them in the comments.  Maybe I'll do another follow-up based on those.

14 July 2010

A Hack Solution Doesn't Mean Hacky Code

Today at work we ran into a little situation with a currently running process. The process is fine, but one of the parts of the business feeding data to our BizTalk orchestration needed a change. For whatever reason, the requirements for a file that gets sent through BizTalk had changed such that the file would be much, much bigger. The business unit responsible for that file said they couldn't supply the whole thing new every night (the current required process) because it would kill their other processes. What they proposed was to send one static file (old data that has to be sent every time) and then they would send the rest of the data (which changes) in the current daily process.

This left us in a bit of a quandary. Since the current Orchestration looks for two files (the one from this unit, and one from another), adding a new file would force us to re-write the Orchestration (which would mean un-deploying it, then re-deploying it). That was a non-starter, because this requirement should only be in place for a month or so, and then we'd have to go back to the old process.

So, the decision was made that we would take the static file, append the daily file to it each night, and then send it through the current process (now a single file) as normal.

I was tasked with creating the code that would do that. After a brief discussion with my teammates, it was decided we'd use a console application which would be called by the Windows Task Scheduler. We considered a Windows Service, but opted for the console app because we can see the possible need to run it manually.

So, even though this is a hack process for a hack requirement, I decided that my code should be held to the highest possible standard: partially for personal and professional pride, and partially because no "temporary" requirement ever really goes away. So, with error checking removed (since that's a custom thing for my client), here are the basic guts. I, personally, think it's good, but please feel free to fire away with any problems you see...

using System.IO;

public class FileMerge
{
    public string StaticFilePath { get; set; }
    public string DailyFilePath { get; set; }
    public string MergedFilePath { get; private set; }
    public string FinalFilePath { get; private set; }

    public FileMerge() { }
    public FileMerge(string statFile, string dailyFile, string mergedFile, string finalFile)
    {
        StaticFilePath = statFile;
        DailyFilePath = dailyFile;
        MergedFilePath = mergedFile;
        FinalFilePath = finalFile;
    }

    public void MergeAndDrop()
    {
        FileInfo mergedFile = Merge();
        if (mergedFile != null && mergedFile.Exists) Drop(mergedFile);
    }

    private FileInfo Merge()
    {
        // Delete any leftover merged file *before* opening the writer;
        // deleting it while the StreamWriter holds it open would throw.
        if (File.Exists(MergedFilePath)) File.Delete(MergedFilePath);
        using (StreamWriter sw = new StreamWriter(MergedFilePath))
        {
            using (StreamReader staticReader = new StreamReader(StaticFilePath))
            {
                while (staticReader.Peek() != -1)
                {
                    string line = staticReader.ReadLine();
                    if (!string.IsNullOrEmpty(line.Trim())) sw.WriteLine(line);
                }
            }
            using (StreamReader dailyReader = new StreamReader(DailyFilePath))
            {
                while (dailyReader.Peek() != -1)
                {
                    string line = dailyReader.ReadLine();
                    if (!string.IsNullOrEmpty(line.Trim())) sw.WriteLine(line);
                }
            }
        }
        return new FileInfo(MergedFilePath);
    }

    private void Drop(FileInfo fileToDrop)
    {
        // Note: MoveTo throws if a file already exists at FinalFilePath.
        fileToDrop.MoveTo(FinalFilePath);
    }
}

09 July 2010

File Scrubbing – whether you want to, or not

As an EDI guy, I’m supposed to get to work in industry standard formats.  Since I mostly have done healthcare in the past, this primarily means the ANSI X12 4010A1 standard.  However, in the real world, the standard doesn’t mean much.

There are clients with proprietary formats, vendors with old versions of the standard, and just plain screw-ups that we deal with on a day-to-day basis.

So, with that in mind, I bring you the first in a new occasional series of posts: Public Code Review.  The following code is a scrubber I created to handle known, recurring issues with client and vendor files.  It uses Regular Expressions to find said known issues, and can either just remove them, or replace them.

First, I created two interfaces: IScrubber and IConfigurable.  IScrubber is the interface which will do most of the work; IConfigurable just allows anyone else who wants to use this code to use their own configuration to set it up.

interface IScrubber
{
    string Scrub(string original, string match);
    string Replace(string original, string match, string replacement);
}

interface IConfigurable
{
    void Configure();
}

Next, comes the class ScrubRule.  This simply holds two strings: the Regex to match the errors, and the replacement string.

class ScrubRule
{
    public string Match { get; private set; }
    public string Replacement { get; private set; }

    public ScrubRule(string m, string r)
    {
        Match = m;
        Replacement = r;
    }
}

With our base items created, I can now create the actual scrubber.  In this case, I’ve called it “BasicScrubber.”  It implements both IScrubber and IConfigurable.

using System.Collections.Generic;
using System.Configuration;
using System.Text.RegularExpressions;

public class BasicScrubber : IConfigurable, IScrubber
{
    private List<ScrubRule> rules;

    // configPath is unused in this version; Configure() reads app.config
    // directly, but the parameter leaves room for other config sources.
    public BasicScrubber(string configPath)
    {
        rules = new List<ScrubRule>();
        Configure();
    }

    public void Configure()
    {
        foreach (string s in ConfigurationManager.AppSettings.AllKeys)
        {
            ScrubRule sr = new ScrubRule(s, ConfigurationManager.AppSettings[s]);
            rules.Add(sr);
        }
    }

    public string Scrub(string original, string match)
    {
        Regex rx = new Regex(match);
        return rx.Replace(original, string.Empty);
    }

    public string Replace(string original, string match, string replacement)
    {
        Regex rx = new Regex(match);
        return rx.Replace(original, replacement);
    }
}

As you can see, the scrubber gets its configuration (in this case) from the System.Configuration.ConfigurationManager class, pulling from app.config:

<?xml version="1.0" encoding="utf-8" ?>
<configuration>
  <appSettings>
    <add key ="RegexHere" value="ReplacementValueHere" />
  </appSettings>
</configuration>

So that if we replace “RegexHere” with the Regex: (?<=\.\d*)0+(?=\D|$)

and if we replace “ReplacementValueHere” with “”

we get a scrubber rule which will trim trailing zeros after a decimal place.
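As a quick sanity check of that rule (the segment content here is made up for illustration, not a real 835):

```csharp
using System;
using System.Text.RegularExpressions;

class TrailingZeroDemo
{
    static void Main()
    {
        // .NET supports variable-length lookbehind, which this pattern relies on.
        string pattern = @"(?<=\.\d*)0+(?=\D|$)";
        string scrubbed = Regex.Replace("AMT*123.4500*0.50~", pattern, string.Empty);
        Console.WriteLine(scrubbed); // -> AMT*123.45*0.5~
    }
}
```

Whole numbers like "100" are untouched, since the lookbehind requires a decimal point before the zeros.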

Wire this class up to a Windows or console app, point it at your file in error, and let it go.

One of the great things about Regex is its speed at just this kind of process.  Before I started using Regex, I tried basic string manipulation with string.Replace().  The problem is that when you start playing with special characters, or something is off just a little bit, string.Replace() is a little unreliable for my tastes.  It's also slow: running string comparisons and manipulations against a normal X12 835 file used to take a couple of minutes.  With Regex, it's seconds.  As in two or three, not thirty or forty.

So, let me know what you think.  This code should be highly portable.  Without much effort, it can be driven from a database instead of app.config, or you can even configure it in some custom way.

08 July 2010

How to Cure Insomnia

Read technical documents for 8 hours straight. I've been at the client site for almost a full week now (technically, I guess it's a week's worth of days, but Friday before a three day weekend does not count), and I've been waiting for my system access to come through. Because of that three day weekend thing, it's been a little slower than it would otherwise have been so I haven't had much to do. What I have had to do is read technical specifications. And project plan documents. And more technical specifications. I'm sure it will get much better tomorrow or Monday (I should have all the access I need by then) but for this week, it's been like fighting narcolepsy...

03 July 2010

Changes

This week I left my position at ESI for a new position with Sogeti USA's Irving office. I am now a consultant with Sogeti- my first gig will be providing BizTalk support while the current lead on that project goes on vacation for a month. It should be a good way for me to get some more in depth knowledge of BizTalk with people who know more than I do about its practical capabilities. I'm looking forward to the new adventure. I leave the folks at ESI without regrets, but certainly with fond memories. ESI was my first "real" development position, and I will always be thankful for the opportunity I was given there. The whole team were "good people." And, if you're from Texas, you know there is little better praise than that simple phrase. So, as I begin the next phase of my journey, I say "thank you," to Jeff, Milton, Kim, and the gang at ESI. Thanks for the opportunity. Thanks for letting me learn. Thanks for the memories.