April 17, 2009

Using a Factory to Generate Multiple Instances of a Dependency

Passing dependencies into a class or method leads to a cleaner, more testable design, but what if you need to create multiple instances of the same dependency? Consider the following, where the ImportHeadersMatchTemplateHeaders method needs to create two instances of the CsvReader class:

public class ImportFileValidator
{
    private readonly string _importFileName;
    private readonly string _templateFileName;

    public ImportFileValidator(string importFileName, string templateFileName)
    {
        _importFileName = importFileName;
        _templateFileName = templateFileName;
    }

    public bool ImportHeadersMatchTemplateHeaders()
    {
        string[] importHeaders;
        string[] templateHeaders;

        // First CsvReader instance: headers of the file being imported
        using (IReader reader = new CsvReader(_importFileName))
        {
            importHeaders = reader.GetHeaders();
        }

        // Second CsvReader instance: headers of the template file
        using (IReader reader = new CsvReader(_templateFileName))
        {
            templateHeaders = reader.GetHeaders();
        }

        return (importHeaders.Length == templateHeaders.Length);
    }

    // other validation methods
}

We could modify the constructor (or the ImportHeadersMatchTemplateHeaders method) to include two IReader parameters, but let’s say that in this case the method will always use the same type of IReader object both times.  To handle this we can have the ImportFileValidator class’s constructor also receive a factory: an object that creates the appropriate IReader objects on request:

public interface IReaderCreator
{
    IReader CreateReader(string fileName);
}

public class CsvReaderCreator : IReaderCreator
{
    public IReader CreateReader(string fileName)
    {
        return new CsvReader(fileName);
    }
}

public class ImportFileValidator
{
    private readonly string _importFileName;
    private readonly string _templateFileName;
    private readonly IReaderCreator _readerCreator;

    public ImportFileValidator(string importFileName, string templateFileName, IReaderCreator readerCreator)
    {
        _importFileName = importFileName;
        _templateFileName = templateFileName;
        _readerCreator = readerCreator;
    }

    public bool ImportHeadersMatchTemplateHeaders()
    {
        string[] importHeaders;
        string[] templateHeaders;

        // The factory decides which concrete IReader to build
        using (IReader reader = _readerCreator.CreateReader(_importFileName))
        {
            importHeaders = reader.GetHeaders();
        }

        using (IReader reader = _readerCreator.CreateReader(_templateFileName))
        {
            templateHeaders = reader.GetHeaders();
        }

        return (importHeaders.Length == templateHeaders.Length);
    }

    // other validation methods
}

This may seem like extra work and unnecessary complexity, but it’s really not.  It allows you to mock a class’s dependencies, making it easier and faster to test – and hearing the whoosh of your unit tests finishing lickety-split definitely makes it a satisfying choice.
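
As a rough sketch of what that buys you in a test, a hand-rolled fake can stand in for the CSV-backed reader.  The FakeReader and FakeReaderCreator classes here are hypothetical, the sketch assumes IReader exposes nothing beyond GetHeaders() and Dispose(), and the test is a plain method that throws on failure rather than one tied to a particular test framework (a mocking library would do the same job):

```csharp
using System;
using System.Collections.Generic;

// Assumed shape of the interfaces; the real IReader may have more members.
public interface IReader : IDisposable
{
    string[] GetHeaders();
}

public interface IReaderCreator
{
    IReader CreateReader(string fileName);
}

// Hypothetical fake: returns canned headers and never touches the file system.
public class FakeReader : IReader
{
    private readonly string[] _headers;

    public FakeReader(string[] headers) { _headers = headers; }

    public string[] GetHeaders() { return _headers; }

    public void Dispose() { }
}

// Hypothetical factory that hands out a FakeReader for each registered file name.
public class FakeReaderCreator : IReaderCreator
{
    private readonly Dictionary<string, string[]> _headersByFileName =
        new Dictionary<string, string[]>();

    public void AddFile(string fileName, params string[] headers)
    {
        _headersByFileName[fileName] = headers;
    }

    public IReader CreateReader(string fileName)
    {
        return new FakeReader(_headersByFileName[fileName]);
    }
}

// Trimmed copy of the validator so the sketch compiles on its own.
public class ImportFileValidator
{
    private readonly string _importFileName;
    private readonly string _templateFileName;
    private readonly IReaderCreator _readerCreator;

    public ImportFileValidator(string importFileName, string templateFileName, IReaderCreator readerCreator)
    {
        _importFileName = importFileName;
        _templateFileName = templateFileName;
        _readerCreator = readerCreator;
    }

    public bool ImportHeadersMatchTemplateHeaders()
    {
        string[] importHeaders;
        string[] templateHeaders;

        using (IReader reader = _readerCreator.CreateReader(_importFileName))
        {
            importHeaders = reader.GetHeaders();
        }

        using (IReader reader = _readerCreator.CreateReader(_templateFileName))
        {
            templateHeaders = reader.GetHeaders();
        }

        return importHeaders.Length == templateHeaders.Length;
    }
}

public static class ImportFileValidatorTests
{
    public static void ImportHeadersMatchTemplateHeaders_ReturnsTrue_WhenHeaderCountsAreEqual()
    {
        var readerCreator = new FakeReaderCreator();
        readerCreator.AddFile("import.csv", "Name", "Email");
        readerCreator.AddFile("template.csv", "Name", "Email");
        var validator = new ImportFileValidator("import.csv", "template.csv", readerCreator);

        if (!validator.ImportHeadersMatchTemplateHeaders())
        {
            throw new Exception("Expected the header counts to match.");
        }
    }
}
```

No files are opened, so the test depends only on the data it sets up itself.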

April 7, 2009

Thoughts On Unit Tests

A few things I’d tell someone learning about unit testing:

  • A unit test should focus on one small, very specific piece of functionality.
  • Unit tests serve as documentation.  You should be able to review a piece of code’s unit tests in order to learn its behavior.
  • It’s okay to repeat some code in unit tests if it helps clarify the tests and make them more expressive.
  • Unit test names should be descriptive and clearly convey what the test is trying to verify.
  • Unit tests should be very fast.  Ideally, each one runs in 0.1 seconds or faster.
  • Unit tests should be repeatable.  Remove all environmental dependencies and any other factors that might lead to inconsistent test results.
  • If you find it hard to write a unit test (for example, because a dependency is too hard to break or the test has to interact with a database), then there’s something wrong with your code.  Refactor your code in order to make it easier to test.
  • Don’t worry about 100% test coverage.  Strive for high coverage, but don’t worry about testing every minor, trivial case. 
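
To make a few of these points concrete, here is a minimal sketch of two small, descriptively named tests that stay repeatable by removing their dependency on the system clock.  The IClock, FixedClock, and TrialPeriod names are hypothetical, invented purely for illustration, and the tests are plain methods that throw on failure rather than methods tied to a specific test framework:

```csharp
using System;

// Hypothetical abstraction over the system clock.
public interface IClock
{
    DateTime UtcNow { get; }
}

// Test double that always reports the same instant.
public class FixedClock : IClock
{
    private readonly DateTime _utcNow;

    public FixedClock(DateTime utcNow) { _utcNow = utcNow; }

    public DateTime UtcNow { get { return _utcNow; } }
}

// Hypothetical class under test: a trial that lasts 30 days.
public class TrialPeriod
{
    private readonly DateTime _startedUtc;
    private readonly IClock _clock;

    public TrialPeriod(DateTime startedUtc, IClock clock)
    {
        _startedUtc = startedUtc;
        _clock = clock;
    }

    public bool HasExpired()
    {
        return _clock.UtcNow >= _startedUtc.AddDays(30);
    }
}

public static class TrialPeriodTests
{
    private static readonly DateTime Started =
        new DateTime(2009, 1, 1, 0, 0, 0, DateTimeKind.Utc);

    // Each test checks one specific behavior, and its name says which one.
    public static void HasExpired_ReturnsTrue_WhenThirtyDaysHavePassed()
    {
        var trial = new TrialPeriod(Started, new FixedClock(Started.AddDays(30)));

        if (!trial.HasExpired())
        {
            throw new Exception("Expected the trial to have expired.");
        }
    }

    public static void HasExpired_ReturnsFalse_WhenOneDayRemains()
    {
        var trial = new TrialPeriod(Started, new FixedClock(Started.AddDays(29)));

        if (trial.HasExpired())
        {
            throw new Exception("Expected the trial to still be active.");
        }
    }
}
```

The two tests repeat a little setup code, but each one reads on its own and gives the same result no matter when or where it runs.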

Unit Testing != Test-Driven Development

Unit testing and test-driven development (TDD) are not equivalent.  With unit testing, you write tests that verify your code is functioning as expected.  This is often done after you have written the code itself.  With test-driven development, on the other hand, you write the tests before you’ve written any production code and use the tests to guide the design of your code.  Thus TDD is not just a way of testing; more importantly, it’s a design activity.

Just like other design activities, TDD is done before production code is written.  You first create a test (writing just enough production code to get the test to compile), run the test, and verify that it fails.  You next develop a solution and write production code that gets the failing test to pass.  Once the test is passing, you refactor and clean the code.  You then start the process over by creating another new test.  This cycle is known as “red-green-refactor”.
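
One pass through that cycle might look roughly like the sketch below.  HeaderComparer and its Match method are hypothetical names chosen for illustration, and the code shows the end state of the cycle, with comments marking which step each part belongs to:

```csharp
using System;

// Step 1 (red): this test is written first.  With only a stub of
// HeaderComparer.Match in place (one that just returns false), it compiles
// but fails.
public static class HeaderComparerTests
{
    public static void Match_ReturnsTrue_WhenHeadersDifferOnlyByCase()
    {
        bool result = HeaderComparer.Match(
            new[] { "Name", "Email" },
            new[] { "name", "EMAIL" });

        if (!result)
        {
            throw new Exception("Expected the headers to match regardless of case.");
        }
    }
}

// Steps 2 and 3 (green, then refactor): just enough production code to make
// the test pass, tidied up once it does.
public static class HeaderComparer
{
    public static bool Match(string[] first, string[] second)
    {
        if (first.Length != second.Length)
        {
            return false;
        }

        for (int i = 0; i < first.Length; i++)
        {
            if (!string.Equals(first[i], second[i], StringComparison.OrdinalIgnoreCase))
            {
                return false;
            }
        }

        return true;
    }
}
```

The next new test (say, one for headers that appear in a different order) would start the cycle again.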

Writing your tests first forces you to think about requirements and how a user would interact with your API and classes before you even write any code.  You’re also focusing on writing small pieces of code that can be tested independently of other code, which leads to smaller classes that display high cohesion and low coupling.  Plus you’re incrementally adding tests and features, which gives you faster feedback and lets you learn more about the code than you would by writing all of the code first and all of the unit tests afterwards.

So while unit testing is a good practice, TDD is an even better one.

April 6, 2009

Motivating Others to Clean the Campground

I wrote about my goal of incrementally cleaning up the codebase I work on and leaving it in a better condition than I found it.   My old boss commented on the post, asking how others can be motivated to do the same. 

Well, for one, get to the new programmers early.  Have the experienced programmers show them good habits for writing and maintaining code, and hopefully the new programmers will adopt at least some of these habits as they learn.

But what about motivating experienced developers?  This isn’t as easy.  Having management enforce this practice will probably be ineffective (plus you might have a hard time convincing management of the importance of cleaning up code).   Instead you need to develop the team so that they display the following two qualities, which I think lie at the heart of wanting to continually improve code:

  1. The developers must take pride in their work; and
  2. They must be held responsible for the entire codebase.

At the end of the day, I want to feel good about the work I did and have it serve as an example to others.  Furthermore, I really like the company I work for, and I want to do everything I can to ensure that we make quality products that satisfy our customers.  Thus I’m happy to make the extra effort to improve our codebase.  When you take pride in what you do and the code you write, you want to clean it up and continually make it better.

Programmers will also feel motivated to improve the codebase if they are responsible for its entirety.   You can’t have certain people own just certain features.  Every programmer has to feel accountable for all of the code.  They can’t feel like they can work on one part for a little bit and then forget about it and move on to something else, or ignore quality concerns because that is QA’s responsibility.  Every programmer needs to be granted the ability to modify any of the code (with the appropriate amount of review as necessary), but granting this ability comes with the expectation that every programmer is responsible for the performance and quality of the codebase as a whole. 

One thing that I think fosters these qualities is to place the programmers closer to the customers.  When programmers are close to the customers, they can see tangible reactions to their product, allowing them to gain more pride in their work and a stronger sense of responsibility to the customer.  But in general, anything management can do to make developers proud of their company and the work they do, as well as make them feel responsible for the entire software product, should help motivate developers to continually clean their code.

April 1, 2009

Using Database_Default When Specifying SQL Server Collation

In SQL Server, collation refers to the set of rules that determine how data is compared and sorted.  Are you mindful of your database collation?  You should be.  Even if you don’t need to support multiple collations right now, it’s easier to choose a collation and establish your collation policy early on than to go back and update a bunch of database objects later.  Trust me.  We had to go back and apply consistent collation usage to all of our client databases, which required writing a script that dropped constraints, indexes, and views, disabled triggers, set the collation for the database, updated all character data type columns on all tables, and then recreated and re-enabled everything that was dropped or disabled.  Not fun.

Collation can be specified:

  • at the server level when installing or updating SQL Server;
  • when creating or altering a database;
  • when creating or altering table columns; or
  • when casting the collation of an expression. 

However, it’s often easiest to just specify the collation at the database level, and then have everything else in the database inherit that collation.  That way you have consistent collation usage throughout your database.

To do so, database objects (including tables, functions, procedures, views, and triggers) that have character data type columns (meaning char, nchar, varchar, nvarchar, text, and ntext columns) should specify a collation of DATABASE_DEFAULT. The DATABASE_DEFAULT option indicates that a column should use the default collation of the current user database. 

For example, create a database and specify its collation:

CREATE DATABASE [TestDatabase] COLLATE SQL_Latin1_General_CP1_CI_AI 

And then you can use the DATABASE_DEFAULT keyword when creating a character data type column to have that column inherit the database’s collation: 

CREATE TABLE [dbo].[TestTable]
(
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [Col1] [char](10) COLLATE DATABASE_DEFAULT NOT NULL,
    [Col2] [ntext] COLLATE DATABASE_DEFAULT NULL
)

Omitting collation declarations is not terribly significant for table columns, as these columns will use the default collation of the database.  Omitting collation declarations for temp tables, however, is significant, as the columns of these tables will then use the default collation of tempdb, which uses the default collation of the server – and the server might have a different default collation than the current database has.  (Columns of table variables, by contrast, default to the collation of the current database, though declaring the collation explicitly there does no harm and keeps the usage consistent.)

Handling collation in a consistent manner throughout your database is important.  At work we even took the extra step of writing a Subversion hook that verifies that every committed file defining a table, stored procedure, view, etc. includes the “COLLATE DATABASE_DEFAULT” clause for all character data type columns.  While you may not find this extra step necessary, at the very least be sure to specify the collation wherever necessary.