Code Design Template For Apache POI Based Excel Writers

In this blog post I will be proposing a code design template (in Scala) that I have come up with in a very short but fulfilling experience working on a small project based on Apache POI.

If you have ever worked with a library like Apache POI, you might have experienced how soon your code turns into a spaghetti code. Though we all have our own ways to refactor and deal with such a mess, in this blog post I would like to share my personal journey. Though I don't claim this to be an ideal approach but IMO the code design template that I am going to present will definitely keep your design highly modular, maintainable and readable. Also, I would like to point it out up front that since I don't consider writing excels as a very resource intensive task, I have traded modularity, maintainability and readability over time complexity.

Consider your organization has a biometric scanner to allow only employees to get access through the office gates and now you also want to utilize the logs generated by the biometric scanner as employees attendance logs and subsequently generate an attendance report in excel format which should look something like this:


To achieve the same we can use a library called Apache POI. It is a great library for manipulating various file formats based upon the Office Open XML standards (OOXML) and Microsoft's OLE 2 Compound Document format (OLE2). It other words, it exposes Java APIs to read and write MS Excel files besides other formats.

We can basically divide the code design template in five reasonable steps when writing to excel files using Apache POI library:

1. Identify sections to write

The idea is to first and foremost identify all the independent sections that you would want to write. Please notice that though, for example, EmployeeInfo and Attendance fall on same set of rows and can in fact be written together; I would highly recommend to keep any such sections that involve different rendering logics, separate. This although as already mentioned in the intro of this blog compromises time complexity (and even the library seems to encourage otherwise), you would see further how much of this approach would have an influence on the modularity, maintainability and readability of the code.


2. Define dimensions of the sections identified

Once sections are identified, we need to define the dimensions of those sections. The objective here is to abstract the logic involved in determining the dimensions of the sections in one place instead of leaving it to leak across all section writers. The section writers could then be agnostic of how the dimensions of respective sections were determined and work directly on the dimensions instead.

3. Abstract styles for every section in separate traits

Next, respecting SRP design principle we can define styles for every section as shown below. Defining styles like so would allow to keep the presentation logic separate from the rendering logic. This also makes styles a pluggable feature. Writers could upgrade to new styles just by extending to another styles trait.

4. Implement writers for every section

You can now implement writers for every section extending to respective styles traits and utilizing dimensions predefined for every section.

5. Compose it all together

After defining all writers and respective styles, you can compose them all together in a single definition.

Another important thing here to notice is that you cannot override already defined rows, merged regions and similar components that Apache POI library facilitates you to create. This affects the order in which you can invoke section writers. However, I have addressed this issue in my solution by making sure that I don't attempt to recreate a row that's already defined like so:

The eventual structure of the module would look something like this:

This is where it ends. Hope I could put the idea across well enough!

You can find the complete working implementation of this utility here on github.

Show Comments