By ryan | Thu, 12/17/2015 - 00:56
When working on larger projects, software architects and tech leads inevitably come across a few big hurdles in the project requirements: some not-so-“out of the box” functionality. They start asking themselves questions about the best way to approach these tasks:
- Can I find a contrib module that I can leverage enough to accomplish this?
- Will the extra work to extend it be less effort, and less of a maintenance headache, than building exactly what I need?
- Should I just build a custom module that does exactly what I need?
Like all good Drupal lovers, we tend to roll the dice on the contrib module route, because so many contrib modules are solidly built and provide a strong base if you do need to extend them.
My Struggle With Contrib Modules
Let me share a brief summary of one of my recent projects and, sadly, my struggles with leveraging contrib modules.
For the majority of the last year, I’ve been working on a project here at Cheeky Monkey Media that required a custom search tool. After running a search, users would see a breakdown of the totals for their search and have the option to download the full dataset. The dataset was quite large: 650,000+ records with 30+ fields per record. Over the course of the project, we built, then reworked, this process multiple times to increase speed and decrease our memory footprint. Our search needed to look at two content types with a parent/child relationship between them.
The First Attempt: Custom Search Tool
Our first, less successful approach utilized Search API with Apache Solr and MySQL database backends, as well as a more direct integration with Apache Solr. When we did our initial build and testing, we were using a small subset of the data: we needed to migrate the client’s old data, and were working on the search tasks in parallel with the migration tasks. Everything worked great on our subset, and everyone felt that when we switched over to the fully migrated database, our speeds would still be at acceptable levels.
Unfortunately, we were wrong. After a little tweaking, the initial search results and breakdown were returned in a timely manner, but none of these implementations allowed us to generate the full results quickly and without a huge memory footprint that could crash the file generation process. The results were offered as downloadable PDF and Excel files, and working through these contrib modules, we could never strike a balance between the memory needed to load the data, the ever-increasing size of the download files, and generating those files in an acceptable time frame.
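The memory problem here is a general one: loading an entire result set before writing the export file means memory grows with the dataset. The fix that eventually worked for us is the usual one, streaming rows to the file in fixed-size batches. Here is a minimal sketch of that idea in Python, with SQLite standing in for the real MySQL/Solr backend and CSV standing in for the Excel export; the `search_index` table and its columns are hypothetical, not the project’s actual schema.

```python
import csv
import sqlite3

def export_in_batches(conn, out_path, batch_size=10000):
    """Stream query results to a CSV file in fixed-size batches,
    so memory use stays bounded regardless of result-set size."""
    cur = conn.cursor()
    cur.execute("SELECT id, title, status FROM search_index ORDER BY id")
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        # Header row from the cursor's column metadata.
        writer.writerow([d[0] for d in cur.description])
        while True:
            # Only batch_size rows are ever held in memory at once.
            rows = cur.fetchmany(batch_size)
            if not rows:
                break
            writer.writerows(rows)
```

The same pattern applies whatever the storage layer is: keep the cursor open, pull a bounded chunk, append it to the file, and let the previous chunk be garbage-collected.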
Final Solution That Worked
Our final solution ended up utilizing custom database queries against two search index tables, one for each content type. This streamlined our whole approach, eliminating the extra steps imposed by how we had to interface with the contrib modules. And because it was a custom solution, we were able to analyze our queries and set up indexes on the tables to optimize further as needed. We were also able to fine-tune how we generated the initial total breakdowns, writing very precise queries instead of using a combination of multiple queries and record iteration to compile them.
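To make the “precise queries” point concrete: the win comes from letting the database aggregate, rather than loading every record and counting in application code. Below is a small sketch of that shape, again using SQLite for illustration; the `parent_index`/`child_index` tables, their columns, and the index are all hypothetical stand-ins for our actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE parent_index (nid INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE child_index  (nid INTEGER PRIMARY KEY,
                               parent_nid INTEGER, status INTEGER);
    -- Index the join/filter columns so large scans stay fast.
    CREATE INDEX idx_child_parent ON child_index (parent_nid, status);
""")

def totals_breakdown(conn, status=1):
    """One aggregate query replaces loading every record and
    tallying totals in application code."""
    return conn.execute("""
        SELECT p.category, COUNT(c.nid) AS total
        FROM parent_index p
        JOIN child_index c ON c.parent_nid = p.nid
        WHERE c.status = ?
        GROUP BY p.category
        ORDER BY p.category
    """, (status,)).fetchall()
```

The database returns one small row per category instead of hundreds of thousands of records, and because the query is ours, we can read its plan and add exactly the indexes it needs.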
Conclusion: Not Scalable Enough
In the end, the choice to go with the contrib modules ended up biting us in the ass. Even though they were pretty solid modules, there was just too much “extra,” too much rigidity, added by the contribs. Being a smaller dev shop, most of our projects are on the smaller side. We sometimes forget how sloppy, inefficient, or superfluous code, whether from a contrib module you’ve extended or from your own custom work, can become a giant pain when scaled up even 2-3 times. All that extra space around the square peg in the round hole is no longer negligible; it’s now made things unusable.
I’m not trying to say that when you scale things up, a contrib module that needs a little extra work is going to kill your processes, or that contrib modules just don’t scale. I’m hoping to remind everyone that there are tradeoffs to choosing a slightly imperfect solution or slapping together a makeshift one. Even on smaller projects, where a few extra milliseconds here and there don’t really matter, we become better architects and programmers by remembering that they do add up, and by honing our skills to stay sharp for the projects where they are quite noticeable.