When the concept of TestBox was first pitched to me, I was immediately excited. I had been in the shoes of a software buyer before, trying to figure out the right software stack for my brand-new business. There are so many options to choose from! As Sam explained the concept to me, I knew right away it was going to be an interesting technical challenge. Among a variety of problems, two stood out:
- How do we achieve the founders’ vision of “wrapping” a product with a “test harness” to help guide buyers?
- How do we get our hands on “trials” to provide to our users?
This is my fourth startup, and one of the earliest I’ve joined. I can say with some confidence that most startups start out building a monolithic application. After all, the common wisdom is to move fast, iterate quickly, and find product-market fit. As we sat with the complexity of the road ahead of us, though, we decided to take a more deliberate approach to designing our first version of TestBox.
The mandates of iFrames
It’s no real secret that TestBox utilizes iFrames to demonstrate software to users. Unfortunately, using iFrames in a cross-domain way comes with some baggage.
Our initial solution to this problem was to build a proxy for our partners. It would allow us to iFrame their site while allowing them to maintain best security practices for their customers. Thanks to the stateless nature of HTTP, it made complete sense to us to build our proxy independently from the rest of our application. So right from the get-go, we had a minimum of two services.
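One common form of cross-domain iFrame baggage is response headers such as `X-Frame-Options` or a `Content-Security-Policy` `frame-ancestors` directive, which tell browsers not to render a page inside another site's frame. The sketch below shows how a proxy could rewrite those headers for the proxied copy only, leaving the partner's own site untouched. It is an illustration under that assumption, not TestBox's actual proxy code: the function name `make_frameable` and the `allowed_parent` parameter are hypothetical.

```python
def make_frameable(headers: dict[str, str], allowed_parent: str) -> dict[str, str]:
    """Return a copy of upstream response headers that lets `allowed_parent` frame the page.

    Hypothetical helper for illustration; a real proxy would also need to
    handle cookies, redirects, and repeated headers.
    """
    result: dict[str, str] = {}
    for name, value in headers.items():
        lower = name.lower()
        if lower == "x-frame-options":
            # Drop the legacy header; it cannot express "allow this one parent".
            continue
        if lower == "content-security-policy":
            # Keep every directive except frame-ancestors, then add our own.
            directives = [d.strip() for d in value.split(";") if d.strip()]
            directives = [
                d for d in directives
                if not d.lower().startswith("frame-ancestors")
            ]
            directives.append(f"frame-ancestors {allowed_parent}")
            value = "; ".join(directives)
        result[name] = value
    return result
```

Because the function only looks at one response at a time and keeps no state between calls, it fits the point above: a proxy like this can live in its own service.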
iFrames also inspired another aspect of our system…
Opting in to event-driven, sort of
Communication with a cross-domain iFrame is only possible via one method: message passing, through the browser’s `postMessage` API. In the earliest days of TestBox, we weren’t really sure how or if this would restrict us or dictate the way we built our “test harness.”
So we decided to embrace this philosophy completely. We felt it would make it easier on our engineers to think about the entire system as event-based rather than just one key component (the iFrames).
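To make “thinking in events” concrete, here is a minimal in-process publish/subscribe sketch in Python. It is only an illustration of the pattern, not how TestBox is built: the `EventBus` class and the `trial.requested` event name are hypothetical, and a real multi-service system would put a message broker between publishers and subscribers.

```python
from collections import defaultdict
from typing import Any, Callable

# A handler receives the event payload and returns nothing.
Handler = Callable[[dict[str, Any]], None]


class EventBus:
    """Tiny synchronous event bus (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Handler) -> None:
        """Register a handler to be called for every event of this type."""
        self._handlers[event_type].append(handler)

    def publish(self, event_type: str, payload: dict[str, Any]) -> int:
        """Deliver the payload to each subscriber; return how many were called."""
        handlers = list(self._handlers.get(event_type, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


bus = EventBus()
received: list[dict[str, Any]] = []
bus.subscribe("trial.requested", received.append)
bus.publish("trial.requested", {"partner": "Zendesk"})
```

The publisher never needs to know who is listening, which is the same constraint a cross-domain iFrame imposes: each side can only emit messages and react to the ones it receives.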
Carving up the turkey
At this point, we had (at least) three major things to start building:
- A proxy to assist us in iFraming our partners.
- An experience for our users to try out products, learn about them, and make a buying decision.
- A way for us to provision trials for users.
With three engineers and three things to build, we stumbled into a very natural plan of action. You may notice, though, that there isn’t a bullet point here for “building an event-driven system” or, put another way, “building our platform.” More on that later.
How many partners will we have?
In the earliest weeks of TestBox, we built a monolithic service to solve the problem of acquiring trials. We wrote all of our code for acquiring trials from our earliest partners (Zendesk, Freshdesk, and HubSpot) in one service. In addition to having all our business logic in one place, we were also using SDKs provided by these partners to help us build out the integrations.
One day, a misconfiguration caused us to temporarily lose the ability to provision trials for all three partners. This led to an abrupt reality check and evaluation of our situation. For starters, obviously a system that is capable of being misconfigured with such consequences is not ideal.
Unrelated to the outage, but relevant to our services decision, was dependency management. Even with only three partners, the project’s code dependencies were becoming unwieldy, largely due to the sub-dependencies of each SDK. We were not excited about the idea of adding a 10th SDK to this monolithic code base and dealing with even more version compatibility mismatches.
As such, we decided to break down this early monolith very quickly into additional services.
“Monoliths forever!”
As I mentioned earlier, this is my fourth startup. At each of the previous three, the monolith grew out of control. All of them struggled with scaling and performance as they grew from Series A to Series B, and none was ever able to free up the engineering resources to properly decompose the monolith into services and improve the experience for users or developers.
I’ve already mentioned how invested I was in the idea of TestBox from Sam’s first pitch. As such, my bias kicked in. I was sure we would hit a rapid scaling point early on. It seemed like a good idea to prepare for that “inflection point,” when traffic would suddenly spike and everyone would be excited to use TestBox!
One piece of technical advice I received at Pluralsight was simple but powerful: be prepared for a 10x increase in traffic. TestBox hasn’t quite hit that 10x traffic mark yet, but we did see an uptick in traffic when we launched on Product Hunt. The advice worked, and we didn’t have any issues during our launch. We continue to keep this adage in mind as we grow.
We’ve also found that our developer experience is generally better with services than trying to run the kitchen sink locally. Devs can check out, run, and test only what they need to work on.
A summary of our reasons
- Building a proxy as a separate service made perfect sense, so we knew we’d have to contend with at least two services right away.
- Because of the security models of iFrames, our engineers needed to think in an event-driven model. Making everything event-driven “felt” natural.
- We were afraid of trapping ourselves in a monolithic scaling situation for our main application.
- We saw a foreshadowing of technical issues with dependency management as our partner count grew.
- It made division of labor, even in a small team, extremely easy. Instead of dealing with Git merge conflicts or Django migration conflicts, we were able to iterate quickly.
A closing thought: You’ll also note that at no point in this article did I use the word “microservice.” We try to build “right-sized services” — services that accomplish a business objective no matter the size.
Would I do it again? Should you?
At TestBox? Yes. Perhaps not to the degree we have taken it. There is no doubt that our proxy handles an order of magnitude more traffic than our main application (even static assets have to run through the proxy). This means it requires more scale and possibly even a different tech stack altogether (Rust, Go, or async Node all seem better fits than Python, for example).
At other startups? I would think twice. Without a proper strategy for managing shared state, data, and events, your services will suffer. I generally agree with some Internet advice I read that services cause you to trade one type of complexity for another. Your operational complexity will go up — and that is probably not something you want in the earliest days of your startup. We are still sorting out how to best manage this complexity today at TestBox.
Every technical need is different! For a lot of reasons, services were the right path for us — even for an early-stage startup. Hopefully, our story will help guide you in your decision making for your technical planning and approach.