There are certain things we know we need to provide inside our platform and as part of the service we offer. We know this because they’re required to meet standard expectations, because they help us solve the problems we’ve determined are key for clients and the solution is understood, because clients have told us through the ways we collect feedback, and because they’re a key part of our larger strategy.
But as a small team that wants to make a big impact, we also believe in running experiments to see what works, what needs more research, what fails (and why), and what we need to go deeper on before full implementation. Today we’re sharing how we run these experiments at Creatively Squared.
These tests are developed from data analysis of current stats, user flows and behaviors, team strategy sessions, conversations with customers and insights from other industries, services and technologies. Sometimes they come from a design or innovation sprint, or from posing a question about a problem you’re experiencing. We also have an experiment brief that any team member can fill in and pop into our product feedback channel on Slack.
One of our recent experiments came from clients asking if we can get assets to them quicker, and our Project team asking, how might we do this faster?
We’re not the only ones who do this, of course. Experiments are a low-cost way to trial ideas you need data on. Research is great but it only takes you so far. Is someone actually willing to put down the money for that new feature, for example? If customers don’t recognize their specific problem but you put the solution in front of them the right way, will they use it? These experiments allow you to get to the answer quickly and add it to the rest of your knowledge. Instagram, for example, has been trialling hiding the ‘likes’ feature by disabling it in certain regions. They’re likely checking whether engagement changes and whether people post more or less, and collecting anecdotal evidence of people feeling less pressure and angst when posting. In a new update, they’ve expanded this experiment by testing an option that lets some users choose whether to see likes and whether to turn off their own.
The other reason experiments are so important to us is that it’s what we encourage our customers to do. We create content at scale and that content gets better and better the more data we have on it. We’re constantly seeking to answer questions like: what style works best in a given market segment? Is it different in another segment? Is it more important to sell the product itself or the outcome of using it? Is there a certain audience that responds better to X, Y or Z? Is it better to run a campaign with 40 different ad visuals, or do a few stop motions work better? Can we reach more people at a lower cost with short, authentic videos?
We keep things as simple, quick and effective as we can. We’re all about Minimum Loveable Tests (MLTs) but that still means we need to plan. Accepted experiment ideas are logged in a Notion database as a hypothesis, along with how we’re going to validate it and how we’ll know it was a success. It’s important that your hypotheses provide a clear metric of what you’re looking for so that you know how to proceed. Be realistic with these rather than living in an ideal world.
There are a number of ways to write hypotheses and these change from company to company, but here I’ve gone with a ‘We believe’ statement, which is popular in assumption testing, followed by the experiment and validation. ‘If / then’ statements are another useful method.
Ultimately you should be able to easily articulate the change you are making and the effect you expect that change to have. A hypothesis can range from something as small as a new button to a bigger idea such as testing the demand for a service.
An experiment log keeps an easy track of what you’re testing and a clear historical record of everything you’ve tried and the outcome of it. Make sure you’re logging all your tests!
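If it helps to picture a log entry, here’s a minimal sketch in Python of a ‘We believe’ hypothesis paired with its experiment and validation metric. It’s an illustration rather than our actual Notion schema: the field names, the statuses other than ‘waiting for code’, and the 10% target are all assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    # Only "waiting for code" comes from the process described here;
    # the other statuses are hypothetical.
    PLANNED = "planned"
    WAITING_FOR_CODE = "waiting for code"
    RUNNING = "running"
    CONCLUDED = "concluded"


@dataclass
class Experiment:
    belief: str                       # the "We believe ..." statement
    test: str                         # what we will actually do
    metric: str                       # the number we will look at
    target: float                     # value that counts as validated
    status: Status = Status.PLANNED
    started: Optional[date] = None
    observed: Optional[float] = None  # filled in when the test concludes
    outcome: str = ""                 # what we decided to do next

    def validated(self) -> Optional[bool]:
        """Return None while there is no result, else True/False against the target."""
        if self.observed is None:
            return None
        return self.observed >= self.target


# Hypothetical entry; the 0.10 target is invented for illustration.
example = Experiment(
    belief="We believe enterprise customers want to purchase extended licensing within the platform.",
    test="Painted door: show a purchase option and count who starts the process.",
    metric="share of eligible customers who start the purchase process",
    target=0.10,
)
```

Writing the target down before the test runs is the point: it stops you from deciding after the fact what success looks like.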
You may find that for some questions you’ve posed you end up with a hypothesis tree. You have a lot of hypotheses and a lot of tests you can run on each of them. Depending on the resources available to you and how many people will be interacting with the test, you’ll need to narrow down to the strongest one to three hypotheses. You should have *some* knowledge already, so use that to make the decision. Listen to your customers and users closely!
We chat about the experiment we’re pursuing with all the necessary team members. When it has been fleshed out, it is confirmed by other relevant team leads to make sure we haven’t missed an angle. You might find that some people find this a little confrontational, which is understandable. Make it fun and not attached to being right or wrong. We’re just testing ideas and theories to move the business forward. We may even be testing an idea we expect to prove false, because we haven’t been able to confirm it either way and it’s a key assumption in other strategies. As a product lead, pop your thoughts forward to reduce anxieties where possible and remind your colleagues this is just a test and we take collective action from this; it isn’t a permanent state.
When it’s approved to go ahead it gets a quick mock up in Figma of how it will look and the process flow. Our team is small and though our client base includes some of the biggest companies in the world, we’re not serving thousands of different organizations each day, so our experiments need to be meaningful and significant. We’re not testing whether the position or color of a button has a material impact on our conversion rate, for example. We cannot afford to waste time running experiments for their own sake, or to spend much needed development and design resources on irrelevant tests. Many of our experiments relate instead to our value proposition.
Hint: this is a fast process if you have an updated component library in place!
Although the idea is to do these quickly, they still require thought about exactly how the change will affect users and how to create a test that doesn’t bias the results. If your implementation makes the user go through an awful process, is highly confusing, or they can’t notice the update, you’re likely to end up with more questions than answers. Make sure too that you’re not running multiple experiments that affect the same outcome at the same time, otherwise you won’t know what caused the change.
The initial mocks then go out to the people involved for review. For small experiments this is done inside Slack but for larger ones, especially those that may have a significant impact, I find it best to do these over a 20-30 minute video screen sharing chat to walk through it together. I’ve always received important feedback and tweaks from this. A number of experiments will require customer or usability testing too, so pop together an interactive prototype and get it in front of them.
Once you’ve made the tweaks and run through it further with the Product team for any implications, the experiment is ready to run. We keep our experiments off the back-end as much as possible so that they are front-end additions only (this reduces workload and keeps it as simple as it should be, but is also important for maintaining data accuracy). At times we also buy a solution (e.g. an SDK) or use an open source package to test before we ever decide to design and build something specific.
Lastly, I’ll jump in and outline the goals and requirements of the test including the data we need to capture.
Once that’s done and the responsible front-end engineer has confirmed the spec, the experiment board is updated, the tasks are added for allocation and the test status is set to waiting for code.
During sprint planning we allocate the experiment tasks. Once they’ve been coded locally they go through our testing and deployment process like any other task. We do weekly releases so unless it’s an urgent item this works for us. In other teams you may find you need to run experiments outside of your standard cycles in order to be effective.
When the item has been released we alert any impacted team members and note it in our internal release notes (this also comes with a Slack notification and an asterisk noting that this is a test item). The status on the experiment log is updated and the testing time frame added. The time period should be determined by the inputs: how many people use your platform each week, for example? As we’re largely enterprise based and our platform isn’t a daily tool, we need more time for platform related tests than you may in a traditional SaaS product.
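As a rough way to work out that time frame, here’s a small sketch in Python. It assumes you’ve already decided how many people each variant needs to reach, and that each weekly user is someone who hasn’t seen the test before (returning users don’t add to your sample). The function name and the numbers are hypothetical.

```python
import math


def weeks_needed(required_per_variant: int, weekly_users: int,
                 variants: int = 2, exposure: float = 1.0) -> int:
    """Estimate how many whole weeks a test must run.

    required_per_variant: people each variant needs to reach (decided up front)
    weekly_users: new people who use the platform each week
    variants: number of versions being compared (2 for an A/B split)
    exposure: share of users who actually reach the tested screen (0 to 1]
    """
    if required_per_variant <= 0 or weekly_users <= 0 or variants <= 0:
        raise ValueError("counts must be positive")
    if not 0 < exposure <= 1:
        raise ValueError("exposure must be in (0, 1]")
    per_variant_per_week = weekly_users * exposure / variants
    return math.ceil(required_per_variant / per_variant_per_week)


# A lower-traffic enterprise platform versus a busier daily tool,
# both needing 400 people per variant (illustrative numbers only).
print(weeks_needed(400, weekly_users=150))
print(weeks_needed(400, weekly_users=5000))
```

With these made-up inputs the lower-traffic platform needs several weeks while the busier one is done within a single week, which is why the same test can need very different time frames on different products.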
Some tests aren’t directly related to any accepted epics on the roadmap but they align with OKRs, are potential solutions to problem statements, or are new ideas we want to trial. Other tests are related to epics in the exploration phase. For these, we denote the epic with a unicorn emoji so at a quick glance we know we’re testing a part of that for research.
We choose to keep our experiment log separate from our roadmap for these reasons. It also makes for an easy board the team can check to see which tests are in play and upcoming.
You need statistically valid data. Sometimes you might already have more than enough data well before you intended to end the test. This doesn’t mean you need to stop it (it totally depends on what exactly you’re testing as it may be very beneficial to keep going) but you’re able to act on it earlier. Otherwise, make sure you track how the data is going. You may also find you need to add a little tweak if you spot something obvious. It depends on what you’re testing. You may be doing an A/B split test (usually provides quicker results) or a multivariate test. You may be testing demand or the willingness of customers for a specific benefit. There’s often a difference between what customers think or say they want, and what their behaviors actually show.
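For a simple A/B split on a conversion-style metric, one common way to check whether a difference is more than noise is a two-proportion z-test. The sketch below uses only the Python standard library; it’s one option rather than the method we prescribe, and the counts are invented. Keep in mind that checking the result repeatedly and stopping the moment it looks significant raises the chance of a false positive.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Compare conversion rates of variant A and B.

    Returns (z, p_value), where p_value is two-sided.
    conv_*: number of people who converted; n_*: number of people exposed.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each variant needs at least one user")
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the assumption that A and B perform the same
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("no variation in the data (all or none converted)")
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 20 of 200 converted on A, 40 of 200 on B.
z, p = two_proportion_z_test(20, 200, 40, 200)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value (0.05 is a common cut-off, though the threshold is your call) suggests the difference is unlikely to be chance alone. It says nothing about whether the difference is big enough to matter, so read it alongside the metric you set in your hypothesis.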
Should you tell customers you’re running a test?
It might make sense to point out that it’s a Beta service, for example, which signals that it may not be around forever and avoids setting that expectation. If it’s a painted door experiment you will be letting your customers know too. That is, an experiment where you haven’t built anything that would make it work. For example, we’re currently testing the desire and ability of enterprise customers to purchase extended licensing within the platform. To gauge whether this, along with paying through the platform, is worth rolling out, we make everything appear functional, but once you start the process you’re notified that this doesn’t exist yet and that we’re gauging interest (whilst in the meantime here’s the standard option - with gratitude). You want to make sure you’re not frustrating your customers. In fact, if you do this right, you’re hopefully at least slightly smoothing an experience or even exciting or delighting them.
One of the best examples of this type of experiment is the HotelTonight app, which made everything look beautiful for customers and allowed the team to test whether this MVP was something both customers and hotels would use. The backend involved manually ringing around for rooms when users had booked, sometimes for hotels that weren’t even aware they were on the platform.
Analyze and communicate
You’ll need a variety of tools for the right test. You might set up events in Google Analytics or be tracking through Optimizely. You may be sending data to Airtable, monitoring in Amplitude or tracking in an internal dashboard tool. Determine the best option for each test and then keep an eye on it.
When you have enough data or the time period has ended, I use these four questions to decide what to do next.
I find number three is often overlooked. Did we negatively impact total revenue by implementing this? Are we harming a segment of people with this even whilst it helped others?
Then you need to communicate the results and see if others have input on it too.
Once you conclude the test you can determine your next steps. Combined with your other research and evidence you’ll be able to make good decisions whilst remaining nimble. Usually that results in a new experiment, a new epic, tasks added to your backlog, an item reprioritized, or a strategy session.
A final note: sometimes it can feel disappointing if the test didn’t work out how you hoped. Remember you’re effectively acting like a scientist here, proving or disproving a theory. Knowing either way is a major step forward, and not implementing things in your product that shouldn’t be there, or knowing what not to do, is also a big win for everyone: your team and customers included.
And that’s how we test!