At Testbirds we know we’re biased when it comes to testing. Good, comprehensive, and ongoing tests help ensure the quality, reliability, security, and performance of your software products.
For anyone who wants to be certain that their product isn’t broken, won’t crash or damage their users’ devices, is free of bugs and glitches, works as intended, doesn’t ruin the user experience, and can’t be exploited (for data theft and more)… then, yes, testing is essential.
It’s also important if you’re looking to avoid extra costs, minimize risks, optimize processes, and gain customer trust. Without good testing, anything can happen.
But to understand why it’s so important, let’s consider a few cases where it all went wrong.
Whether from a lack of testing, or from inaccurate or incomplete testing. Internal politics. Shareholders. Costs. There are simply times when good testing – and not ignoring what those tests reveal – could have prevented many headaches (and even tragedies).
To test, or not to test?
There is no question. You must test.
Just look at Australia and cane toads. Introduced in 1935 to help control beetles on sugar cane plantations, 2,400 toads were released without any tests to determine if they could harm the environment or, amazingly, if they’d even eat the cane beetles. The problem? They breed prolifically, have no native predators, release a poison, and… ultimately made no impact on the cane beetle population. They’re now classified as a feral species and are expanding their range by 40+ miles every year.
Carmaker Tesla invests heavily in so-called automated driving with their ‘Autopilot’ feature. However, there have been numerous deaths since its release in 2015. While their software and hardware testing are surely robust, one thing that appears not to have been tested is the name itself.
The cars are not self-driving; they automate some driving tasks, and the driver should always remain attentive. The word ‘Autopilot’, however, implies full autonomy, and it’s clear that in at least one fatal accident the driver was not in the driver’s seat, and in another, was playing a game on their phone. Would a simple test of the word’s implied meaning have shown that drivers see it as ‘fully automated’? If so, a simple name change, one that clearly showed it was not, could have saved lives – then and in the future.
Sadly, for some businesses, regardless of whether they test or not, or what those tests reveal, everything comes down to the cost vs. benefit factor.
The dangers when tests are ignored
Even when tests are done right, things can go wrong.
Take the Ford Pinto. Developed to compete against Japanese and German subcompact cars, the Pinto was rushed into production on a shorter-than-usual design/development schedule. That process still included crash tests on prototypes and the final design to ensure they met current safety standards. All crash tests to the rear of the car failed, resulting in ruptured fuel tanks and leaking fuel.
Testing clearly showed a problem, and, as William Shaw and Vincent Barry noted in their book Moral Issues in Business, “Ford knew that the Pinto represented a serious fire hazard when struck from the rear, even in low-speed collisions.”
Ford’s management, however, approved production, and the Pinto was sold to the public from 1971 onwards. It all came down to their cost/benefit analysis.
Ford calculated that fixing the design flaw would cost $11 per car – an overall cost of some $137 million across all affected vehicles. Payouts to those killed or injured in such collisions, on the other hand, were calculated at about $49 million. Estimates vary, but it’s clear that at least 27 people died – with some putting the toll in the hundreds.
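The arithmetic behind that decision can be sketched in a few lines. The per-item figures below are the ones commonly reported from Ford’s infamous internal cost/benefit memo – treat them as reported estimates, not audited numbers:

```python
# A sketch of the Pinto cost/benefit arithmetic, using the figures
# commonly reported from Ford's internal memo (estimates, not audited data).
CARS_AFFECTED = 12_500_000      # vehicles and light trucks, as reported
FIX_COST_PER_CAR = 11           # USD per vehicle

DEATHS, COST_PER_DEATH = 180, 200_000          # projected burn deaths
INJURIES, COST_PER_INJURY = 180, 67_000        # projected serious burn injuries
VEHICLES_BURNED, COST_PER_VEHICLE = 2_100, 700 # projected burned vehicles

cost_of_fix = CARS_AFFECTED * FIX_COST_PER_CAR
cost_of_payouts = (DEATHS * COST_PER_DEATH
                   + INJURIES * COST_PER_INJURY
                   + VEHICLES_BURNED * COST_PER_VEHICLE)

print(f"Cost of fix:     ${cost_of_fix:,}")      # $137,500,000
print(f"Cost of payouts: ${cost_of_payouts:,}")  # $49,530,000
```

On those numbers, the fix “costs” nearly three times the projected payouts – which is exactly how a spreadsheet can end up pricing human lives.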
The testing worked. It was Ford’s refusal to heed those tests and place profit over safety that was the issue.
For a similar look at how testing wasn’t properly heeded, consider NASA’s failure when it came to the flawed design of the Space Shuttle Challenger’s O-rings. All of which were tested years before the disaster and were shown to lose functionality under certain conditions (particularly when cold – as it was the day of the ill-fated launch).
When things don’t work as intended
Testing is also about ensuring that what you designed can actually be used by the user. You may have smoothed out the bugs, run through everything, seen it work flawlessly in-house, gotten sign-off… and then, once released, it failed.
This is what happened with Bungie’s video game, Destiny 2, and its (then) latest expansion ‘Season of the Forge’.
Before that though, let’s be honest, picking test failures in the gaming industry is like shooting fish in a barrel:
In 2007, RedOctane, the makers of Guitar Hero 2, released a patch for the Xbox 360. The problem? It crashed many consoles and, for a sizable number, permanently ruined (bricked) the machine.
In 2020, CD Projekt Red released their highly anticipated Cyberpunk 2077. The problem? System crashes, multiple bugs, scripting issues, graphics and audio problems, poor performance on older consoles – the list is extensive. The game was even removed from the PlayStation Store, and the launch has been called one of the most disastrous in modern gaming history.
There are simply too many to list. For an interesting overview of some of the industry’s worst launches, click here. To see some of the other challenges the industry faces, read our article on diversity in gaming.
But back to Destiny and how a game-breaker isn’t always a coding problem.
Destiny is an online-only game that sells four Season Passes a year to keep the content ‘fresh’. The Season of the Forge revolved around using various forges to earn new weapons and armor.
The climax of the season was the ‘unlocking’ of the fourth and final forge, the Bergusia Forge, which meant solving a complex seven-layered puzzle. No player could access the forge until the community solved the puzzle.
Sounds like a great idea to encourage team play and drive interest and engagement. Right?
Yes – if the puzzle were solvable. What followed was a textbook case of why you must thoroughly test your product for every contingency.
To solve it, your three-person team needed to stand in specific spots and shoot symbols with specific weapons gained from the other forges (you can’t see the symbols without the right weapon).
The problem was that the puzzle’s layers were increasingly, and incredibly, obscure, with elements that had nothing to do with the game. Ultimately, it came down to knowing a fairly obscure Victor Hugo poem, the melody to Frère Jacques, and how the gems on the hilt of Charlemagne’s sword, Joyeuse, were arranged (you can check out the ‘obvious’ cipher to that section here).
Who couldn’t quickly do that? Well, nobody, as it turned out. Players worked on the seventh level for over 24 hours straight with no luck.
Because of this, and because it clearly wasn’t going to be solved, Bungie stepped in and unlocked the forge. Two days later, they even admitted that some text had been ‘improperly removed’ from the final hint of the puzzle… making it impossible to solve in any case.
This raises questions about their testing. There’s no doubt they tested the puzzle in-house, but it’s a good bet they did so already knowing the answer, only verifying that ‘doing X would result in Y’.
It seems clear that they didn’t test if it was actually solvable, within any reasonable amount of time, by those not connected to the puzzle’s development. Then, when it was released, a crucial hint was missing, which wasn’t caught. This is why external third-party testing can often find issues that you can’t.
They look at your software without any preconceptions. The testers at Bungie clearly presumed it was possible because they knew it, it worked for them, and therefore the players must be able to get it. They were wrong. It’s also why they undoubtedly missed that part of the clue wasn’t there.
When a ‘simple fix’ breaks everything
For any business, it’s clearly best not to lock millions of your customers out of their accounts. Particularly banking accounts.
But that’s exactly what the UK’s TSB Bank did in 2018 during an IT migration to their new banking platform.
TSB started migrating all records and accounts on a Friday afternoon, informing customers they couldn’t access anything until after 6 pm on Sunday. Once that deadline passed, it became clear that the migration was in trouble.
People couldn’t log in to their accounts. Others received information from other people’s accounts. Some received inaccurate debits and credits. Two days later, some 1.9 million customers still had no access, and this continued for the next couple of weeks. Some customers couldn’t access their accounts until a month after the upgrade.
In the end, it cost the bank $461 million and 80,000 customers.
An inquiry found that the massive failure occurred partly because only one of the two new data centers had been tested (the other reasons involved management decisions). This made it impossible to determine if there were any issues with the new system. Blame quickly fell to the new parent company’s IT services arm for cutting corners.
Additionally, a report from IBM, which was brought in to help solve the problem, concluded – as noted by Reuters – that the testing was ‘relatively quick, did not show sufficient evidence of the system’s capacity and did not apply the criteria needed to prove it was ready’, and that ‘in such a risky operation, it would expect to see “world class design rigor” and “test discipline”’.
With all of the privacy issues involved, the money, potential for fraud, the damage to their reputation, and much more, it seems inconceivable that appropriate tests were not made, and that senior management did not follow up to ensure they had been.
But don’t worry, they’re not alone. In 2012, National Grid USA decided it was time to streamline their back-office processes and move to a new ERP system from SAP.
Everything that could go wrong with the implementation did, and it eventually cost National Grid nearly $1 billion.
There’s little doubt that adequate project governance, risk mitigation, and controls were lacking, and National Grid USA later took their systems integrator, Wipro, to court. One of their contentions was that Wipro failed to adequately test for, detect, and inform them of problems. You can read all about it here.
When your testing fails to launch
When you think NASA, you know testing is a huge priority. With hundreds of millions at stake for any given launch, you can’t make even the smallest error.
But they’ve made some pretty big ones (not forgetting the Challenger disaster).
In 1990, the $1.5 billion Hubble Space Telescope was launched. But after blurry images returned to Earth, it was clear its mirror had a flaw. NASA ultimately discovered it came from wrongly calibrated test equipment during manufacture, which left a minute aberration – measured in micrometers – in the mirror’s shape. Yes, testing was an issue. As noted in a 1990 New Scientist article, the manufacturer “ignored warnings of the problem detected by a cruder test instrument” and, while the primary and secondary mirrors were tested separately, “no one tested the complete telescope before launch.”
Surely, after such a small, yet expensive mistake, NASA had learned their lesson about testing?
Nope. In 1999, the $125 million Mars Climate Orbiter spacecraft was destroyed after a remarkably simple navigational error. One piece of ground software produced data in English units rather than the metric units the navigation software expected, which caused the craft to fly too close to the Martian atmosphere and disintegrate. This raises the question: why weren’t test commands sent before and during the flight to help predict what would happen to the craft? Would they have caught such a simple error?
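The orbiter’s mismatch was between pound-force seconds (lbf·s) and newton-seconds (N·s), which differ by a factor of about 4.45. Here’s a minimal sketch – the `Impulse` class is a hypothetical illustration, not NASA’s actual software – of how tagging a value with its unit turns a silent mismatch into either a correct conversion or a loud error:

```python
# Minimal sketch of unit-tagged values. The Impulse class is a hypothetical
# illustration; the conversion factor is the standard one: 1 lbf*s = 4.448222 N*s.
LBF_S_TO_N_S = 4.448222

class Impulse:
    def __init__(self, value: float, unit: str):
        self.value = value
        self.unit = unit

    def to_newton_seconds(self) -> float:
        """Convert to N*s, or fail loudly on an unknown unit."""
        if self.unit == "N*s":
            return self.value
        if self.unit == "lbf*s":
            return self.value * LBF_S_TO_N_S
        raise ValueError(f"unknown unit: {self.unit!r}")

# Ground software emits lbf*s; navigation expects N*s.
reading = Impulse(10.0, "lbf*s")
print(reading.to_newton_seconds())  # 44.48222 – about 4.45x the raw number
```

Had the raw `10.0` been consumed as newton-seconds, every thruster firing would have been under-reported by that same factor – precisely the kind of discrepancy a test that compares commanded and predicted trajectories would expose.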