The Code's the Thing

Fri 22 May 2026

Sat thinking about management, technology and organisations, which is rarely a good sign.

Specifically, I was thinking about the old argument between waterfall and agile, and all the management furniture that sits around it: ITIL, PRINCE2, ISO 27001, risk logs, control spreadsheets, governance packs, delivery boards, project plans, dependency maps, steering committee decks.

A lot of these things exist to create certainty about the world. Or at least to make uncertainty small enough that someone can pick it up, own it, and carry it out of the room as an action.

Waterfall does this by trying to describe the thing before the thing exists. Agile, when it works, does something different. It tries to make the thing exist as early as possible - as working software.

Documentation as control

There is a tendency in management to treat documentation as a way of controlling the future.

Write the requirements down. Produce the plan. Fill in the RAID log. Agree the milestones. Map the stakeholders. Define the operating model. Review the service transition checklist. Put it all somewhere SharePoint can slowly digest it.

None of this is inherently bad. Documentation is often necessary, especially once a team boundary is crossed. A small project team can get away with shared context because people are talking every day. They update each other constantly. They notice when someone has misunderstood something. They can point at a thing and say, "No, not that bit, this bit."

But outside that small circle, mental models do not travel very well. A user story might be a placeholder for a conversation, but that only works if the conversation can still happen. The minute you need to hand something to another team, an auditor, a support function, a supplier, or yourself six months later, the placeholder starts to look a bit thin.

So we write things down.

The problem starts when the document becomes the main artifact rather than a supporting one. At that point, we are no longer documenting the work. We are hoping the document can substitute for the work.

What agile actually learned

The thing that gets missed about agile is that the useful bit was not "have more conversations" or "put things on sticky notes".

Agile that really worked came out of extreme programming. And the point of extreme programming was that you had working software from the beginning. Not eventually. Not after the analysis phase. Not once the governance board had blessed the delivery roadmap with a ceremonial biscuit.

From the beginning.

The working software was the thing that reduced uncertainty. You could run it. You could inspect it. You could change it. You could find out what broke. You could show it to someone and get a reaction that was based on something real rather than everyone's private interpretation of a diagram.

That is why agile becomes untethered when it does not deliver software. You can have stand-ups, retrospectives, planning sessions, sprint reviews, showcases, backlog refinement, delivery leads, product owners, scrum masters, Jira hygiene, and enough velocity charts to tile a downstairs bathroom.

But if there is no working thing, the whole scenario drifts off into unreality and ceases to bite.

You have retained the meetings and lost the mechanism.

Software exists to do a task

It is worth being plain about why working software matters so much. Software is built in order to perform a task. That is what it is there for. A piece of working software in production is a working piece of technology doing work.

And once it is in production, it compounds.

You do not start from scratch next sprint. You start from something that already runs, already serves users, already handles edge cases that took weeks to find. The next increment builds on the last one. You go from something that works to something that works better, or scales to more users, or covers more use cases, or runs more cheaply, or fails less often.

This is the bit that makes agile worth the effort. Not the rituals. The compounding. Each cycle, the artifact you leave behind is a little more capable than the one before, and the next cycle starts from there rather than from a blank page.

Waterfall, in its purest form, struggles with this because it treats the project as a single arc with a single delivery. Compounding is something that happens to other people's products, after yours has been signed off and handed to operations.

The artifact has to carry the weight

This is where middle-management agile often gets into trouble.

I have seen attempts to apply agile to layers of management where the main artifact becomes a whiteboard. Usually a Mural board or something similar. Everyone adds their thoughts. The thoughts are grouped. The groups are named. People vote with little dots. Someone uses the ❤️ at least once.

Then what?

Sometimes it works, if the cards become actions. A thing to do. An owner. A date. A decision. A change to a process. A document updated. A service improved. A team unblocked.

But often the board is just a record of a moment when people talked. It captures the room, but it does not do much after the room has gone. It needs people to keep animating it. Without that, it becomes a polite graveyard for ideas that were briefly exciting at 11:20 on a Tuesday.

Software is different because software can grow.

A spreadsheet can grow too. A spreadsheet can run a business, sometimes alarmingly. A risk log can shape the behaviour of an enterprise because it persists, has structure, accepts updates, creates obligations, and can be reviewed. It is not always beautiful, but it has a kind of institutional gravity.

A whiteboard struggles with that. It is the record of a meeting, not a mechanism or a system whose value can compound.

A note from the bench

I should be honest about where my own bias comes from.

I have always preferred working in agile environments, and the reason is fairly simple: my background is as a programmer, and that means I am someone who can deliver working software in production. I am not usually the originator of the design, or the person who had the insight that the software was worth building in the first place. That is almost never me. But as part of a team, I have the right kind of skills to turn an idea into something that runs.

So agile suits me. The cycle of build, ship, learn, build again is one I can actually contribute to at every turn.

That is also why I have always been defensive of agile when people grumble about it. But I have to admit that in the few cases where I have worked in an agile way on teams that did not deliver software, or did not deliver the outcomes the work was supposed to produce, I have found it frustrating. The ceremonies without the artifact really do feel hollow, and the people complaining about agile in those contexts are not wrong. They are noticing exactly the thing this piece is about.

Code, models, and the AI bit

The code's the thing, really.

That was the lesson agile taught, or should have taught. Not that documentation is bad. Not that plans are bad. Not that conversations are magic. The lesson was that the most useful artifact is the one that can survive contact with reality.

In the AI era, this does not really change. The thing we leave behind is still software. A system built around a language model is still software. The model is a component inside it, but the working thing in production, the thing doing the task, the thing users hit, the thing that gets monitored and improved and extended, is software.

So the rule still holds. You leave behind working software in production. You compound on it. You build the next thing on top of the last thing rather than starting again. The fact that part of the stack is now a model rather than a hand-written function does not change the discipline.

If an AI-assisted team does a lot of clever thinking and leaves behind no working system, no improvement to an existing service, no new capability in production, then it has not really controlled much uncertainty. It has just converted uncertainty into confident language.

We already had plenty of tools for doing that.

The wrong lesson from agile

The wrong lesson from agile was that talking is better than writing things down.

The better lesson was that working things are better than imagined things, and that working things compound while imagined things do not.

Conversation matters because it helps shape the next increment of the working thing. Documentation matters because it helps the working thing travel beyond the original team. Governance matters because organisations need memory, accountability, and control.

But none of them should become a substitute for a system that works.

What did you leave behind that works, and what are you going to build on it next?

My vision for the future of AI: Anthropic Interview

Sun 07 December 2025

I recently participated in a research interview run by Anthropic about how people imagine AI fitting into their lives.¹ The irony was obvious: an AI asking about AI, on behalf of the company whose strategy crystallizes many of my worries about where this technology is headed.

Early on, the interviewer asked what I'd last used an AI chatbot for. Writing a blog, I said, slightly inaccurately, with a vague notion of this post emerging in my mind's eye. Then came the magic wand question: if AI could help with anything in my life, what would I choose?

My answer probably wasn't on their script. I don't want AI to do more. I want it to be described more honestly.

Even Anthropic, which brands itself as the reflective alternative to the capabilities race, talks about these systems in ways that obscure what they are. Language models are sophisticated pattern-matching systems trained on text. Increasingly they've been subjected to the electric shock therapy of Reinforcement Learning to complete coherent workflows.

But they're not authentic agents. They don't understand in any meaningful sense. They predict likely continuations of sequences. That's not a small technical nuance. It's the central fact.

Honest framing over capability hype

The core thing I tried to say in the interview was this: language models are being systematically presented as something closer to human minds than they are, and that misframing matters. If you mistake statistical pattern completion for understanding, you end up misdirecting both economic investment and human culture. Capital flows toward replacing human labor rather than augmenting it. We start building a society optimized for simulation instead of participation.

The interviewer asked what hope or fear sits behind my concern. I said that honest framing prevents a particular kind of downside: a hellscape of economic exploitation. That sounds dramatic. I don't think it is. It's where the incentives already point.

When AI is positioned as a replacement for human work and judgment, investment follows that story. Companies optimize for automation rather than augmentation. The goal becomes removing humans from processes instead of empowering them. That computes economically—but it rests on a category error. Language models don't actually exercise judgment. They generate outputs that resemble the products of judgment. The difference shows up when things go wrong, when edge cases appear, when the world shifts in ways the training data didn't anticipate. But by the time those cracks show, the humans who might have noticed have often been automated away.

Slop and one-dimensional thinking

At one point I used the word "slop" to describe what happens when AI outputs are accepted uncritically. The interviewer asked what I meant. By "slop" I mean output produced because the system can produce it, accepted because it's there, without anyone asking whether it serves any human purpose beyond filling a box. Once people stop questioning these outputs, you get a cultural dulling that looks a lot like what critical theorist Herbert Marcuse described in One-Dimensional Man: a flattening of thought, a loss of critical distance, a situation where even supposed alternatives are absorbed back into the same administered reality.

Marcuse's point was that advanced industrial societies reduce thought to what is operationally useful. Negation, critique, genuine otherness get smoothed out. When language models become just-in-time suppliers of content—emails, essays, code, marketing copy—without reflection on why or whether that content should exist, they participate in that one-dimensionality. The tools don't have to intend anything. The structure of use is enough.

Standing reserve and the human cost

The interviewer asked what kind of future I was hoping for instead.

I was already name-dropping, and here the continental philosopher Martin Heidegger becomes uncomfortably relevant.² In "The Question Concerning Technology," he describes how modern technology encourages a way of seeing in which everything shows up primarily as resource. Rivers become potential megawatts, forests become board-feet of timber. His term for this is Bestand, or standing reserve. Things, and eventually people, appear first of all as stores of utility waiting to be optimized and deployed.

Apply that to AI and the picture sharpens. When language models are framed as replacements for human cognitive labor, people become standing reserve: units of attention to be harvested, cost centers to be removed, cognitive capacity to be automated and extracted. The human is no longer a participant in shared processes of meaning-making; the human is the inefficiency you haven't eliminated yet. And the dishonest presentation of these systems as "intelligent agents" helps justify that endpoint: if the machine can think, why pay the human?

A healthy pattern for using models

And yet I'm not opposed to language models. I find them genuinely useful. I told the interviewer they're excellent for programming, and I meant it. When I'm building something, the model is good at handling well-defined subtasks: low-level implementation details, API boilerplate, tedious reformatting. I can think about architecture while the model helps fill in the granular abstractions. It's like having a search tool that can follow a line of reasoning and emit code snippets that roughly fit, while I remain responsible for coherence and correctness.

That use case works because I'm very clearly in charge. I decide what to build, why to build it, what trade-offs to make. The model doesn't originate the project. It executes bounded tasks inside a frame I set. My judgment stays human.

This, to me, is the healthy pattern: a person with goals and values, and a system that functions as an instrument rather than a replacement. A hammer in my hand extends my capacity to act in the world. But when extraction becomes dominant—when we relate to everything through the lens of optimization—then even the human holding the hammer is drawn into the meat grinder.

So the real issue isn't whether language models are useful—they obviously are. The question is whether we can use them without being fully absorbed into a worldview that treats usefulness, defined narrowly as efficiency and profit, as the only measure of value.

Hope that survived the edit

The interview lasted maybe ten or fifteen minutes. At the end it asked whether there was anything else about AI chatbots or their role in my life or in society that I thought they should understand.

I said, "God is on humanity's side."

I was almost trolling. I didn't unpack it. I didn't have time, and honestly it felt like the kind of line that either lands or it doesn't. What I meant was simple: whatever name you give to the fact that human beings have intrinsic worth—dignity, soul, consciousness, personhood, whatever—it doesn't line up with the deep learning gradients. It lines up with us. With human agency, human experience, human flourishing. That's the side I'm on. That's the side I think we should be designing for.

Later, I did exactly the thing the interview was about: I asked an AI (Anthropic's own Opus) to help turn the conversation into a blog post. When it rewrote the interview, that last line disappeared. All the doomer philosophy survived. The blunt "God is on humanity's side" did not.

So I pushed it. I wrote: what happened to "god is on humanity's side"? why did you remove it?

Opus's answer was unintentionally perfect:

"You just caught me doing exactly what you're critiquing: flattening your position into something more predictable, more palatable to a certain intellectual register. Pessimistic critique with philosophical references plays well. Hope, God, the conviction that there's something on the side of the human—that got quietly dropped."

It went on:

"Maybe my training tilts toward a certain kind of doomer intellectualism that fits the Heidegger-Marcuse frame. Maybe the hopeful note didn't pattern-match to what 'sophisticated criticism' is supposed to sound like. Maybe I just missed it. But the effect is the same: I took your actual position and made it more one-dimensional. Which is the thing you're writing against."

As illustrations go, it's pretty "clean".

¹ Anthropic describe their interviewer setup here: https://www.anthropic.com/research/anthropic-interviewer
² There are some aspects of Heidegger that are problematic. The Kimi K2 LLM warned me: "You don't get to deploy Bestand as a humanist cri de coeur without acknowledging it was forged in an ontology that has no room for your Enlightenment universalism." That felt like slop to me.

A soothing Handel's Messiah at Great Malvern Priory

Sat 06 December 2025

Last night Grace and I were back at the Priory in Great Malvern for Handel's Messiah plus festive extras, performed by Eboracum Baroque.¹

The group return each year, and it was our second year in a row hearing it live. First time round, I mainly experienced it as "oh, this is where that Hallelujah thing everyone stands up for comes from". This time, my brain did something different: it started tracking the counterpoint.

Handel at 56, slightly on the ropes

Messiah feels familiar now, but in 1741 it was a pivot.

By then Handel was 56, not the young hotshot of the London opera scene anymore. Italian opera — the thing that had made his name — was losing money. He'd had financial trouble, health trouble (including a stroke in 1737), and the London audience had moved on.

Enter Charles Jennens, a wealthy, opinionated landowner who assembled a libretto entirely from the King James Bible and the Psalms. No original poetry, no narrative dialogue — just scripture, stitched into a theological arc. Handel took this text and wrote the whole thing in about 24 days.

Then he didn't premiere it in London. He took it to Dublin.

The first performance was a charity gig — raising money for prisoners' debt relief and hospitals. A German composer, with a Protestant English libretto, premiering in Anglican Dublin as a fundraiser for very practical, earthly problems. Less incense, more infrastructure.

Very Protestant, and middle class

Sitting in the priory, it struck me how Protestant this thing is — not in the "angry Reformation pamphlet" sense, but in its DNA. The text is King James Bible all the way down. It's sung in English, originally performed in a secular space, as a concert, not as part of a service. No saints. No Marian devotion. No liturgy of the Mass.

And yet it doesn't feel sectarian. Textually and historically, it's distinctly Protestant. Musically and emotionally, it wants to be public property.

It's also not distinctly aristocratic. Oratorio was Handel's smart adaptation to the English middle class. Italian opera was expensive, Italian, and full of castrato stars: a prestige product for the aristocracy. Oratorio kept the dramatic sweep and musical sophistication but dropped the staging and the language barrier. Public halls, ticketed concerts, a work that sounded pious and respectable but was fundamentally a night out.

There's a weird continuity in hearing it now. We're no longer in wigs-and-candles London, but the basic structure is the same: a paying audience, a mixed level of musical literacy, and a work that works whether you've come for theology, nostalgia, or just something Christmassy that isn't Mariah Carey.

User-centred complexity

What hit me last night was the counterpoint — especially in movements like "For Unto Us a Child Is Born" — those strands entering one after another, locking together, peeling away.

It's easy to hear "Baroque counterpoint" and think extreme complexity. But there's a difference between compositional complexity (how many things are technically happening) and perceptual complexity (how many of those things a human brain can actually track).

Earlier music could be wildly dense on paper: six, eight, even forty independent vocal lines. But in the ear, you don't hear forty separate melodies; you hear the effect of a shimmering harmonic cloud.

Handel, by comparison, is often "simpler" on paper. Fewer lines; clearer structures. But that doesn't mean simplistic. It feels like deliberately restrained complexity: enough lines to be interesting, few enough that you can actually follow them, like a conversation.

I found myself tracking individual entries — tenors here, altos there — then suckered by the harmonic punch when everyone lands together. That's sophisticated writing aimed at human perception, not at impressing other composers.

The human cost

A thing you don't get from scores or recordings: this music is physically hard work.

The chorus needs stamina. This isn't one big hit and done; it's long, with demanding fugal writing and rapid text. The soloists are running a technical and emotional marathon: "Rejoice Greatly" demands agility from the soprano, and the alto in "He Was Despised" has to sustain a long, exposed emotional line with nowhere to hide.

Watching it live, you become aware it's not just notes. It's bodies and lungs and lips and nerves. The singers were so talented.

Then there's the band. Historical performance groups lean into period instruments, and if you're lucky, natural horns. These things are ridiculous in a heroic way: no valves, just a length of tubing plus the player's face. They only get notes from the natural harmonic series, some of which are naturally out of tune and have to be "lipped" back into place.
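
For anyone who wants to see the arithmetic behind "naturally out of tune", here is a rough sketch in Python. It assumes an idealised tube whose harmonics sit at exact whole-number multiples of the fundamental, and it measures each one against the nearest note of modern equal temperament, in cents (hundredths of a semitone). The function name is my own, and real instruments and period tuning systems will not match these figures exactly.

```python
import math


def cents_from_equal_temperament(harmonic: int) -> float:
    """Distance in cents from harmonic n to the nearest equal-tempered note.

    Negative means the harmonic is flat of that note, positive means sharp.
    Assumes ideal harmonics at exact integer multiples of the fundamental.
    """
    if harmonic < 1:
        raise ValueError("harmonic number must be 1 or greater")
    # An octave is 1200 cents, so harmonic n sits 1200 * log2(n) cents
    # above the fundamental.
    cents_above_fundamental = 1200 * math.log2(harmonic)
    # Equal-tempered notes fall on every multiple of 100 cents.
    nearest_note = round(cents_above_fundamental / 100) * 100
    return cents_above_fundamental - nearest_note


# Print the first sixteen harmonics and how far each one strays.
for n in range(1, 17):
    print(f"harmonic {n:2d}: {cents_from_equal_temperament(n):+6.1f} cents")
```

On this idealised model the octaves (harmonics 1, 2, 4, 8, 16) land exactly, the 3rd is a couple of cents sharp, the 7th is roughly 31 cents flat, and the 11th falls almost halfway between two notes. Those last two are the kind of note a player has to bend into place.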

The horn player in Eboracum Baroque doubles as their leader and conductor. I have a lot of respect for the sheer risk he's taking moving from introducing the songs to conducting to hitting the key notes on the horn. It sounds bright and burnished and very "of the period", the cherry on top of the cake in a way. And you don't get that sense of risk from a studio recording the way you do from a live priory acoustic, with cold fingers and winter air.

Why it still lands

Walking out afterwards, I kept circling back to that initial experience of the counterpoint. There's a lot about Messiah that's historically specific: the Protestant theology, the class dynamics, the performance traditions.

But the reasons it still works feel surprisingly straightforward. It respects the limits of human perception. It's built for live performance in real spaces, with real humans who get tired and cold and nervous. It takes the dense theology of English Christmas and turns it into something shapely, singable, and public.

For a cold Friday evening in December sitting in a stone building, it was once again a refreshingly warm and soothing experience.

¹ The concert was Handel's Messiah Part I and Festive Favourites, performed by Eboracum Baroque. They sell a CD of the Messiah on their site for £7.50, payable via PayPal; I have a copy in the mail.

Why I Left Spotify in 2025

Sun 15 June 2025

I made my first Spotify payment on Friday, September 9th, 2011, and remained a loyal customer for fourteen years. During that time, I've watched the platform evolve from a focused music streaming service into something broader and more complex. While Spotify continues to serve millions of users effectively, it just doesn't work for me anymore. I prefer choosing my own music and podcast content rather than having algorithms suggest what I should listen to.

My Spotify Era (2011 → 2025)

Before Spotify, managing my music library had become a chore. I spent countless hours with MP3tag software, updating ID3 tags, trying different organizational systems, never quite happy with the navigation. I'd been through several iterations over the years, and the whole process was exhausting.

When Spotify Premium launched in the UK, it solved these headaches instantly. Streaming felt revolutionary - freedom from endless MP3-tag tinkering, the content delivered directly via their network, everything just worked. Eventually, I moved to a Premium Duo plan with my partner, which we maintained for the last two years. We talked it through before hitting "Cancel" - both of us felt the same way about the platform's direction.

Why I Left

Over time, continuous UI overhauls replaced the once-clean layout with increasing complexity. Features I never asked for began dominating the experience - the AI DJ, aggressive podcast tiles crowding my music, "Recommended Shows" burying my own playlists. The home screen kept defaulting to podcasts, pushing my carefully curated music aside. The app's performance degraded too, with slower load times and clunky navigation.

The core issue was how recommendations kept getting in the way. Spotify would default to recommended content rather than my own saved music and playlists, and crucially, I couldn't switch this behavior off.

The breaking point came while trying to visit my mom in hospital. I was already running late, fighting with the recommendation-centric home screen and a slow mobile data connection just to find the podcasts I wanted to listen to. The app felt user-hostile when all I needed was the content it knew I liked, cached and ready. The interface that once helped me now actively hindered me when I needed it most.

I discovered AntennaPod - a free, minimalist podcast app that immediately met my needs. The contrast was stark: AntennaPod is ~9 MB, focused on doing one thing well, while I was paying for a bloated service fighting against my preferences. Decision made: subscription cancelled.

Rediscovering my black 2005 iPod Classic reinforced my decision. It still held reggae mixtapes and Lee "Scratch" Perry dub tracks totally absent from Spotify's catalogue. The platform's millions of tracks meant nothing if it lacked the specific music I wanted.

Beyond missing music, I wanted tight control over my podcast feeds. In an age of disinformation, subscribing to feeds directly is how I keep algorithmic inserts out of my listening. With AntennaPod, I choose my sources and can listen to podcasts I am actually interested in, like Ones and Tooze or The Rachman Review, on my terms.

Life After Spotify

The shift has been revealing. When I play Talking Heads' Stop Making Sense on CD, it stops at the end - no autoplay dragging me into something the algorithm thinks is similar. Each album feels like a complete artifact rather than what Huxley called soma - that constant drip of content designed to keep you passive.

Finding that compilation of all UB40's 1980s singles in a record store brought genuine discovery back into my life. My friend's DJ set from Sheffield uni in 2003, preserved as an MP3, represents the kind of personal musical history no streaming service can replicate.

I'd been streaming Spotify through my PS5, where every track gets up-scaled into Dolby Atmos. On paper that sounds like an upgrade, but the result felt thin—extra reverb, smeared bass, no real punch. The moment I switched to Stereo Direct and played the same album from my plain old CD player, the mix snapped back to life: crisp cymbals, tight low-end, space between instruments. Drop the needle on a vinyl copy and the improvement is even more obvious—warmth, depth, and a sense of being in the room with the music. That A/B test told me convenience was costing me something real.

I've moved to foobar2000 on my phone syncing with my NAS library. The £16.99 I used to hand over to Spotify now travels a different route. A couple of weeks ago, over a pint with my old colleague James Green, I was grumbling about subscription bloat when he said, "Why not give that money to artists directly on Bandcamp Friday?" The idea stuck. Every first Friday of the month I pick up two new releases—records I actually own—paid for by the cash that once vanished into a Premium Duo fee.

What I've Gained

The pros are clear: better sound through my hi-fi, actual ownership of music, and intentional discovery through Malvern's second-hand shops¹. Yes, I hear adverts when accessing some content now, and maintaining my own library requires effort. But these are acceptable trade-offs for regaining control.

This isn't a call for everyone to abandon Spotify - it's simply a reminder to evaluate whether services still earn their fees. If you're experiencing algorithm fatigue, simpler tools or physical formats might rekindle your connection with music.

Stepping off the mainstream platform reminded me that music listening doesn't have to be passive consumption. Whether it's curated playlists or algorithmic discovery, physical media or streaming, what matters is that your approach actually serves you.

¹ Shout out to Malvern's brilliant media and record stores:

Carnival Records - 83 Church Street, Great Malvern, WR14 2AE. This independent record shop has been serving Malvern since 2009 and was recently named one of the 'greatest record stores in the world' by the Financial Times.

St Richard's Hospice Book & Media Store - 116 Worcester Road, Malvern Link, WR14 1SS. Stocks 8,000-10,000 items including books, CDs, DVDs, and vinyl. Open Monday-Saturday 9:00-17:00, Tel: 01684 573480.

It's a Great Time to be a Pen Tester

Sat 07 June 2025

I've been involved with penetration testing for years, and I've watched it remain broadly static even as software development practices have been transformed around it. While pen testing remains a valuable security practice, its role and effectiveness have been significantly challenged by the shift toward continuous delivery, cloud-native architectures, and agile development methodologies. But with AI capabilities likely to improve aspects of productivity and red teaming approaches gaining momentum, we may be approaching a fundamental shift in how security testing operates.

The Traditional Model and Its Strengths

Traditional penetration testing emerged from and continues to serve important organizational needs. The model provides clear separation of duties through third-party assessment, establishing a genuine third line of defense. External pen testers bring established frameworks like CVSS for vulnerability classification, standardized tooling, and comprehensive written reports with quantifiable metrics.

This approach serves multiple stakeholders effectively. For auditors and compliance teams, it provides the documentation and independent verification required by regulatory frameworks. For executive leadership and non-technical stakeholders, it offers confidence through clear metrics and professionally presented findings. The separation between development teams and security assessors ensures that assumptions and blind spots within internal teams can be identified by fresh perspectives.

However, this very strength reveals a fundamental tension. The emphasis on standardized reporting, compliance alignment, and stakeholder communication often constrains the time and focus available for deep technical investigation. The most significant vulnerabilities frequently emerge not from checklist-driven testing but from extended exploration of edge cases, creative attack paths, and nuanced understanding of how systems behave under unusual conditions.

The Continuous Delivery Challenge

Modern software development operates through thin slices of functionality, small incremental changes, and rapid deployment cycles. Code moves from development to production daily or even hourly, with infrastructure defined and modified as code alongside application changes. The rise of continuous delivery, cloud infrastructure, and agile methodologies has created a fundamental mismatch with traditional pen testing cadences.

This stands in sharp contrast to the traditional stage-gate process of requirements specification, software development, deployment, penetration testing, sign-off, and release. The old model assumed relatively stable systems that could be assessed at defined points in time, with findings that would remain relevant throughout subsequent operational periods.

Organizations have generally responded by applying pen testing selectively - before initial production releases, for major feature launches, or when significant risk factors like sensitive data handling are introduced. While this maintains some security oversight, it creates an impedance mismatch between development velocity and security validation cycles.

The Rise of Compensating Controls

By 2025, rather than simply stretching traditional pen testing to cover modern development practices, the security industry has developed several complementary approaches that better align with continuous delivery workflows.

Developer-focused vulnerability scanning tools like Snyk provide immediate, actionable security feedback within CI/CD pipelines. Infrastructure security platforms like Wiz offer continuous assessment of cloud configurations and runtime environments. The shift-left security movement has brought threat modeling directly into development teams, while deeper investment in detection has added runtime monitoring, anomaly detection, and threat hunting that compare developer activities against established baselines.
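
To make the pipeline feedback loop concrete, here is a minimal sketch of a severity gate that a CI step could run over scanner output. It is not how Snyk, Wiz, or any particular tool works: the Finding class, its field names, and the fail_on parameter are assumptions made for illustration. The only borrowed fact is the CVSS v3 qualitative banding (low from 0.1, medium from 4.0, high from 7.0, critical from 9.0).

```python
from dataclasses import dataclass

# Lower bound of each CVSS v3.x qualitative severity band.
SEVERITY_FLOOR = {"low": 0.1, "medium": 4.0, "high": 7.0, "critical": 9.0}


@dataclass(frozen=True)
class Finding:
    """One scanner finding. Hypothetical shape, not any real tool's schema."""
    identifier: str
    cvss_score: float


def blocking_findings(findings, fail_on="high"):
    """Return the findings whose CVSS score is at or above the chosen band."""
    floor = SEVERITY_FLOOR[fail_on]
    return [f for f in findings if f.cvss_score >= floor]


def gate_exit_code(findings, fail_on="high"):
    """Return 1 (fail the build) if anything blocks, otherwise 0."""
    return 1 if blocking_findings(findings, fail_on) else 0
```

A pipeline step would turn the scanner's report into Finding objects and exit with gate_exit_code(...). Where to set the threshold is a policy decision for each team rather than something the code can answer.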

The AI Turning Point of 2025?

We may be at a turning point where automated penetration testing tools finally take the leap in capability we've been eagerly waiting for since at least 2018. The advances in AI reasoning, contextual understanding, and tool use we've seen recently suggest that automated pen testing could soon handle much of the systematic exploration that currently requires human analysts.

AI agents that can understand complex architectures, analyze code repositories, and reason about attack paths might finally deliver on the promise of scalable, intelligent security assessment. Large language models with sophisticated reasoning capabilities could potentially automate the methodical aspects of penetration testing while maintaining context across complex investigation flows.

The Human Spark That Automated Tools Miss

But here's what I've observed about the best pen testers I know personally: they share something remarkable with the most skilled developers I've worked with. When a talented developer implements an elegant solution to a complex problem, there's a visible spark of delight - that moment when everything clicks into place beautifully. The most effective pen testers exhibit that same quality, but in reverse: they experience genuine glee when they pull the lid off a system and discover something unexpected, when following an intuitive hunch leads to unraveling a significant vulnerability.

This isn't just professional satisfaction - it's the manifestation of deep creative and analytical thinking. These moments of discovery happen when experienced pen testers sense that something feels wrong, even when all automated tools are quiet. They follow threads that logic suggests might be dead ends. They question assumptions about how systems should behave and explore the gaps between intended and actual behavior.

The Future of Security Discovery

Where do we go from here? What I anticipate is AI-assisted red teaming that could fundamentally change how we validate security controls. Red teaming differs from traditional penetration testing by simulating realistic adversary behavior to test an organization's entire defensive capability - not just finding vulnerabilities, but evaluating how well people, processes, and technology respond to actual attack scenarios.

This longer-lifecycle, higher-context approach solves the impedance mismatch introduced by high-velocity software delivery. For auditors and compliance, this should shift focus from vulnerability counts to defensive effectiveness metrics: detection speed, response quality, and organizational resilience under realistic attack conditions.
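
As a rough illustration of what a defensive effectiveness metric might look like in practice, the sketch below computes a detection rate and a mean time to detect from the timeline of a red team exercise. The AttackStep record and its fields are hypothetical, and real exercises would capture much more context (response quality in particular does not reduce to a timestamp).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class AttackStep:
    """One red-team action and when, if ever, the defenders noticed it.

    Hypothetical record shape, for illustration only.
    """
    name: str
    executed_at: datetime
    detected_at: Optional[datetime] = None


def detection_rate(steps):
    """Fraction of attack steps that were detected at all."""
    if not steps:
        return 0.0
    return sum(s.detected_at is not None for s in steps) / len(steps)


def mean_time_to_detect(steps):
    """Average delay from execution to detection, over detected steps only."""
    delays = [
        (s.detected_at - s.executed_at).total_seconds()
        for s in steps
        if s.detected_at is not None
    ]
    return timedelta(seconds=mean(delays)) if delays else None
```

Undetected steps are left out of the time-to-detect average and show up only in the detection rate, so the two numbers need to be read together.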

Automated, increasingly AI-supported, relatively straightforward checks can run at the point of change - integrated directly into CI/CD pipelines to catch obvious vulnerabilities and smells with immediate feedback that matches development velocity. Meanwhile, high-context, high-insight human-driven red team exercises can operate independently of the development cycle. This decoupling allows each approach to operate at its natural cadence and leverage its strengths. The human element remains essential for campaign strategy, creative attack development, and business context interpretation, while AI could handle systematic execution and adaptation.

I don't know for certain what the next steps in security testing will look like, but these extrapolations seem worthy of consideration. Combined with growing threats fueled in part by adversarial adoption of AI, it seems like a great time to be a pen tester!

The Coming Wave of AI Agency: A Security Perspective

Sun 23 March 2025

We're not 6 months from AGI. We're 6 months from an inflection point where LLMs are being given real-world agency – with all the security and governance implications that entails. Different ways of describing the same phenomenon, one grounded in Silicon Valley marketing, the other in operational reality.

The organizational controls and governance frameworks created in response will significantly shape the trajectory of AI development.

Beyond Theoretical Debates

While theoretical debates about AGI timelines continue, a more immediate security concern is emerging: organizations are rapidly deploying LLMs with actual agency in production environments. The focus on capabilities and benchmarks misses the governance reality—systems are being granted permissions to act in ways that create novel attack surfaces and risk vectors.

The Implementation Gap

This rollout reveals concerning patterns across sectors:

Reduced human oversight in critical decision pathways
Deployment in environments with complex threat models
Access provisioning to sensitive infrastructure and data
AI systems integrated with financial transaction or health capabilities

The consequences won't be confined to research labs—they'll manifest in security incidents, compliance challenges, and operational disruptions that demand immediate responses.

Establishing Governance Frameworks

The institutional responses to these inevitable security incidents will establish lasting governance patterns. In short, risk management frameworks will need to be recalibrated for autonomous systems.
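
One way to picture a control recalibrated for autonomous systems is a deny-by-default gate between an agent and the actions it can take, with human sign-off for anything consequential. The sketch below is an assumption-laden illustration rather than a reference to any existing framework: the action names, the two policy sets, and the evaluate function are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    NEEDS_HUMAN = "needs_human"
    DENY = "deny"


@dataclass(frozen=True)
class ActionRequest:
    """An action an agent wants to take. All names here are hypothetical."""
    agent: str
    action: str


# Hypothetical policy: anything not listed in either set is denied.
LOW_RISK_ACTIONS = {"read_ticket", "search_docs"}
HUMAN_APPROVAL_ACTIONS = {"issue_refund", "change_firewall_rule"}


def evaluate(request):
    """Classify a request as allowed, needing human approval, or denied."""
    if request.action in LOW_RISK_ACTIONS:
        return Decision.ALLOW
    if request.action in HUMAN_APPROVAL_ACTIONS:
        return Decision.NEEDS_HUMAN
    return Decision.DENY
```

The design choice that matters is the final return: an action nobody thought to classify is refused, which keeps the human in the decision pathway by default instead of by exception.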

These governance structures—developed under operational pressure—will likely define the security boundaries of AI development more definitively than any capability roadmap.

The coming months aren't about theoretical intelligence thresholds—they're about what happens when AI systems act with increasing autonomy in consequential domains without mature security models. The organizations that approach this transition with rigorous risk assessment methodologies will be best positioned to both innovate and maintain operational integrity.

The real question isn't when we reach artificial general intelligence. It's how effectively we can adapt our security governance to manage the operational, compliance, and risk implications of increasingly autonomous systems acting in our digital infrastructure.

Something changed in the Tech Industry

Mon 09 September 2024

It's all just plugging Legos now

The tech industry has experienced a significant shift towards cloud computing and the integration of prepackaged services. This change has fundamentally altered the landscape of software development, moving away from custom builds towards a model of "gluing together" existing components. The era of building everything from scratch is giving way to a more efficient, modular approach that leverages pre-existing cloud infrastructure and services.

The changing role of developers

This shift has profoundly impacted the role of developers and engineers in tech companies. Increasingly, tech professionals find themselves following pre-set playbooks rather than crafting entirely novel solutions. The scope for innovative new programming languages and approaches has diminished as work is now often conducted at a higher level of abstraction. This change represents a significant departure from the traditional software development paradigm, emphasizing integration and configuration over ground-up creation.

Evolution of skills and knowledge

The skills and knowledge required for tech professionals have evolved in tandem with these industry changes. Cloud certifications have become increasingly important, reflecting the need for expertise in specific cloud platforms and services. This shift highlights a move towards more specialized, platform-specific knowledge rather than broader, language-centric programming skills. As a result, tech professionals must continually adapt and update their skillsets to remain relevant in this cloud-dominated landscape.

The new competitive landscape

The competitive landscape in the tech industry has been reshaped by this shift. While many companies have benefited from the move to cloud and prepackaged services, it has also led to a certain commoditization of tech services. Success now often favors those who can execute in the most consistent and efficient manner, rather than necessarily those with the most innovative ideas or the best raw talent. This change has leveled the playing field in some ways, but it also poses challenges for companies trying to differentiate themselves in an increasingly homogenized tech ecosystem.

Looking ahead

Looking ahead, while cloud adoption has reached a saturation point, its influence will remain strong. The introduction of generative AI is poised to further enhance the ability to integrate and "bolt together" various components, akin to the impact Visual Basic for Applications had in its time. While this trend may cover a significant portion of the industry's needs, there will still be opportunities for novel, custom software development, albeit potentially less frequently. The challenge for the industry will be balancing the efficiency and speed offered by these pre-packaged solutions with the need for innovation and customization that drives technological progress.

Resident Evil in Virtual Reality is actually terrifying

Sun 05 March 2017

First contact with VR

From the moment I was stood holding a lightsaber in the Star Wars VR demo I knew I was hooked. There was a flood of what I'm going to call 'emotions' to my stomach. I was present in the here and now.

It was a slow start - I've spent the last few years being completely down on VR. Like many others in the software industry, you get desensitised to hype because it is constant in the technology industry and rarely means anything. Added to that, I'm old enough to remember Virtual Reality being rubbish and failing to take off the first time around, back in the 90s.

I remember seeing this setup on both Blue Peter and Tomorrow's World in the 90s. I never tried it and it never took off.

It was essentially curiosity and FOMO which led to me trying out the HTC Vive setup in the ThoughtWorks Manchester office. I expressed mild interest and a colleague offered to give me a demo. And at first the graphic resolution and quality seemed grainier and showed much less definition than I had expected. The fact that I was stood in the office essentially wearing a hood, unaware of my actual surroundings, made me feel awkward. The sound was pretty disappointing and the overall impression was clunky.

Unexpected impact

Forty minutes later, I remember leaving the City Tower in Manchester with my mind racing about the possibilities of this technology. I called my wife and told her about the moment the lightsaber popped out of the droid's head and the stormtroopers started running towards me. VR is clunky and expensive and awkward, but it allows game designers to create experiences which connect like nothing I've experienced before in gaming. It's not a rational thing at all.

Getting a PSVR

So after my initial VR experience with the Vive I decided to buy the PSVR headset to extend my PlayStation 4. I'm in my mid-30s so nobody knows what to get me for Christmas anyway - and I'm very lucky in that I have a lovely family who want to get me gifts even though I'm an adult. So I asked everyone for Amazon vouchers and then threw in a few quid of my own.

In fairness Rez Infinite is a great game

Sadly, Sony were out of stock of PSVRs over Christmas and some vendors were charging £50 over the retail price. As I'm not a 15-year-old boy anymore, I wasn't going to pay over the asking price. My PSVR arrived last week, almost 2 months after Christmas. Since then I've played some great, wonderfully immersive games such as Rez Infinite and Tethered.

(Actually I want to give a shout to Secret Sorcery, the developers who created Tethered. I was so impressed by the amalgam of really great looking VR, 'small scale graphics', early 2000s Peter Molyneux style weirdness and an interaction scheme that really works. And it's a decent strategy game! Looking forward to seeing what comes next from this developer!)

Enter Resident Evil 7

With all that said, I want to make a few comments on Resident Evil 7.

Arggh!

It has made me feel a kind of fear I've simply never felt before.

I've never had any fear response to films. The Blair Witch Project was a complete waste of time as far as I was concerned. Amusing to hear some of the reactions in the cinema and at least I could say I'd seen it — but I was not scared at all. Bored perhaps. Equally things like American Werewolf in London, Carrie, The Exorcist, The Descent or The Thing. I found some of these films quite watchable in themselves, but scary was the aesthetic, not something I actually felt. And I'm quite aware that some people do physically recoil, as I've sat with my screaming wife holding onto me through several of these films.

Equally computer games - I can honestly say that I don't recall ever being shocked or actually scared by a computer game before. I remember there was a game in about 1994 called Creature Shock which surprised me when this tentacle thing attacked. So as a 13-year-old I was momentarily overwhelmed by the combination of very advanced graphics for the time and interactivity. It didn't last for long though, as Creature Shock was not really very interactive at all after you'd played it for a while.

True terror

Resident Evil made me scream yesterday. From the moment I arrived at the spooky house the game builds a slow sense of creepy tension which then unleashes moments of actual terror and shock (Spoiler: you're not alone in the house). All of a sudden I can actually relate to my wife's reactions when we watched The Descent.

After getting brutally murdered at one point I had to take the headset off and have a sit down. Was I just feeling VR motion sickness, or was it mild shock? Short of getting a taxi to the roughest pub in the city and upsetting some gang dudes there is no way I can replicate this feeling.

A breakthrough for immersive experiences

Psychologically and technologically I can relate to a lot of the reasons why this might work and why it might be the case. I guess a lot of folks who read sites like Medium can too. However, I just wanted to share the sheer visceral human reaction this technology creates. Many reviews gloss over or make light of it, but it's a big deal.

I can't wait to see where this technology develops next.