When Research Becomes a Marketing Tool

In the current era of AI, circa September 2025, we are seeing the peak of hype – perhaps – around large language models. So much so, in fact, that I feel the need to caveat my own writing: I use the em dash all on my own, and I have always written that way. I say this because people have discovered that Large Language Models (LLMs) really like to use the em dash, and many now suspect something is AI-generated whenever em dashes are present. With that said, let's dive into our main topic, which is not too dissimilar from our em dash digression.

On September 4th, 2025, OpenAI released a paper titled “Why Language Models Hallucinate.” It’s a decent piece of research; it defends an assertion as to why language models hallucinate. The paper essentially states that during the training and evaluation process, language models are not sufficiently (or at all) penalized for guessing. Since many of these models are trained against tests as benchmarks, the models are actually rewarded for guessing confidently, as this can increase their benchmark scores, which in turn maximizes their reward function. Understandably, on an infinite set of random user questions from the likes of you and me, this confident guesswork continues by design. We have named these confident, erroneous answers “hallucination” – a bit of a misnomer given the mechanics of why it happens – but the experience of interacting with the model certainly feels like talking with something that is hallucinating.
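To make that incentive argument concrete, here is a minimal sketch – my own illustration, not code or notation from the paper – of why binary pass/fail grading rewards guessing. A wrong answer and an “I don’t know” both typically score zero, so any nonzero chance of guessing correctly makes guessing the score-maximizing move. The `expected_score` helper and `abstain_credit` parameter are hypothetical names used only for this example.

```python
# Minimal illustration (not from the paper): under binary 1/0 grading,
# guessing always has an expected score >= abstaining, so a model tuned
# to maximize benchmark score learns to guess confidently.

def expected_score(p_correct: float, abstain: bool, abstain_credit: float = 0.0) -> float:
    """Expected benchmark score for a single question.

    p_correct      -- the model's chance of guessing the right answer
    abstain        -- whether the model answers "I don't know"
    abstain_credit -- score granted for abstaining (0.0 on typical benchmarks)
    """
    return abstain_credit if abstain else p_correct

# Even a long-shot guess (10% chance of being right) beats abstaining
# when abstention earns nothing.
print(expected_score(0.10, abstain=False))  # 0.10
print(expected_score(0.10, abstain=True))   # 0.00

# Give partial credit for admitting uncertainty and the incentive flips
# for questions the model is unlikely to get right.
print(expected_score(0.10, abstain=True, abstain_credit=0.25))  # 0.25
```

Under that kind of grading, a model optimized for leaderboard scores has no reason to express uncertainty; only a scheme that gives some credit for abstaining, or penalizes confident errors, changes the calculus – which, as I read it, is the kind of evaluation change the paper argues for.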

Speaking as someone in the AI research space, I found the authors’ claims and defenses obvious; this was not novel work, nor a novel assertion. I can, without a PhD, confidently state that LLMs hallucinate because of their architecture; it is core to how they work. They are generative models, rooted in the world of statistics and probability, and, to paraphrase someone who does have a PhD, “LLMs are often right because we are often right.” If you look at how the models are trained, even basic intuition should point you to the conclusion that models will be correct about information that we, at internet scale, are often correct about. If information on a topic is controversial, under-documented, or simply absent from the model’s training data, the model has no idea what’s going on – nor does it have the ability to understand its own limitations (self-awareness).

Now, I’m not here to criticize the paper on its merits, or its authors. What this post is really going to discuss is the intentional manufacturing and titling of research in order to please investors, gain market share, or simply advance branding goals. After reading this paper and many LinkedIn reactions to its release, I am comfortable claiming that OpenAI is doing just that here: presenting commissioned industrial research as novel science when it is really a targeted marketing campaign. As always, the disclaimer applies that my views are my own and do not reflect those of my employer or affiliates.

To really frame and ground this discussion, we need to look at the AI landscape and the peaking LLM hype cycle we are in the midst of (or maybe finally at the end of?) right now. In November 2022, OpenAI’s ChatGPT took the world by storm. It became the fastest application ever to reach a million users, a triumph. The scale and rate of adoption were unprecedented, and the growth and improvement in performance were rapid and staggering. The leaps from GPT-3.5 to GPT-4 to GPT-4o were impressive, and many, myself included, were nervously waiting to see whether the performance curve would continue that exponential climb or finally level off. Thankfully, progress in model intelligence slowed down, and we are finally seeing dramatically diminishing returns from compute at scale.

For anyone living under a rock, I will recap the state of the AI market (because this is very important context). Competition with OpenAI was fierce and immediate, both externally and internally. Google’s LLM “Bard” (since rebranded to Gemini) was an early contender. A few quite notable AI researchers and engineers who felt that OpenAI was not living up to its stated mission (of “open” AI, ironically) broke off and formed a company called Anthropic, which offers “Claude” as its premier language model. Facebook (or “Meta,” as they like to be called now) entered the arena with the “Llama” models (a personal favorite of mine), which are, at the time of writing, still essentially the only premier open-source language models provided by a U.S. tech company. Why did I caveat this with “U.S.”? Well, it turns out the U.S. has competition too.

China has been a big player here too. The Chinese company Alibaba has put forward the Qwen family of open-source LLMs (which are also good, and hugely popular), and DeepSeek released R1, which redefined the paradigm for efficient training at scale, achieving comparable or superior benchmark results to OpenAI’s premier reasoning model, o1, with almost 50x more efficient compute (lower cost, far less training needed). Again, to summarize: the space has become fiercely competitive, domestically and internationally, internally and externally. Everywhere they look, OpenAI faces a crowded market in which they are not the clear leader. As I write this in September 2025, their early-mover advantage is all but gone – save for their incredibly favorable market position as a premier provider for Microsoft’s and Apple’s AI backend needs.

In a landscape where users, businesses, governments, and just about anyone with the need and money for LLMs is aware of hallucination and hesitant to trust AI, it’s clear that a winning strategy is to make sure your models are safe, trustworthy, and secure. OpenAI is well aware of this, and while some firms like Meta Research have opted to pour billions into trying to develop premier “Superintelligence,” OpenAI continues to pay the usual ludicrously lucrative tech salaries to new teams of AI safety researchers, going as far as creating dedicated fellowships and hiring premier national security staff, scientists, and researchers. All of this bodes well, and I’m sure the three OpenAI authors of “Why Language Models Hallucinate” work as AI safety researchers there, but OpenAI has a big problem when it comes to AI safety, trustworthiness, and security: they are a clear number two. The market wants to bet big on number one, and in a space this competitive, it’s not good to be a clear number two.

So who’s the clear number one, you may be asking? I’ll tell you: it’s Anthropic. The Bay Area tech firm, which as mentioned earlier was started by a team that broke away from OpenAI, is, at least brand-wise, the clear go-to for AI safety, trustworthiness, and security. OpenAI may have been the first mover on ChatGPT, but Anthropic was the first mover on AI security, branding itself, essentially, as committed to “doing things the right way.” That’s the essence of their brand, and on top of that, their credibility works in their favor: they worked on ChatGPT at OpenAI, took a look around, didn’t like what they saw, and went to do their own thing. Anthropic has hired many, many teams of AI red-teamers, national security experts, cybersecurity experts, and the list goes on. They regularly publish research on their own models, going as far as publishing the benchmarks, methods, and results with unusual transparency for a company that keeps its models strictly closed source. So by now you’ve probably got the clear picture: OpenAI is a clear number two in a space that they have tremendous incentive to be number one in – or at least contend for the title.

This brings us to one of the most disingenuously titled and branded pieces of research I have personally seen in my five years of reading AI research. Again, let’s be clear: I am by no means going after the quality or integrity of this team of four researchers, who I have no doubt approached this important problem in good faith and, furthermore, did good work. The problem, to restate it, is that this work is not a novel assertion or understanding of why LLMs hallucinate, but that is how OpenAI has presented it: as though they have really “finally figured it out.” Let’s be crystal clear and not mince words: they are well aware of what they are doing. This is a company recently valued at 180 billion dollars, with some of the smartest minds in the world working there and some of the best marketing and branding talent money can buy. The titling and presentation of this research is intentional and, from my perspective, a clear and heavy-handed attempt to gain credibility and market share in the AI safety/trustworthiness space, which will be increasingly lucrative and important for them in earning government contracts.

Let’s add some more context here: OpenAI is facing a lawsuit from the parents of a teenager who recently took his own life with some encouragement, or at least apathy, from ChatGPT, which he confided in repeatedly over an extended period leading up to his death. That story is tragic, and a discussion in its own right, but for now I will resist the temptation to dive deeper and instead refocus on the hit that OpenAI has taken to its reputation in the trustworthy and safe AI space. I mean, a teenager taking his own life, with the help of a seemingly oblivious or naïve model? As far as PR goes, that’s almost as bad as it gets.

So we’ve established the incentive and the problem; now what? Time to go a step further: what is the real consequence of misrepresenting research to the public? This is a multi-dimensional issue, and it carries more weight than it may seem. To start, let’s look at how this appears to the research community. I have a personal incentive to appear, and actually be, an objective assessor of AI models in my official capacities. Despite this, I have gone out of my way to write extensive criticism of a premier AI company’s actions. This is targeted, clear dissatisfaction with a specific action (misrepresenting research), and I do it at real risk to my own brand as an unbiased assessor of models. No doubt, despite my best efforts, this will leave behind cognitive biases against OpenAI’s work unless the firm issues public statements or clearly changes its behavior. I will offer this: I think there are many more researchers who share my opinions here, and a brief search through LinkedIn will confirm as much, but it cannot be overstated how many more may share this opinion quietly, and I don’t blame them; I don’t suspect I’ll be rewarded for the public criticism.

Outside of the research community, I think this sort of action will hurt, not help, OpenAI in the long run. The short-sighted attempt to position themselves as a market leader will ultimately not be rewarded, as they take a notable credibility hit among researchers.

On a broader and more philosophical level, I think they’re playing with fire. This is an era of increased anti-intellectualism and growing distrust of science, which I believe is best combated by the practice of good science and advocacy for the scientific process. When premier firms that present themselves as leaders in this technology (and actually are) are disingenuous or misrepresentative, it doesn’t just hurt them; it hurts everyone who practices good science, everyone who does AI research, and ultimately their own team of four authors who did this work in the first place. Over the long run, it will be detrimental to OpenAI’s own people and devalue their credentials, hard work, and rigorous scientific methodology. That is plainly unfair to them, and it’s up to people who aren’t on OpenAI’s payroll to call them out on it.

I’d like to leave everyone with this: if we don’t call out bad science when we see it (and that includes misrepresentation) quickly, clearly, and publicly – even when staying quiet would be the polite thing to do – we will all suffer for it over time. Let’s not succumb to this tragedy of the commons or prisoner’s dilemma, however you want to frame the problem; let’s collectively agree to do better and hold everyone in our community accountable, for our own sakes and for everyone else’s.

 
