During its big GPT-5 livestream on Thursday, OpenAI showed off a few charts that made the model look quite impressive, but if you look closely, some of the graphs were a little bit off.
In one, ironically showing how well GPT-5 does in “deception evals across models,” the scale is all over the place. For “coding deception,” for example, GPT-5 apparently gets a 50.0 percent deception rate, but that’s compared to o3’s smaller 47.4 percent score, which somehow has a larger bar.
Or this one, where one of GPT-5’s scores is lower than o3’s but is shown with a bigger bar. In the same chart, o3’s and GPT-4o’s scores are different but shown with equally sized bars. That chart was bad enough that CEO Sam Altman commented on it, calling it a “mega chart screwup.” An OpenAI marketing staffer also apologized for the “unintentional chart crime.”
OpenAI didn’t immediately reply to a request for comment. And while it’s unclear whether OpenAI used GPT-5 to actually make the charts, it’s still not a great look for the company on its big launch day, especially when it’s touting the “significant advances in reducing hallucinations” with its new model.
