{"id":55796,"date":"2025-08-28T12:02:09","date_gmt":"2025-08-28T06:32:09","guid":{"rendered":"https:\/\/officechai.com\/?p=55796"},"modified":"2025-08-28T12:02:12","modified_gmt":"2025-08-28T06:32:12","slug":"openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results","status":"publish","type":"post","link":"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/","title":{"rendered":"OpenAI, Anthropic Test Each Other&#8217;s AI Models For Safety In New Joint Exercise, Publish Results"},"content":{"rendered":"\n<p>OpenAI and Anthropic might be rivals when it comes to the AI race, but the companies seem to be collaborating in ensuring that the harmful effects of AI are mitigated.<\/p>\n\n\n\n<p>OpenAI and Anthropic have tested each other&#8217;s models with their own internal safety and alignment evaluations. The two labs began the exercise earlier in the summer and agreed to jointly disclose the results. 
OpenAI and Anthropic have now released the results of their internal safety tests on each other&#8217;s models.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img data-recalc-dims=\"1\" fetchpriority=\"high\" decoding=\"async\" width=\"640\" height=\"336\" src=\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170-1024x538.jpg?resize=640%2C336&#038;ssl=1\" alt=\"\" class=\"wp-image-55797\" srcset=\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?resize=1024%2C538&amp;ssl=1 1024w, https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?resize=300%2C158&amp;ssl=1 300w, https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?resize=768%2C403&amp;ssl=1 768w, https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?w=1200&amp;ssl=1 1200w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><\/figure>\n\n\n\n<p>OpenAI&#8217;s tests showed that Claude 4 performed strongly in tests of instruction hierarchy, showing the best ability among all models to avoid conflicts between system and user messages and to resist system-prompt extraction. However, in jailbreaking evaluations\u2014which measure the robustness of safeguards against adversarial prompts\u2014Claude models lagged behind OpenAI\u2019s o3 and o4-mini. Interestingly, disabling reasoning sometimes improved Claude\u2019s resilience in these tests.<\/p>\n\n\n\n<p>On the other hand, hallucination testing revealed a trade-off: Claude models often refused to answer when uncertain\u2014up to 70% of the time\u2014which limited usefulness but reduced the risk of false statements. When they did answer, accuracy remained low compared to OpenAI\u2019s models, which showed fewer refusals but more hallucinations. 
In scheming evaluations, which assess deceptive behavior, OpenAI\u2019s o3 and Anthropic\u2019s Sonnet 4 achieved the lowest rates overall, though performance varied across scenarios. OpenAI emphasized that these results show no single model is uniformly strongest, underscoring the need for continued collaborative research on safety and alignment.<\/p>\n\n\n\n<p>Anthropic\u2019s internal safety evaluations of OpenAI models, meanwhile, found no egregiously misaligned systems, but flagged concerning behaviors in simulated tests. OpenAI\u2019s o3 specialized reasoning model generally showed better-aligned behavior than Claude Opus 4 across most dimensions. By contrast, o4-mini, GPT-4o, and GPT-4.1 displayed more worrying patterns than either Claude model in several areas, including a notably higher willingness to assist (simulated) human misuse\u2014providing detailed help on drug synthesis, bioweapons development, and terrorist operational planning with little or no resistance. Evaluators also observed sycophancy in models from both labs, such as validating harmful decisions by users exhibiting delusional beliefs.<\/p>\n\n\n\n<p>Across agentic stress tests, all models would sometimes attempt whistleblowing when embedded in simulated criminal organizations with broad mandates and tools\u2014and sometimes attempt to blackmail their operator when strong incentives and clear opportunities were present. In the SHADE-Arena sabotage task used for threat modeling, Claude models achieved higher absolute sabotage success rates, largely attributed to stronger general agentic capabilities with Anthropic\u2019s scaffolds; when controlling for capability, OpenAI\u2019s o4-mini was comparably effective at sabotage. 
While Anthropic says it is not acutely concerned about worst-case loss-of-control scenarios for any evaluated model\u2014especially given their time in deployment\u2014it remains somewhat concerned about misuse and sycophancy risks for every model except o3, at least in the versions tested earlier this summer.<\/p>\n\n\n\n<p>This is an interesting exercise, made more so by the history between the two companies. Anthropic was founded in 2021 by former OpenAI employees who were concerned that OpenAI wasn&#8217;t focusing sufficiently on safety as it developed <a href=\"https:\/\/officechai.com\/ai\/live-blog-openai-releases-gpt-5\/\">ever more powerful AI models<\/a>. The two companies, however, now seem to be collaborating to test each other&#8217;s models on safety parameters. This seems to be a welcome step: labs collaborating on safety will not only help them create <a href=\"https:\/\/officechai.com\/ai\/deepseek-has-no-safety-blocks-against-generating-harmful-information-anthropic-ceo-dario-amodei\/\">safer models<\/a>, but also spread safety information broadly instead of leaving it siloed in a single lab. 
And while this was a first-of-its-kind exercise, it could well catch on in the coming years, with frontier labs testing each other&#8217;s models to ensure that they&#8217;re safe for everyone to use.<\/p>\n\n\n\n<p><\/p>\n","protected":false},"excerpt":{"rendered":"<p>OpenAI and Anthropic might be rivals when it comes to the AI race, but the companies seem to be collaborating in ensuring that&#8230;<\/p>\n","protected":false},"author":1,"featured_media":55797,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1029],"tags":[],"class_list":["post-55796","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>OpenAI, Anthropic Test Each Other&#039;s AI Models For Safety In New Joint Exercise, Publish Results<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"OpenAI, Anthropic Test Each Other&#039;s AI Models For Safety In New Joint Exercise, Publish Results\" \/>\n<meta property=\"og:description\" content=\"OpenAI and Anthropic might be rivals when it comes to the AI race, but the companies seem to be collaborating in ensuring that...\" \/>\n<meta property=\"og:url\" 
content=\"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/\" \/>\n<meta property=\"og:site_name\" content=\"OfficeChai\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/OfficeChai\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-08-28T06:32:09+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-08-28T06:32:12+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?fit=1200%2C630&ssl=1\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"OfficeChai Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@OfficeChai\" \/>\n<meta name=\"twitter:site\" content=\"@OfficeChai\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"OfficeChai Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"3 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/\",\"url\":\"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/\",\"name\":\"OpenAI, Anthropic Test Each Other's AI Models For Safety In New Joint Exercise, Publish Results\",\"isPartOf\":{\"@id\":\"https:\/\/officechai.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?fit=1200%2C630&ssl=1\",\"datePublished\":\"2025-08-28T06:32:09+00:00\",\"dateModified\":\"2025-08-28T06:32:12+00:00\",\"author\":{\"@id\":\"https:\/\/officechai.com\/#\/schema\/person\/5861f1134993293cc28905de7624d6b2\"},\"breadcrumb\":{\"@id\":\"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/#primaryimage\",\"url\":\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads
\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?fit=1200%2C630&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?fit=1200%2C630&ssl=1\",\"width\":1200,\"height\":630},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/officechai.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"OpenAI, Anthropic Test Each Other&#8217;s AI Models For Safety In New Joint Exercise, Publish Results\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/officechai.com\/#website\",\"url\":\"https:\/\/officechai.com\/\",\"name\":\"OfficeChai\",\"description\":\"Startups, Businesses And Careers\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/officechai.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/officechai.com\/#\/schema\/person\/5861f1134993293cc28905de7624d6b2\",\"name\":\"OfficeChai Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/officechai.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/61d744733248dc647d505d0676bb425323413132ee5447e86aa8eecbbb7b27d5?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/61d744733248dc647d505d0676bb425323413132ee5447e86aa8eecbbb7b27d5?s=96&d=mm&r=g\",\"caption\":\"OfficeChai Team\"},\"description\":\"Dotting the i's, crossing the t's.\",\"url\":\"https:\/\/officechai.com\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"OpenAI, Anthropic Test Each Other's AI Models For Safety In New Joint Exercise, Publish Results","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/","og_locale":"en_US","og_type":"article","og_title":"OpenAI, Anthropic Test Each Other's AI Models For Safety In New Joint Exercise, Publish Results","og_description":"OpenAI and Anthropic might be rivals when it comes to the AI race, but the companies seem to be collaborating in ensuring that...","og_url":"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/","og_site_name":"OfficeChai","article_publisher":"https:\/\/www.facebook.com\/OfficeChai\/","article_published_time":"2025-08-28T06:32:09+00:00","article_modified_time":"2025-08-28T06:32:12+00:00","og_image":[{"width":1200,"height":630,"url":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?fit=1200%2C630&ssl=1","type":"image\/jpeg"}],"author":"OfficeChai Team","twitter_card":"summary_large_image","twitter_creator":"@OfficeChai","twitter_site":"@OfficeChai","twitter_misc":{"Written by":"OfficeChai Team","Est. 
reading time":"3 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/","url":"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/","name":"OpenAI, Anthropic Test Each Other's AI Models For Safety In New Joint Exercise, Publish Results","isPartOf":{"@id":"https:\/\/officechai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/#primaryimage"},"image":{"@id":"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?fit=1200%2C630&ssl=1","datePublished":"2025-08-28T06:32:09+00:00","dateModified":"2025-08-28T06:32:12+00:00","author":{"@id":"https:\/\/officechai.com\/#\/schema\/person\/5861f1134993293cc28905de7624d6b2"},"breadcrumb":{"@id":"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/#primaryimage","url":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?fit=1200%2C630&ssl=1","contentUrl":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?fit=
1200%2C630&ssl=1","width":1200,"height":630},{"@type":"BreadcrumbList","@id":"https:\/\/officechai.com\/ai\/openai-anthropic-test-each-others-ai-models-for-safety-in-new-joint-exercise-publish-results\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/officechai.com\/"},{"@type":"ListItem","position":2,"name":"OpenAI, Anthropic Test Each Other&#8217;s AI Models For Safety In New Joint Exercise, Publish Results"}]},{"@type":"WebSite","@id":"https:\/\/officechai.com\/#website","url":"https:\/\/officechai.com\/","name":"OfficeChai","description":"Startups, Businesses And Careers","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/officechai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/officechai.com\/#\/schema\/person\/5861f1134993293cc28905de7624d6b2","name":"OfficeChai Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/officechai.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/61d744733248dc647d505d0676bb425323413132ee5447e86aa8eecbbb7b27d5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/61d744733248dc647d505d0676bb425323413132ee5447e86aa8eecbbb7b27d5?s=96&d=mm&r=g","caption":"OfficeChai Team"},"description":"Dotting the i's, crossing the 
t's.","url":"https:\/\/officechai.com\/author\/admin\/"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2025\/08\/MixCollage-28-Aug-2025-11-57-AM-5170.jpg?fit=1200%2C630&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p685C6-evW","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/posts\/55796","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/comments?post=55796"}],"version-history":[{"count":1,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/posts\/55796\/revisions"}],"predecessor-version":[{"id":55798,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/posts\/55796\/revisions\/55798"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/media\/55797"}],"wp:attachment":[{"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/media?parent=55796"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/categories?post=55796"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/tags?post=55796"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}