{"id":60266,"date":"2026-04-08T00:30:10","date_gmt":"2026-04-07T19:00:10","guid":{"rendered":"https:\/\/officechai.com\/?p=60266"},"modified":"2026-04-08T00:30:23","modified_gmt":"2026-04-07T19:00:23","slug":"claude-mythos-preview-benchmarks-swe-bench-pro","status":"publish","type":"post","link":"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/","title":{"rendered":"Anthropic&#8217;s Claude Mythos Preview Smashes Coding Benchmarks, Scores 77.8 On SWE-Bench Pro"},"content":{"rendered":"\n<p>Anthropic is maintaining its lead in coding models, and how.<\/p>\n\n\n\n<p>Claude Mythos Preview \u2014 the unreleased frontier model at the center of Anthropic&#8217;s <a href=\"https:\/\/officechai.com\/ai\/anthropic-says-ai-agents-found-4-6-million-of-exploits-in-simulated-blockchain-smart-contracts\/\">Project Glasswing<\/a> cybersecurity initiative \u2014 posts benchmark numbers that make the current generation of public models look like an earlier era. Across agentic coding, scientific reasoning, and computer use, Mythos Preview doesn&#8217;t just beat Opus 4.6; it laps it on several key tests.<\/p>\n\n\n\n<p><\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Coding: The Numbers That Matter<\/h2>\n\n\n\n<p>On <strong>SWE-bench Pro<\/strong> \u2014 the hardest tier of the industry&#8217;s most-watched software engineering benchmark \u2014 Mythos Preview scores <strong>77.8%<\/strong> against Opus 4.6&#8217;s <strong>53.4%<\/strong>. That&#8217;s a 24-point gap on a test designed to be difficult. 
For context, when <a href=\"https:\/\/officechai.com\/ai\/gemini-3-1-pro-benchmarks\/\">Gemini 3.1 Pro was released<\/a>, GPT-5.3-Codex led SWE-bench Pro at 56.8% \u2014 a score Mythos Preview now exceeds by 21 points.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img data-recalc-dims=\"1\" fetchpriority=\"high\" decoding=\"async\" width=\"640\" height=\"725\" src=\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2026\/04\/image-26.png?resize=640%2C725&#038;ssl=1\" alt=\"\" class=\"wp-image-60267\" srcset=\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2026\/04\/image-26.png?w=751&amp;ssl=1 751w, https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2026\/04\/image-26.png?resize=265%2C300&amp;ssl=1 265w\" sizes=\"(max-width: 640px) 100vw, 640px\" \/><figcaption class=\"wp-element-caption\">Claude Mythos Preview coding benchmarks<\/figcaption><\/figure>\n\n\n\n<p>On <strong>SWE-bench Verified<\/strong>, the broader real-world software engineering test, Mythos hits <strong>93.9%<\/strong> against Opus 4.6&#8217;s <strong>80.8%<\/strong>. On <strong>SWE-bench Multilingual<\/strong>, which tests code across programming languages, Mythos scores <strong>87.3%<\/strong> against <strong>77.8%<\/strong> for Opus 4.6.<\/p>\n\n\n\n<p><strong>Terminal-Bench 2.0<\/strong>, which measures autonomous multi-step terminal coding \u2014 the kind of agentic work that <a href=\"https:\/\/officechai.com\/ai\/chinas-minimax-releases-m2-5-beats-gemini-3-pro-and-gpt-5-2-on-swe-bench\/\">Chinese models like Minimax M2.5<\/a> have been pushing hard to match \u2014 shows Mythos at <strong>82.0%<\/strong> against Opus 4.6&#8217;s <strong>65.4%<\/strong>.<\/p>\n\n\n\n<p>The <strong>SWE-bench Multimodal<\/strong> result is the most striking: <strong>59.0%<\/strong> for Mythos versus <strong>27.1%<\/strong> for Opus 4.6. That&#8217;s more than double.
The benchmark, measured against an internal implementation, tests AI&#8217;s ability to understand visual context alongside code \u2014 increasingly important as AI agents are asked to work directly with GUIs and interfaces.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Reasoning: A Clear Step Change<\/h2>\n\n\n\n<p>Mythos Preview scores <strong>94.6%<\/strong> on <strong>GPQA Diamond<\/strong>, the graduate-level scientific reasoning benchmark spanning physics, chemistry, and biology. Opus 4.6 scores <strong>91.3%<\/strong>. These numbers look close, but GPQA Diamond is designed so that marginal gains at the top require substantially greater capability. <a href=\"https:\/\/officechai.com\/ai\/claude-opus-4-6-benchmarks-released\/\">Claude Opus 4.6 had already beaten<\/a> Google&#8217;s Gemini 3 Pro on this test, though its 91.3% trailed Gemini 3.1 Pro&#8217;s 94.3%; Mythos now goes further still.<\/p>\n\n\n\n<p>On <strong>Humanity&#8217;s Last Exam<\/strong> \u2014 the benchmark designed to be unsolvable by current AI \u2014 Mythos Preview without tools scores <strong>56.8%<\/strong> (Opus 4.6: <strong>40.0%<\/strong>). With tools enabled, Mythos hits <strong>64.7%<\/strong> against Opus 4.6&#8217;s <strong>53.1%<\/strong>. The without-tools number is the more meaningful one: it&#8217;s a test of raw reasoning, not search-augmented retrieval. Anthropic notes that Mythos still performs well at low effort on HLE, which they flag as a possible sign of some memorization \u2014 worth keeping in mind when reading those numbers.<\/p>\n\n\n\n<p>Benchmarks like Humanity&#8217;s Last Exam were created specifically because <a href=\"https:\/\/officechai.com\/ai\/reasoning-models-are-making-ai-benchmarks-irrelevant-openais-noam-brown\/\">reasoning models were making earlier tests irrelevant<\/a>.
A 56.8% score without tools is still remarkable.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Agentic Search and Computer Use<\/h2>\n\n\n\n<p><strong>BrowseComp<\/strong>, which tests complex multi-step web research, shows Mythos at <strong>86.9%<\/strong> against Opus 4.6&#8217;s <strong>83.7%<\/strong> \u2014 a smaller gap, but notable because Anthropic says Mythos achieves this while using 4.9x fewer tokens. That&#8217;s not just smarter; it&#8217;s meaningfully more efficient.<\/p>\n\n\n\n<p><strong>OSWorld-Verified<\/strong>, a computer use benchmark where the AI must navigate real desktop interfaces autonomously, shows Mythos at <strong>79.6%<\/strong> against <strong>72.7%<\/strong> for Opus 4.6.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What It All Means<\/h2>\n\n\n\n<p>Mythos Preview is not a public model. Anthropic has restricted it to a closed group of security partners and enterprise organizations, citing its dual-use cybersecurity capabilities. But the benchmark profile reveals something broader: the gap between Mythos and the current public frontier is large enough that it represents a qualitative shift, not an incremental one.<\/p>\n\n\n\n<p><a href=\"https:\/\/officechai.com\/ai\/claude-sonnet-4-6-takes-second-spot-in-artificial-analysis-intelligence-index-beats-gpt-5-2\/\">Claude Opus 4.6 was already the benchmark leader<\/a> in most categories when it launched in February 2026, with Claude Sonnet 4.6 in second place on the Artificial Analysis Intelligence Index. Mythos Preview \u2014 if released \u2014 would reset those leaderboards entirely. On SWE-bench Verified alone, its 93.9% would sit more than 13 points above any publicly available model.<\/p>\n\n\n\n<p>The broader context is competitive pressure from all sides. 
Chinese open-source models like <a href=\"https:\/\/officechai.com\/ai\/chinese-startup-z-ais-glm-4-7-model-goes-past-kimi-k2-thinking-to-become-top-open-model-in-the-world\/\">Z.ai&#8217;s GLM-5<\/a> have been closing the gap with closed US models on SWE-bench Verified. Mythos Preview suggests Anthropic is not standing still \u2014 and that the internal capability gap between what labs have and what they release publicly is wider than most observers assume.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Anthropic is maintaining its lead in coding models, and how. Claude Mythos Preview \u2014 the unreleased frontier model at the center of Anthropic&#8217;s&#8230;<\/p>\n","protected":false},"author":1,"featured_media":60268,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_jetpack_newsletter_access":"","_jetpack_dont_email_post_to_subs":false,"_jetpack_newsletter_tier_id":0,"_jetpack_memberships_contains_paywalled_content":false,"_jetpack_memberships_contains_paid_content":false,"footnotes":""},"categories":[1029],"tags":[],"class_list":["post-60266","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v25.5 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Anthropic&#039;s Claude Mythos Preview Smashes Coding Benchmarks, Scores 77.8 On SWE-Bench Pro<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Anthropic&#039;s Claude Mythos Preview Smashes Coding Benchmarks, Scores 77.8 On SWE-Bench Pro\" \/>\n<meta property=\"og:description\" content=\"Anthropic 
is maintaining its lead in coding models, and how. Claude Mythos Preview \u2014 the unreleased frontier model at the center of Anthropic&#8217;s...\" \/>\n<meta property=\"og:url\" content=\"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/\" \/>\n<meta property=\"og:site_name\" content=\"OfficeChai\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/OfficeChai\/\" \/>\n<meta property=\"article:published_time\" content=\"2026-04-07T19:00:10+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2026-04-07T19:00:23+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/officechai.com\/wp-content\/uploads\/2026\/04\/MixCollage-08-Apr-2026-12-28-AM-4584.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1200\" \/>\n\t<meta property=\"og:image:height\" content=\"630\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"OfficeChai Team\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@OfficeChai\" \/>\n<meta name=\"twitter:site\" content=\"@OfficeChai\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"OfficeChai Team\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"4 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/\",\"url\":\"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/\",\"name\":\"Anthropic's Claude Mythos Preview Smashes Coding Benchmarks, Scores 77.8 On SWE-Bench Pro\",\"isPartOf\":{\"@id\":\"https:\/\/officechai.com\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2026\/04\/MixCollage-08-Apr-2026-12-28-AM-4584.jpg?fit=1200%2C630&ssl=1\",\"datePublished\":\"2026-04-07T19:00:10+00:00\",\"dateModified\":\"2026-04-07T19:00:23+00:00\",\"author\":{\"@id\":\"https:\/\/officechai.com\/#\/schema\/person\/5861f1134993293cc28905de7624d6b2\"},\"breadcrumb\":{\"@id\":\"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/#primaryimage\",\"url\":\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2026\/04\/MixCollage-08-Apr-2026-12-28-AM-4584.jpg?fit=1200%2C630&ssl=1\",\"contentUrl\":\"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2026\/04\/MixCollage-08-Apr-2026-12-28-AM-4584.jpg?fit=1200%2C630&ssl=1\",\"width\":1200,\"height\":630,\"caption\":\"claude mythos preview 
benchmarks\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/officechai.com\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Anthropic&#8217;s Claude Mythos Preview Smashes Coding Benchmarks, Scores 77.8 On SWE-Bench Pro\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/officechai.com\/#website\",\"url\":\"https:\/\/officechai.com\/\",\"name\":\"OfficeChai\",\"description\":\"Startups, Businesses And Careers\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/officechai.com\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/officechai.com\/#\/schema\/person\/5861f1134993293cc28905de7624d6b2\",\"name\":\"OfficeChai Team\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/officechai.com\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/61d744733248dc647d505d0676bb425323413132ee5447e86aa8eecbbb7b27d5?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/61d744733248dc647d505d0676bb425323413132ee5447e86aa8eecbbb7b27d5?s=96&d=mm&r=g\",\"caption\":\"OfficeChai Team\"},\"description\":\"Dotting the i's, crossing the t's.\",\"url\":\"https:\/\/officechai.com\/author\/admin\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Anthropic's Claude Mythos Preview Smashes Coding Benchmarks, Scores 77.8 On SWE-Bench Pro","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/","og_locale":"en_US","og_type":"article","og_title":"Anthropic's Claude Mythos Preview Smashes Coding Benchmarks, Scores 77.8 On SWE-Bench Pro","og_description":"Anthropic is maintaining its lead in coding models, and how. Claude Mythos Preview \u2014 the unreleased frontier model at the center of Anthropic&#8217;s...","og_url":"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/","og_site_name":"OfficeChai","article_publisher":"https:\/\/www.facebook.com\/OfficeChai\/","article_published_time":"2026-04-07T19:00:10+00:00","article_modified_time":"2026-04-07T19:00:23+00:00","og_image":[{"width":1200,"height":630,"url":"http:\/\/officechai.com\/wp-content\/uploads\/2026\/04\/MixCollage-08-Apr-2026-12-28-AM-4584.jpg","type":"image\/jpeg"}],"author":"OfficeChai Team","twitter_card":"summary_large_image","twitter_creator":"@OfficeChai","twitter_site":"@OfficeChai","twitter_misc":{"Written by":"OfficeChai Team","Est. 
reading time":"4 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/","url":"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/","name":"Anthropic's Claude Mythos Preview Smashes Coding Benchmarks, Scores 77.8 On SWE-Bench Pro","isPartOf":{"@id":"https:\/\/officechai.com\/#website"},"primaryImageOfPage":{"@id":"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/#primaryimage"},"image":{"@id":"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/#primaryimage"},"thumbnailUrl":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2026\/04\/MixCollage-08-Apr-2026-12-28-AM-4584.jpg?fit=1200%2C630&ssl=1","datePublished":"2026-04-07T19:00:10+00:00","dateModified":"2026-04-07T19:00:23+00:00","author":{"@id":"https:\/\/officechai.com\/#\/schema\/person\/5861f1134993293cc28905de7624d6b2"},"breadcrumb":{"@id":"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/#primaryimage","url":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2026\/04\/MixCollage-08-Apr-2026-12-28-AM-4584.jpg?fit=1200%2C630&ssl=1","contentUrl":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2026\/04\/MixCollage-08-Apr-2026-12-28-AM-4584.jpg?fit=1200%2C630&ssl=1","width":1200,"height":630,"caption":"claude mythos preview 
benchmarks"},{"@type":"BreadcrumbList","@id":"https:\/\/officechai.com\/ai\/claude-mythos-preview-benchmarks-swe-bench-pro\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/officechai.com\/"},{"@type":"ListItem","position":2,"name":"Anthropic&#8217;s Claude Mythos Preview Smashes Coding Benchmarks, Scores 77.8 On SWE-Bench Pro"}]},{"@type":"WebSite","@id":"https:\/\/officechai.com\/#website","url":"https:\/\/officechai.com\/","name":"OfficeChai","description":"Startups, Businesses And Careers","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/officechai.com\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/officechai.com\/#\/schema\/person\/5861f1134993293cc28905de7624d6b2","name":"OfficeChai Team","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/officechai.com\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/61d744733248dc647d505d0676bb425323413132ee5447e86aa8eecbbb7b27d5?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/61d744733248dc647d505d0676bb425323413132ee5447e86aa8eecbbb7b27d5?s=96&d=mm&r=g","caption":"OfficeChai Team"},"description":"Dotting the i's, crossing the 
t's.","url":"https:\/\/officechai.com\/author\/admin\/"}]}},"jetpack_featured_media_url":"https:\/\/i0.wp.com\/officechai.com\/wp-content\/uploads\/2026\/04\/MixCollage-08-Apr-2026-12-28-AM-4584.jpg?fit=1200%2C630&ssl=1","jetpack_shortlink":"https:\/\/wp.me\/p685C6-fG2","jetpack_likes_enabled":true,"jetpack_sharing_enabled":true,"_links":{"self":[{"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/posts\/60266","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/comments?post=60266"}],"version-history":[{"count":1,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/posts\/60266\/revisions"}],"predecessor-version":[{"id":60269,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/posts\/60266\/revisions\/60269"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/media\/60268"}],"wp:attachment":[{"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/media?parent=60266"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/categories?post=60266"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/officechai.com\/wp-json\/wp\/v2\/tags?post=60266"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}