We recently rolled out Anthropic’s latest model, Sonnet 4.6, across the Legora platform. Before any new model reaches our customers, we run it through our comprehensive legal evaluation framework, designed to measure how frontier models perform on real, billable legal work across jurisdictions, practice areas, and task types.
Sonnet 4.6 outperformed Sonnet 4.5 across all tasks.
While we observed incremental improvements on structured/factual tasks like timeline (+1.2%), extraction (+3.2%), and comparison (+4.8%), the biggest improvements were delivered on open-ended reasoning tasks like research (+44.4%), long context (+26.4%), and legal analysis (+20.7%).
These improvements showed up where it matters most for legal work:
Accuracy
Across the full benchmark, Sonnet 4.6 delivered a 4% improvement in accuracy over Sonnet 4.5.
In practice, that shows up as fewer missed provisions, more complete comparisons, and stronger first-pass analysis. When scaled across hundreds of documents or transactions, that delta compounds.
Understanding user intent
Legal prompts are rarely simple. They often include conditional instructions, embedded assumptions, and multi-step requests.
The newer model is more consistent in understanding what the user is actually asking for — particularly when distinguishing between summary, analysis, and structured output. That reduces iteration and improves usability.
Depth of legal reasoning
The most noticeable qualitative difference was thoroughness. On tasks requiring deeper reasoning, the model was more exhaustive: identifying more relevant clauses, surfacing edge cases, and avoiding overly narrow interpretations.
Improvements in exhaustiveness matter. Missing one clause can materially change an outcome.
Long-context performance
Legal analysis frequently requires connecting information across hundreds of pages.
The latest release shows stronger long-context reasoning: better cross-referencing between documents, fewer dropped details, and more coherent synthesis in multi-document tasks.
For diligence, regulatory analysis, and litigation support, that capability is foundational.
Reliability and verification
We also observed improvements in consistency and internal verification. The outputs were less prone to subtle analytical shortcuts and more internally coherent.
In legal workflows, reliability is not optional. Hallucinations and sloppiness are unacceptable. Incremental gains in verification behavior meaningfully improve trust.
How Sonnet 4.6 benefits Legora customers
A 4% improvement across the full benchmark translates to:
Fewer missed clauses
More complete issue spotting
Stronger first-pass outputs
More efficient review cycles
Higher confidence
For law firms, that means being able to deliver high quality work more efficiently and reliably. For in-house teams, it means handling higher volume without sacrificing rigor.
Looking ahead
The next phase of legal AI is about more than drafting fluency: it is about reliability, depth, and the ability to execute complex workflows. We will continue to test rigorously and adopt improvements only when they raise the standard in ways that matter for law firms and in-house legal teams.