Resources
We reviewed more than sixty papers, benchmarks, surveys, and technical reports on legal AI, legal search, source-grounded answers, document structure, ranking, reliability, and database search. This page lists the works that directly shaped our public-facing design principles.
The strongest signal across the research is practical, not magical: legal search works best when it respects structure, retrieves source text first, tests search separately from answer-writing, and treats AI answers as grounded assistance rather than independent legal authority.
We gratefully acknowledge the authors below. Their work gave Freecase clearer public principles: exact wording still matters, long legal texts need their own structure, ranking should be tested against real legal research tasks, and AI assistance should stay tied to the sources a user can inspect.
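As an illustration of testing search separately from answer-writing, here is a minimal sketch of a held-out query test that checks only whether the top results touch a known source span. It is not Freecase's actual test harness: the `Span` type, the character-offset convention, and the document IDs are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A passage inside a source document (hypothetical schema)."""
    doc_id: str
    start: int  # inclusive character offset
    end: int    # exclusive character offset


def overlaps(a: Span, b: Span) -> bool:
    """True when both spans are in the same document and share text."""
    return a.doc_id == b.doc_id and a.start < b.end and b.start < a.end


def hit_rate_at_k(
    runs: dict[str, list[Span]],
    gold: dict[str, list[Span]],
    k: int,
) -> float:
    """Fraction of held-out queries whose top-k results touch a gold span."""
    if not gold:
        return 0.0
    hits = 0
    for query, gold_spans in gold.items():
        top_k = runs.get(query, [])[:k]
        if any(overlaps(r, g) for r in top_k for g in gold_spans):
            hits += 1
    return hits / len(gold)


# Illustrative data only: two held-out queries with one gold span each.
gold = {
    "q1": [Span("case-a", 100, 200)],
    "q2": [Span("statute-b", 0, 50)],
}
runs = {
    "q1": [Span("case-a", 150, 250)],
    "q2": [Span("case-c", 0, 50), Span("statute-b", 40, 90)],
}

print(hit_rate_at_k(runs, gold, k=1))
print(hit_rate_at_k(runs, gold, k=2))
```

Because the score depends only on retrieved spans, it can be tracked before any answer is generated, which keeps source finding from being hidden inside one blended demo score.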
Bibliography
Introduces practical ways to measure whether a system found the right legal source. It pushed Freecase toward held-out query and source-span tests before trusting any AI answer.
Separates whether the source is relevant, whether the answer is supported, and whether the answer is useful. It reinforces Freecase's source-first approach to AI assistance.
Useful for latency and throughput planning. It reminds us to measure source finding, ranking, and answer writing separately, not as one blended demo score.
Supports the common-sense point that citation lookup, statute lookup, and broader doctrine research should not all be handled the same way.
Supports using topics as search filters and organizing labels. Freecase treats those labels as aids, not as replacements for the source text itself.
Shows why legal query understanding matters: users often search by issue and rule, not by words already present in the target authority.
Frames legal search as a ranking and evaluation problem, not a generic chatbot problem. It supports Freecase's emphasis on source quality and repeatable tests.
Validates layered legal search: exact-word matching remains strong, while language models can help sort and refine results.
Its central lesson is practical: find useful passages or paragraphs inside long legal texts, then assemble the case-level result from that evidence.
Supports labeling passages by role: facts, issue, reasoning, holding, and disposition. That matters because not every paragraph in an opinion carries the same legal weight.
Useful for statute and regulation search. It reinforces that legal corpora need both exact-wording matching and concept matching.
Supports multi-stage legal search: gather candidates, compare evidence from several passages, then sort results by legal usefulness.
Defines the broader problem Freecase is solving: long legal documents with dispersed evidence. It supports breaking long documents into meaningful legal units.
Shows that legal search improvements should be measured step by step, rather than jumping straight to answer generation.
Useful for complex legal queries that need to be broken into issue, statute, court, and date constraints.
Names the problem of finding a helpful paragraph but losing the larger case context. It supports adding enough source context for users to understand the result.
Encourages careful testing of cheaper and more expensive ways to preserve context around passages. Freecase treats that as a measurement question, not a matter of taste.
Supports paragraph-group chunking over blind fixed windows. It directly informs Freecase's Illinois opinion chunk defaults.
Shows a way to preserve document context before scoring smaller passages, which is important when legal meaning depends on surrounding text.
Shows why isolated snippets can be misleading. Freecase uses this to justify showing source and section context around opinion paragraphs.
Supports retrieving focused passages while still joining nearby text when the query needs a complete legal reasoning unit.
Warns against adopting fashionable chunkers without corpus-scale timeout tests. Freecase uses it as a cost and robustness gate.
Strongly supports using section and subsection boundaries for statute search.
Focuses on key-fact discrimination: legally important similarity is not the same as surface similarity.
It is a cautionary paper: extra complexity does not automatically improve legal search. Freecase uses it to justify simple baselines before more complex ranking.
Close to Freecase's legal domain. It reinforces the value of topical organization and careful benchmarking before scaling beyond a smaller corpus.
Supports section-based organization and careful handling of long legal narratives. It is more drafting-oriented than Freecase search, but the source-finding lessons transfer.
Shows how provision-level statute ranking can be evaluated with graded relevance. Freecase borrows the evaluation lesson, not the UK-only metrics.
Shows that topical similarity is not the same as a passage worth citing. This supports evaluating whether a result is actually useful legal authority.
Supports including statute hierarchy in search text and embeddings. Freecase applies this to statutes: title, act, chapter, section, and parent headings matter.
This is central to Freecase's caution around AI answers. It tells us to track hallucination and incompleteness separately and never market legal AI as infallible.
Supports a conservative product boundary: any AI drafting or answer feature must be tied to sources before it becomes user-facing.
Directly informs citator evaluation. The Average Severity Error idea matches our view that missing overruled authority is much worse than minor classification noise.
Supports careful extraction of names, courts, dates, statutes, and money amounts when those details can be identified reliably.
Useful for understanding risks in summaries of long opinions. Freecase treats summaries as assistance, not as a substitute for source text.
Reinforces human review and section-level organization for legal documents.
Provides a task map for legal language technology: search, question answering, summarization, entity extraction, argument mining, and legal language models.
Confirms the mainstream pattern of finding sources, ranking them, and using AI carefully where it helps. It also flags data quality as a major legal AI risk.
Helps locate where language models can help: query wording, ranking, and source-grounded reading. Freecase treats answer generation as only one part of the system.
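Several entries above recommend grouping whole paragraphs rather than cutting blind fixed windows, and carrying heading or statute hierarchy alongside each passage. The sketch below shows one simple way to do both. The function names, the `max_chars` budget, and the sample heading path are assumptions for illustration, not Freecase's actual chunk defaults.

```python
from __future__ import annotations


def chunk_paragraphs(
    paragraphs: list[str],
    heading_path: list[str],
    max_chars: int = 1200,
) -> list[dict[str, str]]:
    """Group whole paragraphs into chunks without splitting any paragraph.

    Each chunk keeps the heading path (for example title, act, section)
    so a passage is never shown or indexed without its context.
    """
    context = " > ".join(heading_path)
    chunks: list[dict[str, str]] = []
    current: list[str] = []
    size = 0
    for para in paragraphs:
        # Close the current group when adding this paragraph would exceed
        # the budget. An oversized paragraph still becomes its own chunk.
        if current and size + len(para) > max_chars:
            chunks.append({"context": context, "text": "\n\n".join(current)})
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        chunks.append({"context": context, "text": "\n\n".join(current)})
    return chunks


def search_text(chunk: dict[str, str]) -> str:
    """Text to index or embed: hierarchy first, then the passage itself."""
    return f"{chunk['context']}\n\n{chunk['text']}"


# Illustrative input: three paragraphs under a hypothetical heading path.
paragraphs = ["a" * 500, "b" * 500, "c" * 500]
chunks = chunk_paragraphs(paragraphs, ["Title 1", "Act 2", "Section 5"])
print(len(chunks))
print(search_text(chunks[0])[:30])
```

Whether this budget, or paragraph grouping at all, beats a fixed window on a given corpus is a measurement question, so a sketch like this is best compared against a simple baseline on held-out queries.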
Research/RESEARCH_PAPER_REVIEW_SYNTHESIS.md, RESEARCH_PAPER_REVIEWcpy.md, and batch reviews 1-6.