Resources

The research behind Freecase's design principles.

We reviewed more than sixty papers, benchmarks, surveys, and technical reports on legal AI, legal search, source-grounded answers, document structure, ranking, reliability, and database search. This page lists the works that directly shaped our public-facing design principles.

What we took from the literature

The strongest signal across the research is practical, not magical: legal search works best when it respects structure, retrieves source text first, tests search separately from answer-writing, and treats AI answers as grounded assistance rather than independent legal authority.

We gratefully acknowledge the authors below. Their work gave Freecase clearer public principles: exact wording still matters, long legal texts need their own structure, ranking should be tested against real legal research tasks, and AI assistance should stay tied to the sources a user can inspect.

Exact words matter: legal language, citations, dates, and names remain critical.
Structure matters: paragraphs, holdings, sections, and statute hierarchy should not be flattened away.
Find first, then sort carefully: a useful answer starts with the right source text.
Test honestly: measure missed sources, incomplete answers, and citation support separately.
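
The "test honestly" principle above can be illustrated with a minimal sketch that scores source finding and citation support as separate numbers. Everything here is a hypothetical illustration, not Freecase's actual test harness: the EvalCase fields, the function names, and the simplification that a citation counts as "supported" only when it points at a gold source are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class EvalCase:
    """One held-out query (hypothetical structure, for illustration only)."""
    relevant_ids: set[str]    # gold source spans a researcher marked as correct
    retrieved_ids: list[str]  # ranked search results, best first
    cited_ids: list[str]      # sources the AI answer actually cited


def recall_at_k(case: EvalCase, k: int) -> float:
    """Share of gold sources that appear in the top k search results."""
    if not case.relevant_ids:
        return 0.0
    found = case.relevant_ids & set(case.retrieved_ids[:k])
    return len(found) / len(case.relevant_ids)


def missed_sources(case: EvalCase, k: int) -> set[str]:
    """Gold sources that search failed to surface in the top k."""
    return case.relevant_ids - set(case.retrieved_ids[:k])


def citation_support(case: EvalCase) -> float:
    """Share of cited sources that are gold sources (a simplifying assumption)."""
    if not case.cited_ids:
        return 0.0
    supported = [c for c in case.cited_ids if c in case.relevant_ids]
    return len(supported) / len(case.cited_ids)
```

Reporting these numbers side by side, rather than as one blended score, makes it visible whether a weak answer came from a missed source or from an unsupported citation.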

Bibliography

Useful papers and how they inform Freecase.


Evaluation and Benchmarks

Pipitone & Alami, LegalBench-RAG, arXiv:2408.10343 (2024).

Benchmark for legal source retrieval quality.

Introduces practical ways to measure whether a system found the right legal source. It pushed Freecase toward held-out query and source-span tests before trusting any AI answer.

Das, Abualhaija & Bianculli, Fine-grained Claim-level RAG Benchmark for Law, arXiv:2605.21071 (2026).

Claim-level legal answer evaluation.

Separates whether the source is relevant, whether the answer is supported, and whether the answer is useful. It reinforces Freecase's source-first approach to AI assistance.

Li et al., RAGPerf, arXiv:2603.10765 (2026).

End-to-end search-and-answer benchmark.

Useful for latency and throughput planning. It reminds us to measure source finding, ranking, and answer writing separately, not as one blended demo score.

Hashmi, Adaptive Query Routing, arXiv:2604.14222 (2026).

Tiered routing for financial, legal, and medical search systems.

Supports the common-sense point that citation lookup, statute lookup, and broader doctrine research should not all be handled the same way.

Kataishi, Topic-Enriched Embeddings for RAG Precision, arXiv:2601.00891 (2025).

Topic features blended with embeddings.

Supports using topics as search filters and organizing labels. Freecase treats those labels as aids, not as replacements for the source text itself.

Zheng et al., A Reasoning-Focused Legal Retrieval Benchmark (2026).

Bar-exam and housing-statute retrieval benchmark.

Shows why legal query understanding matters: users often search by issue and rule, not by words already present in the target authority.

Legal Search Architecture

Locke & Zuccon, Case Law Retrieval: Problems, Methods, Challenges and Evaluations in the Last 20 Years, arXiv:2202.07209 (2022).

Survey of case-law retrieval.

Frames legal search as a ranking and evaluation problem, not a generic chatbot problem. It supports Freecase's emphasis on source quality and repeatable tests.

Gain et al., IITP@COLIEE 2019: Legal Information Retrieval Using BM25 and BERT, arXiv:2104.08653 (2021).

BM25 plus transformer retrieval.

Validates layered legal search: exact-word matching remains strong, while language models can help sort and refine results.

Althammer et al., DoSSIER@COLIEE 2021, arXiv:2108.03937 (2021).

Paragraph-level legal retrieval and reranking.

Its central lesson is practical: find useful passages or paragraphs inside long legal texts, then assemble the case-level result from that evidence.

Nigam et al., TraceRetriever, arXiv:2508.00679 (2025).

Rhetorical-role legal search.

Supports labeling passages by role: facts, issue, reasoning, holding, and disposition. That matters because not every paragraph in an opinion carries the same legal weight.

Rayo, de la Rosa & Garrido, ViDRILL, arXiv:2502.16767 (2025).

Hybrid retrieval for regulatory text.

Useful for statute and regulation search. It reinforces that legal corpora need both exact-word matching and concept matching.

Nguyen et al., NOWJ@COLIEE 2025, arXiv:2509.08025 (2025).

Multi-stage legal retrieval and entailment.

Supports multi-stage legal search: gather candidates, compare evidence from several passages, then sort results by legal usefulness.

Li et al., A Survey of Long-Document Retrieval in the PLM and LLM Era, arXiv:2509.07759 (2025).

Long-document retrieval survey.

Defines the broader problem Freecase is solving: long legal documents with dispersed evidence. It supports breaking long documents into meaningful legal units.

Akarsu, Karaman & Mierbach, From BM25 to Corrective RAG, arXiv:2604.01733 (2026).

Layered search benchmark.

Shows that legal search improvements should be measured step by step, rather than jumping straight to answer generation.

Zhang, Feng & Zhang, LevelRAG, arXiv:2502.18139 (2025).

Multi-hop retrieval planning.

Useful for complex legal queries that need to be broken into issue, statute, court, and date constraints.
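
The paragraph-first lesson from the entries above (find useful passages, then assemble the case-level result) might be sketched as follows. This is a minimal sketch under stated assumptions: the function name, the tuple layout, and the choice to score a case by its best paragraph are illustrative, and are not the method of any cited paper or of Freecase.

```python
from collections import defaultdict


def aggregate_to_cases(paragraph_hits, top_n=3):
    """Roll paragraph-level hits up into ranked case-level results.

    paragraph_hits: iterable of (case_id, paragraph_id, score) tuples,
    where a higher score means a better match (assumed layout).
    """
    by_case = defaultdict(list)
    for case_id, paragraph_id, score in paragraph_hits:
        by_case[case_id].append((score, paragraph_id))

    results = []
    for case_id, hits in by_case.items():
        hits.sort(reverse=True)  # best-scoring paragraphs first
        best = hits[:top_n]
        results.append({
            "case_id": case_id,
            "score": best[0][0],  # assumption: a case is as good as its best paragraph
            "evidence": [paragraph_id for _, paragraph_id in best],
        })

    # Rank cases by their aggregated score, best first.
    results.sort(key=lambda r: r["score"], reverse=True)
    return results
```

Keeping the evidence paragraphs attached to each case means the user can inspect why a case ranked where it did.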

Chunking, Context, and Long Documents

Reuter et al., Towards Reliable Retrieval in RAG Systems for Large Legal Datasets, arXiv:2510.06999 (2025).

Source context for legal search.

Names the problem of finding a helpful paragraph but losing the larger case context. It supports adding enough source context for users to understand the result.

Merola & Singh, Reconstructing Context, arXiv:2504.19754 (2025).

Late chunking versus contextual retrieval.

Encourages careful testing of cheaper and more expensive ways to preserve context around passages. Freecase treats that as a measurement question, not a matter of taste.

Shaukat et al., A Systematic Investigation of Document Chunking Strategies and Embedding Sensitivity, arXiv:2603.06976 (2026).

Large controlled chunking benchmark.

Supports paragraph-group chunking over blind fixed windows. It directly informs Freecase's default chunking for Illinois opinions.

Günther et al., Late Chunking, arXiv:2409.04701 (2024/2025).

Context-aware passage representation.

Shows a way to preserve document context before scoring smaller passages, which is important when legal meaning depends on surrounding text.

Conti et al., Context is Gold to find the Gold Passage, arXiv:2505.24782 (2025).

Context-aware embedding benchmark.

Shows why isolated snippets can be misleading. Freecase uses this to justify showing source and section context around opinion paragraphs.

Lu et al., HiChunk, arXiv:2509.11552 (2025).

Hierarchical chunking and auto-merge retrieval.

Supports retrieving focused passages while still joining nearby text when the query needs a complete legal reasoning unit.

Smigielski et al., Chunking Methods on Retrieval-Augmented Generation, arXiv:2606.00881 (2026).

Effectiveness versus cost.

Warns against adopting fashionable chunkers without corpus-scale timeout tests. Freecase uses it as a cost and robustness gate.

Prior, Milanova & Schultz, Chunking German Legal Code (2026 preprint).

Statute chunking benchmark.

Strongly supports using section and subsection boundaries for statute search.
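
As a rough illustration of paragraph-group chunking, as opposed to blind fixed windows, the sketch below groups whole paragraphs up to a size budget and never cuts inside one. The function name, the blank-line paragraph separator, and the 1200-character default are assumptions made for the example, not Freecase's actual defaults.

```python
def chunk_by_paragraph_groups(text, max_chars=1200):
    """Group consecutive paragraphs into chunks without splitting any paragraph.

    Paragraphs are assumed to be separated by blank lines. A single paragraph
    longer than max_chars is kept whole as its own chunk.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    chunks = []
    current = []  # paragraphs collected for the chunk being built
    size = 0      # total characters in `current`, separators not counted

    for paragraph in paragraphs:
        # Close the current chunk if adding this paragraph would exceed the budget.
        if current and size + len(paragraph) > max_chars:
            chunks.append("\n\n".join(current))
            current, size = [], 0
        current.append(paragraph)
        size += len(paragraph)

    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

A fixed-window chunker would instead cut every N characters regardless of where a paragraph or holding ends, which is the behavior the entries above caution against.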

Ranking, Embeddings, and Query Understanding

Li et al., DELTA, arXiv:2403.18435 (2024).

Discriminative legal case encoder.

Focuses on key-fact discrimination: legally important similarity is not the same as surface similarity.

Donabauer & Kruschwitz, A Reproducibility Study of Graph-Based Legal Case Retrieval, arXiv:2504.08400 (2025).

Reproduction study for complex retrieval methods.

It is a cautionary paper: extra complexity does not automatically improve legal search. Freecase uses it to justify simple baselines before more complex ranking.

Barron et al., Bridging Legal Knowledge and AI, arXiv:2502.20364 (2025).

New Mexico legal search and answer prototype.

Close to Freecase's legal domain. It reinforces the value of topical organization and careful benchmarking before scaling beyond a smaller corpus.

van der Meer & Rossi, LegalCheck (2025).

Municipal legal advice drafting.

Supports section-based organization and careful handling of long legal narratives. It is more drafting-oriented than Freecase search, but the source-finding lessons transfer.

Alshehri et al., Neural Reranking for UK Statutory Retrieval, Artificial Intelligence and Law (2026).

Provision-level legal reranker and distillation.

Shows how provision-level statute ranking can be evaluated with graded relevance. Freecase borrows the evaluation lesson, not the UK-only metrics.

Elganayni & Saleh, Re-Ranking Through an Attribution Lens for Citation Quality in Legal QA (2026).

Citation-quality reranking.

Shows that topical similarity is not the same as a passage worth citing. This supports evaluating whether a result is actually useful legal authority.

Statutes and Legal Structure

Louis, van Dijck & Spanakis, Finding the Law, EACL 2023.

Statutory retrieval with legal structure.

Supports including statute hierarchy in search text and embeddings. Freecase applies this to statutes: title, act, chapter, section, and parent headings matter.
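
One simple way to carry statute hierarchy into search text is to prepend the parent headings to each section before indexing or embedding it. The sketch below assumes a hypothetical function name, a " > " separator, and placeholder headings; it is an illustration of the idea, not the approach of the cited paper or Freecase's implementation.

```python
def build_statute_search_text(headings, section_label, section_text):
    """Prefix a statute section with its parent headings.

    headings: parent headings from broadest to narrowest, for example
    title, act, chapter (placeholder values; empty entries are skipped).
    """
    parts = [h.strip() for h in headings if h and h.strip()]
    parts.append(section_label.strip())
    breadcrumb = " > ".join(parts)
    # The breadcrumb goes on its own line so the section text stays unchanged.
    return f"{breadcrumb}\n{section_text.strip()}"


# Usage with made-up headings:
example = build_statute_search_text(
    ["Title 9", "Example Act", "Chapter 2"],
    "Section 2-101",
    "Placeholder section text.",
)
```

With the hierarchy in the indexed text, a query that names the act or chapter can still match a section whose own wording never repeats those terms.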

Reliability, Hallucination, and Treatment Signals

Magesh et al., Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools, Journal of Empirical Legal Studies (2025).

Empirical evaluation of legal AI tools.

This is central to Freecase's caution around AI answers. It tells us to track hallucination and incompleteness separately and never market legal AI as infallible.

Dantart, Reliability by Design, arXiv:2601.15476 (2026).

Fabrication risk in legal drafting systems.

Supports a conservative product boundary: any AI drafting or answer feature must be tied to sources before it becomes user-facing.

Demir & Canbaz, Validate Your Authority, arXiv:2605.17691 (2026).

Precedent treatment classification.

Directly informs citator evaluation. The Average Severity Error idea matches our view that missing overruled authority is much worse than minor classification noise.

Kalusev & Brkljac, Named Entity Recognition for Serbian Legal Documents, arXiv:2502.10582 (2025).

Legal NER in a low-resource setting.

Supports careful extraction of names, courts, dates, statutes, and money amounts when those details can be identified reliably.

Akter et al., A Comprehensive Survey on Legal Summarization (2025).

Legal summarization survey.

Useful for understanding risks in summaries of long opinions. Freecase treats summaries as assistance, not as a substitute for source text.

Castano et al., JusBuild, CEUR (2026).

Source-grounded legal document building.

Reinforces human review and section-level organization for legal documents.

Surveys and Field Maps

Ariai, Mackenzie & Demartini, Natural Language Processing for the Legal Domain, arXiv:2410.21306 (2025).

Systematic legal NLP survey.

Provides a task map for legal language technology: search, question answering, summarization, entity extraction, argument mining, and legal language models.

Hou, Ye, Zeng et al., Large Language Models Meet Legal Artificial Intelligence, arXiv:2509.09969 (2025).

Legal LLM survey.

Confirms the mainstream pattern of finding sources, ranking them, and using AI carefully where it helps. It also flags data quality as a major legal AI risk.

Zhu et al., Large Language Models for Information Retrieval, arXiv:2308.07107 (2023/2025).

Large language models across search systems.

Helps locate where language models can help: query wording, ranking, and source-grounded reading. Freecase treats answer generation as only one part of the system.

Notes

This bibliography is based on Freecase's local research review files, including Research/RESEARCH_PAPER_REVIEW_SYNTHESIS.md, RESEARCH_PAPER_REVIEWcpy.md, and batch reviews 1-6.
Some entries are preprints that have not reached final publication, and they may have changed since the local review. The page links to arXiv, DOI, ACL Anthology, or other public pages when a public source is available.
Freecase gratefully acknowledges the authors whose research informs this work. Citation here does not imply endorsement of Freecase by any author or institution.