Change Detection 2026 guidelines
This page describes the specific guidelines for the 2026 track. Please read this entire page carefully: this is a new task, and it has some novel features that may not be obvious.
Bring questions to the TREC Slack #trec-change-detection-2026 channel.
Data
We will use the English portion of the RAGTIME1 dataset, available from HuggingFace Datasets. RAGTIME1 has 1M documents, with 1000 documents on each day of the collection.
A document in RAGTIME is a JSON object:
{
"id": "0e4ae416-c558-4741-9e4a-4988869bcd74_2154817",
"text": "Means, Franco lead Orioles to 5-2 win over Tigers\n\nDETROIT (AP) — John Means struck out six in six strong innings, Maikel Franco homered and the Baltimore Orioles beat the Detroit Tigers 5-2 on Saturday night.\n\nMeans (5-3) gave up one run on four hits and recorded his first victory since May 5....",
"url": "https://www.timesunion.com/sports/article/Means-Franco-lead-Orioles-to-5-2-win-over-Tigers-16355042.php",
"date": "2021-08-01T02:08:10.000Z"
}
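As a minimal sketch, a document line can be parsed with any standard JSON library, and its collection day recovered as the 10-character `YYYY-MM-DD` prefix of the `date` field (the `text` below is truncated for brevity):

```python
import json

# A RAGTIME1 document as shown above (text truncated for brevity).
doc_line = """{
  "id": "0e4ae416-c558-4741-9e4a-4988869bcd74_2154817",
  "text": "Means, Franco lead Orioles to 5-2 win over Tigers...",
  "url": "https://www.timesunion.com/sports/article/Means-Franco-lead-Orioles-to-5-2-win-over-Tigers-16355042.php",
  "date": "2021-08-01T02:08:10.000Z"
}"""

doc = json.loads(doc_line)
# The collection day is the YYYY-MM-DD prefix of the timestamp.
day = doc["date"][:10]
print(day)  # 2021-08-01
```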
The collection is timestamp-ordered. Participants must process the collection in order, one day at a time, and must not allow their systems to access documents from the "future". We are exploring the development of a lightweight evaluation harness to simplify processing the collection and producing correctly-formatted output.
Participant systems must output their decisions on each day's documents on that day, before receiving documents from subsequent days. Likewise, systems produce output only for the documents of that day, not for any prior day. Everything happens in distinct one-day chunks.
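One way to honor this constraint is to batch the timestamp-ordered stream into one-day chunks and emit each day's decisions before reading the next day. This is a sketch, not the official harness; `decide_for_day` is a hypothetical callback standing in for your system's daily decision step:

```python
from typing import Callable, Iterable

def process_stream(docs: Iterable[dict],
                   decide_for_day: Callable[[str, list], None]) -> None:
    """Group a timestamp-ordered stream into one-day batches and call
    decide_for_day(day, batch) once per day, in order. Decisions for a day
    are emitted before any document from a later day is seen."""
    current_day, batch = None, []
    for doc in docs:
        day = doc["date"][:10]
        if day != current_day:
            if batch:
                decide_for_day(current_day, batch)  # output before the next day arrives
            current_day, batch = day, []
        batch.append(doc)
    if batch:
        decide_for_day(current_day, batch)  # flush the final day
```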
Topics
The topics will be in JSONL format:
{
"tid": "a topic identifier",
"label": "a label for the topic, similar to a title",
"narrative": "a description of the information need in paragraph form. The narrative describes the topic succinctly, as one might tell it to someone conversationally. It is not intended to completely specify the information need.",
"questions": [
{
"qid": "a question identifier",
"question": "the question",
"rel_docs": [
"document-id-1",
"document-id-2",
...
]
},
{ ... }
]
}
All assessor-created analytic questions are present in the topic description, so they are known to the system at the start of the task. Additionally, questions will have a small number of example documents. Systems may use those documents when they arrive in the document stream. They are not guaranteed to be the first relevant documents in the collection.
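A minimal sketch of loading one topic line and collecting what is known at the start of the task, i.e. the assessor question identifiers and their example documents (the topic values below are hypothetical):

```python
import json

# A hypothetical topic line in the format described above.
topic_line = json.dumps({
    "tid": "cd26-001",
    "label": "Orioles pitching staff changes",
    "narrative": "Track changes to the Orioles' starting rotation.",
    "questions": [
        {"qid": "q_1",
         "question": "Who is in the starting rotation?",
         "rel_docs": ["doc-id-1", "doc-id-2"]},
    ],
})

topic = json.loads(topic_line)
# Known at task start: every assessor question and its example documents.
known_qids = {q["qid"] for q in topic["questions"]}
example_docs = {q["qid"]: q["rel_docs"] for q in topic["questions"]}
```

The example documents can then be watched for as they arrive in the stream; remember that they are not guaranteed to be the first relevant documents in the collection.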
Output format
The format for a run is a JSONL file that starts with a run metadata line, followed by one line per topic. (The JSONL below is pretty-printed for clarity.) In general, runs may contain other information as long as the required fields are present; this allows you to include metadata about different stages of your process in the run.
{ "runtag": "nist_cd_1",
"run-type": "manual | automatic",
"llms-used": [ "llama47", "gpt-pi", "gemini-foo-preview" ],
"description": "a free text description of your run",
"other metadata": "as you like"
},
{
"topic": "a topic identifier",
"the run may contain": "other metadata here as you like",
"results": {
"2021-08-01": [
{
"qid": "q_1",
"question-rank": 0,
"doc-ranking": [
{ "docid": "doc-id-1", "score": 0.9876 },
{ "docid": "doc-id-2", "score": 0.9870 },
...
],
"extra": { "you can include": "extra metadata about your question response, like a score, or other info useful to you." }
},
{
"qid": "nist_cd_1_q_1",
"question-rank": 1,
"question-text": "This is a new analytic question proposed by my system",
"doc-ranking": [ ... ]
},
...
],
"2021-08-02": ...
}
}
Note that the TREC submission system enforces limits on the runtag: it must be 20 characters or less and can only include letters, numbers, hyphens, periods (not as the first character), and underscores. It must also be unique. Traditionally runtags start with your team name or an abbreviation thereof, for example "nist-cd1".
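The runtag constraints above (at most 20 characters; only letters, digits, hyphens, periods, and underscores; no leading period) can be checked locally before submission, for example with a regular expression. Uniqueness, of course, can only be enforced by the submission system itself:

```python
import re

# <= 20 chars, allowed characters only, and not starting with a period.
RUNTAG_RE = re.compile(r"(?!\.)[A-Za-z0-9._-]{1,20}")

def valid_runtag(tag: str) -> bool:
    """True if tag satisfies the TREC submission system's runtag rules
    (except uniqueness, which only the submission system can check)."""
    return RUNTAG_RE.fullmatch(tag) is not None
```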
The top level may include any other slots you like, as long as the required fields are present (runtag in the metadata line; topic and results in each topic line). You can use this to store additional metadata about that topic in your run. If your system is agentic, for example, you might include a representation of the agent interaction here.
The results block is an object with one entry per date in the collection. If a date in the collection is not present in the run, that is interpreted as indicating that all known questions are equally ranked at rank 0.
Each date entry is a list with one entry per question. If a known question is missing from the list, it is assumed to be ranked last out of all questions and tied with all other missing questions. (This way, you only need to include questions that you think matter on that day.)
The question identifier qid is either the qid of a question in the topic file, or an identifier that starts with your runtag if the question is newly proposed by your system. Proposed questions are considered to be part of the set of questions for that topic from that day forward. Proposed questions must include a question-text field on the first day they appear, containing the question.
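The qid rule above can be checked with a one-line predicate, a sketch assuming you track the set of known qids (assessor questions plus any questions your system has already proposed):

```python
def valid_qid(qid: str, known_qids: set, runtag: str) -> bool:
    """A qid is valid if it names a question from the topic file (or one
    already proposed on an earlier day), or if it is a newly proposed
    question whose identifier starts with the runtag."""
    return qid in known_qids or qid.startswith(runtag)
```

Remember that a newly proposed qid must be accompanied by a question-text field on the first day it appears.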
The question-rank is an integer greater than or equal to zero. Ties between questions will be broken arbitrarily.
The doc-ranking is a list of at most 100 docid,score pairs. This is interpreted as a ranked list of documents that are relevant to that question. The score is a double-precision floating-point number. The documents must be from that day (the 10-digit prefix of the document datestamp must match the date of this results entry). Documents not in the ranking are assumed to be retrieved with a tied score at the end of the ranking. Tied scores will be broken arbitrarily. The ranking may be empty, but it must be present.
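The two hard constraints on a doc-ranking (at most 100 entries; every document from that day) can be validated before submission. This sketch assumes a `doc_dates` lookup, a hypothetical docid-to-datestamp mapping you would build while streaming the corpus:

```python
def valid_doc_ranking(day: str, ranking: list, doc_dates: dict) -> bool:
    """Check one date entry's doc-ranking: at most 100 docid/score pairs,
    and each document's datestamp prefix (YYYY-MM-DD) must equal `day`.
    doc_dates maps docid -> the document's "date" field."""
    if len(ranking) > 100:
        return False
    return all(doc_dates[e["docid"]][:10] == day for e in ranking)
```

An empty ranking is valid, matching the rule that the ranking may be empty but must be present.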
Aside from qid, question-text (if applicable), question-rank, and doc-ranking, the block can contain other fields with metadata about your run.
Evaluation
Timeline
- Topics release: TBD
- Runs due: TBD
- Scores released: TBD