MS MARCO V2 Leaderboard

First released at NIPS 2016, the MS MARCO dataset was an ambitious, real-world machine reading comprehension dataset. Based on feedback from the community, we designed and released the V2 dataset and its related challenges. Can your model read, comprehend, and answer questions better than humans?

Conversational Search Coming Soon

Unannounced Dataset Coming Soon

1. Given a query and 1000 candidate passages, rerank the passages by relevance (Passage Re-Ranking).

2. Given a query and 10 passages, provide the best answer available from those passages (Q&A).

3. Given a query and 10 passages, provide the best available answer in natural language that could be used by a smart device or digital assistant (Q&A + Natural Language Generation).



Passage Re-Ranking (10/26/2018-Present)

Rank Model Submission Date MRR@10 On Eval MRR@10 On Dev
1 SAN + BERT base Yu Wang, Xiaodong Liu, Jianfeng Gao - Deep Learning Group, Microsoft Research AI [Xiaodong, et al. '18] January 22nd, 2019 0.359 0.370
2 BERT + Small Training Rodrigo Nogueira and Kyunghyun Cho - New York University [Nogueira, et al. '19] and [Code] January 7th, 2019 0.359 0.365
3 BERT base + L2R Ming Yan March 16th, 2019 0.356 0.364
4 BERT base+ attention ranking anonymous March 1st, 2019 0.347 0.317
5 BERT base+ attention ranking anonymous March 11th, 2019 0.344 -
6 BERT base+ attention ranking anonymous March 4th, 2019 0.343 -
7 BERT base+ attention ranking anonymous March 2nd, 2019 0.335 -
8 BERT + Multilayer Interaction anonymous February 19th, 2019 0.329 0.311
9 BERT base+ranking anonymous February 8th, 2019 0.326 0.316
10 IRNet (Deep CNN/IR Hybrid Network) Dave DeBarr, Navendu Jain, Robert Sim, Justin Wang, Nirupama Chandrasekaran – Microsoft January 2nd, 2019 0.281 0.278
11 Neural Kernel Match IR (Conv-KNRM) (Ensembled) (1) Yifan Qiao, (2) Chenyan Xiong, (3) Zhenghao Liu, (4) Zhiyuan Liu - Tsinghua University (1, 3, 4); Microsoft Research AI (2) [Dai et al. '18] November 28th, 2018 0.271 0.290
12 Axiom-Regularized Conv-KNRM Anonymous February 19th, 2019 0.263 0.262
13 [Official Baseline] Duet V2 (Ensembled) Bhaskar Mitra, Fernando Diaz, Nick Craswell - Microsoft AI & Research [Mitra et al. '19] and [Code] February 19th, 2019 0.253 0.252
14 Duet with query term independence assumption (Single) Anonymous March 14th, 2019 0.252 0.254
15 Neural Kernel Match IR (Conv-KNRM) (Single) (1) Yifan Qiao, (2) Chenyan Xiong, (3) Zhenghao Liu, (4) Zhiyuan Liu - Tsinghua University (1, 3, 4); Microsoft Research AI (2) [Dai et al. '18] February 19th, 2019 0.247 0.247
16 [Official Baseline] Duet V2 (Single) Bhaskar Mitra, Fernando Diaz, Nick Craswell - Microsoft AI & Research [Mitra et al. '19] and [Code] February 20th, 2019 0.245 0.243
17 Neural Kernel Match IR (Conv-KNRM) (Ensembled) (1) Yifan Qiao, (2) Chenyan Xiong, (3) Zhenghao Liu, (4) Zhiyuan Liu - Tsinghua University (1, 3, 4); Microsoft Research AI (2) [Dai et al. '18] November 26th, 2018 0.199 0.199
18 Neural Kernel Match IR (KNRM) (1) Yifan Qiao, (2) Chenyan Xiong, (3) Zhenghao Liu, (4) Zhiyuan Liu - Tsinghua University (1, 3, 4); Microsoft Research AI (2) [Xiong et al. '17] December 10th, 2018 0.198 0.218
19 Feature-based LeToR: simple-feature based RankSVM (1) Yifan Qiao, (2) Chenyan Xiong, (3) Zhenghao Liu, (4) Zhiyuan Liu - Tsinghua University (1, 3, 4); Microsoft Research AI (2) December 10th, 2018 0.191 0.195
20 [Official Baseline] BM25 Stephen E. Robertson; Steve Walker; Susan Jones; Micheline Hancock-Beaulieu & Mike Gatford (Implemented by MSMARCO Team) [Robertson et al. '94] November 1st, 2018 0.165 0.167
21 BERT Representation anonymous February 19th, 2019 0.015 0.043
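MRR@10 (mean reciprocal rank at cutoff 10) scores each query by the reciprocal of the rank at which the first relevant passage appears among the top 10 results, or 0 if none does, and then averages over queries. A minimal Python sketch follows; the input layout (query ID mapped to a ranked list of passage IDs, plus a set of judged-relevant passage IDs per query) and the helper name `mrr_at_k` are hypothetical, and averaging over every judged query is an assumption rather than a description of the official evaluation script.

```python
from typing import Dict, List, Set


def mrr_at_k(ranked: Dict[str, List[str]],
             relevant: Dict[str, Set[str]],
             k: int = 10) -> float:
    """Hypothetical MRR@k helper (a sketch, not the official MS MARCO scorer).

    ranked:   query id -> passage ids, ordered from most to least relevant
    relevant: query id -> set of passage ids judged relevant
    Assumption: the mean is taken over every query in `relevant`; a query
    with no relevant passage in its top k contributes 0.
    """
    if not relevant:
        return 0.0
    total = 0.0
    for qid, gold in relevant.items():
        # Only the first k ranked passages count toward the score.
        for rank, pid in enumerate(ranked.get(qid, [])[:k], start=1):
            if pid in gold:
                total += 1.0 / rank  # reciprocal rank of the first hit
                break
    return total / len(relevant)
```

For instance, with the relevant passage at rank 2 for one query and outside the top 10 for another, the sketch returns (1/2 + 0) / 2 = 0.25. Because only the first relevant hit counts, MRR@10 rewards putting one good passage near the very top rather than recovering many relevant passages.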

Q&A Task (03/01/2018-Present)

Rank Model Submission Date Rouge-L Bleu-1
1 Human Performance April 23rd, 2018 0.539 0.485
2 Selector+Combine-Content-Generator QA Model Shengjie Qian of Caiyun xiaoyi AI and BUPT March 19th, 2019 0.525 0.544
3 Masque Q&A Style NTT Media Intelligence Laboratories [Nishida et al. '19] January 3rd, 2019 0.522 0.437
4 Deep Cascade QA Ming Yan [Yan et al. '18] December 12th, 2018 0.520 0.546
5 VNET Baidu NLP [Wang et al. '18] November 8th, 2018 0.516 0.543
6 Selector+Combine-Content-Generator NL Model Shengjie Qian of Caiyun xiaoyi AI and BUPT March 11th, 2019 0.496 0.535
7 Masque NLGEN Style NTT Media Intelligence Laboratories [Nishida et al. '19] January 3rd, 2019 0.489 0.488
8 BERT+ Multi-Pointer-Generator Tongjun Li of the ColorfulClouds Tech and BUPT March 19th, 2019 0.480 0.514
9 SNET + CES2S Bo Shao of SYSU University July 24th, 2018 0.450 0.464
10 Extraction-net zlsh80826 October 20th, 2018 0.437 0.444
11 SNET JY Zhao August 30th, 2018 0.436 0.463
12 BIDAF+ELMo+SofterMax Wang Changbao November 16th, 2018 0.436 0.459
13 DNET QA Geeks August 1st, 2018 0.432 0.479
14 Reader-Writer Microsoft Business Applications Group AI Research September 16th, 2018 0.421 0.436
15 SNET+seq2seq Yihan Ni of the CAS Key Lab of Web Data Science and Technology, ICT, CAS June 1st, 2018 0.398 0.423
16 KIGN-QA Chenliang Li March 11th, 2019 0.419 0.403
17 lightNLP+BiDAF Enliple AI February 1st, 2019 0.298 0.156
18 BIDAF+seq2seq Yihan Ni of the CAS Key Lab of Web Data Science and Technology, ICT, CAS May 29th, 2018 0.276 0.288
19 BiDaF Baseline (Implemented By MSMARCO Team) Allen Institute for AI & University of Washington [Seo et al. '16] April 23rd, 2018 0.240 0.106
20 TrioNLP + BiDAF Trio.AI of the CCNU September 23rd, 2018 0.205 0.232
21 BiDAF + LSTM Meefly January 15th, 2019 0.153 0.120
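The Q&A leaderboards rank systems by ROUGE-L, an F-measure built from the longest common subsequence (LCS) between a generated answer and a reference answer. The sketch below is a minimal illustration, assuming lowercased whitespace tokenization and the coco-caption-style convention of taking the best precision and the best recall over multiple reference answers with β = 1.2; these choices and the helper names `lcs_length` and `rouge_l` are assumptions, not a description of the official MS MARCO scorer.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Longest common subsequence length via dynamic programming, O(len(a) * len(b))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            # Extend the match diagonally, or carry the best score from above/left.
            curr[j] = prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, references: List[str], beta: float = 1.2) -> float:
    """Hypothetical ROUGE-L sketch.

    Assumptions: lowercase whitespace tokens; precision and recall are each
    maximized over the references; beta = 1.2 weights recall above precision.
    """
    cand = candidate.lower().split()
    if not cand or not references:
        return 0.0
    precisions, recalls = [], []
    for ref in references:
        toks = ref.lower().split()
        lcs = lcs_length(cand, toks)
        precisions.append(lcs / len(cand))
        recalls.append(lcs / len(toks) if toks else 0.0)
    p, r = max(precisions), max(recalls)
    if p == 0 or r == 0:
        return 0.0
    # F_beta combination of LCS precision and recall.
    return ((1 + beta ** 2) * p * r) / (r + beta ** 2 * p)
```

Because LCS matches tokens in order but allows gaps, "the cat sat on the mat" against "the cat is on the mat" shares five tokens and scores 5/6 in this sketch. Taking the maximum over references means an answer is not penalized for matching only one of several acceptable answers.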

Q&A + Natural Language Generation Task (03/01/2018-Present)

Rank Model Submission Date Rouge-L Bleu-1
1 Human Performance April 23rd, 2018 0.632 0.530
2 Masque NLGEN Style NTT Media Intelligence Laboratories [Nishida et al. '19] January 3rd, 2019 0.496 0.501
3 BERT+ Multi-Pointer-Generator Tongjun Li of the ColorfulClouds Tech and BUPT March 11th, 2019 0.485 0.462
4 BERT+ Multi-Pointer-Generator (Single) Tongjun Li of the ColorfulClouds Tech and BUPT March 19th, 2019 0.484 0.459
5 Selector+Combine-Content-Generator NLGEN Model Shengjie Qian of Caiyun xiaoyi AI and BUPT March 11th, 2019 0.487 0.449
6 VNET Baidu NLP [Wang et al. '18] November 8th, 2018 0.484 0.468
7 SNET + CES2S Bo Shao of SYSU University July 24th, 2018 0.450 0.406
8 Reader-Writer Microsoft Business Applications Group AI Research September 16th, 2018 0.439 0.426
9 KIGN-QA Chenliang Li March 11th, 2019 0.437 0.455
10 ConZNet Samsung Research [Indurthi et al. '18] July 16th, 2018 0.421 0.386
11 Bayes QA Bin Bi of Alibaba NLP June 14th, 2018 0.411 0.435
12 SNET+seq2seq Yihan Ni of the CAS Key Lab of Web Data Science and Technology, ICT, CAS June 1st, 2018 0.401 0.375
13 BPG-NET Zhijie Sang of the Center for Intelligence Science and Technology Research(CIST) of the Beijing University of Posts and Telecommunications (BUPT) August 1st, 2018 0.382 0.347
14 Deep Cascade QA Ming Yan October 25th, 2018 0.351 0.374
15 BIDAF+seq2seq Yihan Ni of the CAS Key Lab of Web Data Science and Technology, ICT, CAS May 29th, 2018 0.322 0.283
16 Masque Q&A Style NTT Media Intelligence Laboratories [Nishida et al. '19] January 3rd, 2019 0.285 0.399
17 Selector+Combine-Content-Generator QA Model Shengjie Qian of Caiyun xiaoyi AI and BUPT March 11th, 2019 0.281 0.337
18 DNET QA Geeks August 1st, 2018 0.275 0.332
19 BIDAF+ELMo+SofterMax Wang Changbao November 16th, 2018 0.268 0.346
20 SNET JY Zhao May 29th, 2018 0.247 0.308
21 Extraction-net zlsh80826 August 14th, 2018 0.247 0.321
22 lightNLP+BiDAF Enliple AI February 1st, 2019 0.210 0.108
23 BiDaF Baseline (Implemented By MSMARCO Team) Allen Institute for AI & University of Washington [Seo et al. '16] April 23rd, 2018 0.169 0.093
24 TrioNLP + BiDAF Trio.AI of the CCNU September 23rd, 2018 0.142 0.160
25 BiDAF + LSTM Meefly January 15th, 2019 0.119 0.173
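The second score column, BLEU-1, is modified (clipped) unigram precision scaled by a brevity penalty that punishes answers shorter than the reference. BLEU is normally computed at corpus level by summing clipped counts over all answers; the hypothetical `bleu_1` helper below scores a single answer only, and assumes lowercased whitespace tokenization and a brevity penalty measured against the reference length closest to the answer length. It is a sketch, not the official MS MARCO evaluation code.

```python
import math
from collections import Counter
from typing import List


def bleu_1(candidate: str, references: List[str]) -> float:
    """Hypothetical sentence-level BLEU-1 sketch (assumed tokenization: lowercase + split)."""
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0
    # Clip each candidate unigram count by its maximum count in any single reference.
    max_ref_counts: Counter = Counter()
    for ref in refs:
        for tok, n in Counter(ref).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty against the closest reference length (ties go to the shorter one).
    c = len(cand)
    r = min((len(ref) for ref in refs), key=lambda n: (abs(n - c), n))
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * precision
```

Clipping is what stops a degenerate answer such as "the the the the the the the" from earning full unigram precision: against "the cat is on the mat" it scores 2/7 in this sketch. The brevity penalty covers the opposite failure, so "the cat" against "the cat sat on the mat" has perfect precision but is scaled by exp(1 - 6/2) = exp(-2).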

MS MARCO V1 Leaderboard (12/01/2016-03/31/2018)

Rank Model Submission Date Rouge-L Bleu-1
1 MARS YUANFUDAO research NLP March 26th, 2018 0.497 0.480
2 Human Performance December 2016 0.470 0.460
3 V-Net Baidu NLP [Wang et al. '18] February 15th, 2018 0.462 0.445
4 S-Net Microsoft AI and Research [Tan et al. '17] June 2017 0.452 0.438
5 R-Net Microsoft AI and Research [Wei et al. '16] May 2017 0.429 0.422
6 HieAttnNet Akaitsuki March 26th, 2018 0.423 0.448
7 BiAttentionFlow+ ShanghaiTech University GeekPie_HPC team March 11th, 2018 0.415 0.381
8 ReasoNet Microsoft AI and Research [Shen et al. '16] April 28th, 2017 0.388 0.399
9 Prediction Singapore Management University [Wang et al. '16] March 2017 0.373 0.407
10 FastQA_Ext DFKI German Research Center for AI [Weissenborn et al. '17] March 2017 0.337 0.339
11 FastQA DFKI German Research Center for AI [Weissenborn et al. '17] March 2017 0.321 0.340
12 Flypaper Model ZhengZhou University March 14th, 2018 0.317 0.342
13 DCNMarcoNet Flying Riddlers @ Carnegie Mellon University March 31st, 2018 0.313 0.238
14 BiDaF Baseline for V2 (Implemented By MSMARCO Team) Allen Institute for AI & University of Washington [Seo et al. '16] April 23rd, 2018 0.268 0.129
15 ReasoNet Baseline Trained on SQuAD, Microsoft AI & Research [Shen et al. '16] December 2016 0.192 0.148