MS MARCO Datasets

First released at NIPS 2016, the MS MARCO dataset was an ambitious, real-world machine reading comprehension dataset. Since then, we have been slowly improving the existing QA datasets and releasing new ones. How does your model perform?

1. Unannounced dataset coming soon.

2. Given a session with 2-n queries, one of which is masked, predict the masked query (Conversational Search).

3. Given a query and a corpus of 8.8M passages, rank the passages by relevance. Use either the full corpus or start with BM25's top 1000 (Ranking).

4. Given a query and 10 passages, provide the best answer available based on the passages (Q&A).

5. Given a query and 10 passages, provide the best answer available in natural language that could be used by a smart device/digital assistant (Q&A + Natural Language Generation).
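
The ranking task above mentions starting from BM25's top 1000 candidates. As a rough illustration of that first stage, here is a minimal Okapi BM25 sketch in Python. It is not the official baseline: the whitespace tokenizer, the function name `bm25_rank`, and the `k1`/`b` values are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_rank(query, passages, k1=0.9, b=0.4, top_k=1000):
    """Return indices of the top_k passages, best first (illustrative only).

    Assumptions: lowercase whitespace tokenization, and k1/b values picked
    for the example rather than taken from any official configuration.
    """
    if not passages:
        return []
    docs = [p.lower().split() for p in passages]
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: number of passages containing each term.
    df = Counter(term for d in docs for term in set(d))
    q_terms = query.lower().split()

    scored = []
    for idx, d in enumerate(docs):
        tf = Counter(d)
        score = 0.0
        for t in q_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scored.append((score, idx))

    # Highest score first; ties are broken by original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored[:top_k]]


passages = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "the stock market fell today",
]
candidates = bm25_rank("cat mat", passages, top_k=2)
```

A re-ranking submission would then rescore only the returned candidates with a stronger model, while a full-ranking submission retrieves from the whole corpus directly.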



Conversational Sessions (05/15/2019)

Rank Model Submission Date Recall Precision Recall
1 Baseline MSMARCO Team Coming Soon - - -

Unannounced Dataset (06/01/2019)

Rank Model Submission Date EM F1
1 Baseline MSMARCO Team Coming Soon - -

Passage Retrieval (10/26/2018-Present)

Rank Model Ranking Style Submission Date MRR@10 On Eval MRR@10 On Dev
1 BERTter Indexing (1)Rodrigo Nogueira, (2)Wei Yang, (3)Jimmy Lin, (4)Kyunghyun Cho - New York University(1,4), University of Waterloo(2,3), Facebook AI Research(4) [Nogueira et al. '19] Full Ranking April 8th, 2019 0.368 0.375
2 SAN + BERT base Yu Wang, Xiaodong Liu, Jianfeng Gao - Deep Learning Group, Microsoft Research AI [Xiaodong, et al. '18] ReRanking January 22nd, 2019 0.359 0.370
3 BERT + Small Training Rodrigo Nogueira(1) and Kyunghyun Cho(2) - New York University(1,2), Facebook AI Research(2) [Nogueira, et al. '19] and [Code] ReRanking January 7th, 2019 0.359 0.365
4 BERT base + L2R Ming Yan ReRanking March 16th, 2019 0.356 0.364
5 BERT base + attention ranking anonymous ReRanking March 1st, 2019 0.347 0.317
6 BERT base + attention ranking anonymous ReRanking March 11th, 2019 0.344 -
7 BERT base + attention ranking anonymous ReRanking March 4th, 2019 0.343 -
8 BERT base + attention ranking anonymous ReRanking March 2nd, 2019 0.335 -
9 BERT + Multilayer Interaction Yifan Qiao(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4) - Tsinghua University(1,3,4), Microsoft Research(2) [Qiao et al. '19] ReRanking February 19th, 2019 0.329 0.311
10 BERT base + ranking Yifan Qiao(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4) - Tsinghua University(1,3,4), Microsoft Research(2) [Qiao et al. '19] ReRanking February 8th, 2019 0.326 0.316
11 IRNet (Deep CNN/IR Hybrid Network) Dave DeBarr, Navendu Jain, Robert Sim, Justin Wang, Nirupama Chandrasekaran – Microsoft ReRanking January 2nd, 2019 0.281 0.278
12 Neural Kernel Match IR (Conv-KNRM) (Ensembled) (1)Yifan Qiao, (2)Chenyan Xiong, (3)Zhenghao Liu, (4)Zhiyuan Liu - Tsinghua University(1, 3, 4); Microsoft Research AI(2) [Dai et al. '18] ReRanking November 28th, 2018 0.271 0.290
13 Axiom-Regularized Conv-KNRM Corby Rosset, Bhaskar Mitra, Chenyan Xiong, Nick Craswell, Xia Song, Saurabh Tiwary - Microsoft AI & Research [Rosset et al. '19] ReRanking February 19, 2019 0.263 0.262
14 [Official Baseline] Duet V2 (Ensembled) Bhaskar Mitra, Fernando Diaz, Nick Craswell - Microsoft AI & Research [Mitra et al. '19] and [Code] ReRanking February 19, 2019 0.253 0.252
15 Duet with query term independence assumption (Single) Anonymous ReRanking March 14th, 2019 0.252 0.254
16 Neural Kernel Match IR (Conv-KNRM) (Single) (1)Yifan Qiao, (2)Chenyan Xiong, (3)Zhenghao Liu, (4)Zhiyuan Liu - Tsinghua University(1, 3, 4); Microsoft Research AI(2) [Dai et al. '18] ReRanking February 19, 2019 0.247 0.247
17 [Official Baseline] Duet V2 (Single) Bhaskar Mitra, Fernando Diaz, Nick Craswell - Microsoft AI & Research [Mitra et al. '19] and [Code] ReRanking February 20, 2019 0.245 0.243
18 BM25 (Anserini) + doc2query (1)Rodrigo Nogueira, (2)Wei Yang, (3)Jimmy Lin, (4)Kyunghyun Cho - New York University(1,4), University of Waterloo(2,3), Facebook AI Research(4) [Nogueira et al. '19] ReRanking April 10th, 2019 0.218 0.215
19 Neural Kernel Match IR (Conv-KNRM) (Ensembled) (1)Yifan Qiao, (2)Chenyan Xiong, (3)Zhenghao Liu, (4)Zhiyuan Liu - Tsinghua University(1, 3, 4); Microsoft Research AI(2) [Dai et al. '18] ReRanking November 26th, 2018 0.199 0.199
20 Neural Kernel Match IR (KNRM) (1)Yifan Qiao, (2)Chenyan Xiong, (3)Zhenghao Liu, (4)Zhiyuan Liu - Tsinghua University(1, 3, 4); Microsoft Research AI(2) [Xiong et al. '17] ReRanking December 10th, 2018 0.198 0.218
21 Feature-based LeToR: simple-feature based RankSVM (1)Yifan Qiao, (2)Chenyan Xiong, (3)Zhenghao Liu, (4)Zhiyuan Liu - Tsinghua University(1, 3, 4); Microsoft Research AI(2) ReRanking December 10th, 2018 0.191 0.195
22 BM25 (Anserini) (1)Rodrigo Nogueira, (2)Wei Yang, (3)Jimmy Lin, (4)Kyunghyun Cho - New York University(1,4), University of Waterloo(2,3), Facebook AI Research(4) [Nogueira et al. '19] ReRanking April 10th, 2019 0.186 0.184
23 [Official Baseline] BM25 Stephen E. Robertson; Steve Walker; Susan Jones; Micheline Hancock-Beaulieu & Mike Gatford (Implemented by MSMARCO Team) [Robertson et al. '94] ReRanking November 1st, 2018 0.165 0.167
24 BERT Representation Yifan Qiao(1), Chenyan Xiong(2), Zhenghao Liu(3), Zhiyuan Liu(4) - Tsinghua University(1,3,4), Microsoft Research(2) [Qiao et al. '19] ReRanking February 19th, 2019 0.015 0.043
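
The passage retrieval table reports MRR@10. For readers unfamiliar with the metric, the following Python sketch shows one way it could be computed. The function name `mrr_at_k` and the input layout (dicts keyed by query id) are assumptions made for this example; the official evaluation script may handle details such as missing queries differently.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant passage within the top k.

    rankings: dict mapping query id -> list of passage ids, best first.
    relevant: dict mapping query id -> set of relevant passage ids.
    Both structures are hypothetical; this sketch averages over the
    queries present in `rankings`.
    """
    if not rankings:
        return 0.0
    total = 0.0
    for qid, ranked in rankings.items():
        rel = relevant.get(qid, set())
        for position, pid in enumerate(ranked[:k], start=1):
            if pid in rel:
                total += 1.0 / position
                break  # only the first relevant hit counts
    return total / len(rankings)


rankings = {"q1": ["p3", "p1", "p2"], "q2": ["p9", "p8"]}
relevant = {"q1": {"p1"}, "q2": {"p7"}}
score = mrr_at_k(rankings, relevant)  # q1 contributes 1/2, q2 contributes 0
```

A query whose first relevant passage appears below position 10, or not at all, contributes zero to the average.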

Q&A Task (03/01/2018-Present)

Rank Model Submission Date Rouge-L Bleu-1
1 Human Performance April 23rd, 2018 0.539 0.485
2 Selector+Combine-Content-Generator QA Model Shengjie Qian of Caiyun xiaoyi AI and BUPT March 19th, 2019 0.525 0.544
3 Masque Q&A Style NTT Media Intelligence Laboratories [Nishida et al. '19] January 3rd, 2019 0.522 0.437
4 Deep Cascade QA Ming Yan [Yan et al. '18] December 12th, 2018 0.520 0.546
5 VNET Baidu NLP [Wang et al. '18] November 8th, 2018 0.516 0.543
6 Selector+Combine-Content-Generator NL Model Shengjie Qian of Caiyun xiaoyi AI and BUPT March 11th, 2019 0.496 0.535
7 Masque NLGEN Style NTT Media Intelligence Laboratories [Nishida et al. '19] January 3rd, 2019 0.489 0.488
8 BERT+ Multi-Pointer-Generator Tongjun Li of the ColorfulClouds Tech and BUPT March 26th, 2019 0.484 0.516
9 SNET + CES2S Bo Shao of SYSU University July 24th, 2018 0.450 0.464
10 Extraction-net zlsh80826 October 20th, 2018 0.437 0.444
11 SNET JY Zhao August 30th, 2018 0.436 0.463
12 BIDAF+ELMo+SofterMax Wang Changbao November 16th, 2018 0.436 0.459
13 DNET QA Geeks August 1st, 2018 0.432 0.479
14 Reader-Writer Microsoft Business Applications Group AI Research September 16th, 2018 0.421 0.436
15 SNET+seq2seq Yihan Ni of the CAS Key Lab of Web Data Science and Technology, ICT, CAS June 1st, 2018 0.398 0.423
16 KIGN-QA Chenliang Li April 18th, 2019 0.422 0.401
17 lightNLP+BiDAF Enliple AI February 1st, 2019 0.298 0.156
18 BIDAF+seq2seq Yihan Ni of the CAS Key Lab of Web Data Science and Technology, ICT, CAS May 29th, 2018 0.276 0.288
19 BiDaF Baseline (Implemented By MSMARCO Team) Allen Institute for AI & University of Washington [Seo et al. '16] April 23rd, 2018 0.240 0.106
20 TrioNLP + BiDAF Trio.AI of the CCNU September 23rd, 2018 0.205 0.232
21 BiDAF + LSTM Meefly January 15th, 2019 0.153 0.120
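
The Q&A tables report Rouge-L and Bleu-1. The sketch below gives simplified single-reference versions of both in Python, to make the columns easier to interpret. The function names, the whitespace tokenization, and the `beta=1.2` weighting are assumptions for illustration; the official evaluation may normalize text and handle multiple references differently.

```python
import math
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate, reference, beta=1.2):
    """LCS-based F-measure between one candidate and one reference."""
    c, r = candidate.lower().split(), reference.lower().split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return (1 + beta ** 2) * precision * recall / (recall + beta ** 2 * precision)


def bleu_1(candidate, reference):
    """Clipped unigram precision times the brevity penalty."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c:
        return 0.0
    ref_counts = Counter(r)
    # Each candidate token is credited at most as often as it occurs in the reference.
    clipped = sum(min(n, ref_counts[t]) for t, n in Counter(c).items())
    precision = clipped / len(c)
    brevity = 1.0 if len(c) >= len(r) else math.exp(1 - len(r) / len(c))
    return brevity * precision


exact_rouge = rouge_l("the cat sat", "the cat sat")
repeat_bleu = bleu_1("the the the", "the cat")
```

Rouge-L rewards in-order overlap with the reference, while Bleu-1 counts matching unigrams and penalizes answers shorter than the reference, which is one reason the two columns can order systems differently.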

Q&A + Natural Language Generation Task (03/01/2018-Present)

Rank Model Submission Date Rouge-L Bleu-1
1 Human Performance April 23rd, 2018 0.632 0.530
2 Masque NLGEN Style NTT Media Intelligence Laboratories [Nishida et al. '19] January 3rd, 2019 0.496 0.501
3 BERT+ Multi-Pointer-Generator Tongjun Li of the ColorfulClouds Tech and BUPT March 26th, 2019 0.487 0.465
4 BERT+ Multi-Pointer-Generator (Single) Tongjun Li of the ColorfulClouds Tech and BUPT March 19th, 2019 0.484 0.459
5 Selector+Combine-Content-Generator NLGEN Model Shengjie Qian of Caiyun xiaoyi AI and BUPT March 11th, 2019 0.487 0.449
6 VNET Baidu NLP [Wang et al. '18] November 8th, 2018 0.484 0.468
7 SNET + CES2S Bo Shao of SYSU University July 24th, 2018 0.450 0.406
8 Reader-Writer Microsoft Business Applications Group AI Research September 16th, 2018 0.439 0.426
9 KIGN-QA Chenliang Li April 18th, 2019 0.441 0.462
10 ConZNet Samsung Research [Indurthi et al. '18] July 16th, 2018 0.421 0.386
11 Bayes QA Bin Bi of Alibaba NLP June 14th, 2018 0.411 0.435
12 SNET+seq2seq Yihan Ni of the CAS Key Lab of Web Data Science and Technology, ICT, CAS June 1st, 2018 0.401 0.375
13 BPG-NET Zhijie Sang of the Center for Intelligence Science and Technology Research(CIST) of the Beijing University of Posts and Telecommunications (BUPT) August 1st, 2018 0.382 0.347
14 Deep Cascade QA Ming Yan October 25th, 2018 0.351 0.374
15 BIDAF+seq2seq Yihan Ni of the CAS Key Lab of Web Data Science and Technology, ICT, CAS May 29th, 2018 0.322 0.283
16 Masque Q&A Style NTT Media Intelligence Laboratories [Nishida et al. '19] January 3rd, 2019 0.285 0.399
17 Selector+Combine-Content-Generator QA Model Shengjie Qian of Caiyun xiaoyi AI and BUPT March 11th, 2019 0.281 0.337
18 DNET QA Geeks August 1st, 2018 0.275 0.332
19 BIDAF+ELMo+SofterMax Wang Changbao November 16th, 2018 0.268 0.346
20 SNET JY Zhao May 29th, 2018 0.247 0.308
21 Extraction-net zlsh80826 August 14th, 2018 0.247 0.321
22 lightNLP+BiDAF Enliple AI February 1st, 2019 0.210 0.108
23 BiDaF Baseline (Implemented By MSMARCO Team) Allen Institute for AI & University of Washington [Seo et al. '16] April 23rd, 2018 0.169 0.093
24 TrioNLP + BiDAF Trio.AI of the CCNU September 23rd, 2018 0.142 0.160
25 BiDAF + LSTM Meefly January 15th, 2019 0.119 0.173

MS MARCO V1 Leaderboard (12/01/2016-03/31/2018)

Rank Model Submission Date Rouge-L Bleu-1
1 MARS YUANFUDAO research NLP March 26th, 2018 0.497 0.480
2 Human Performance December 2016 0.470 0.460
3 V-Net Baidu NLP [Wang et al. '18] February 15th, 2018 0.462 0.445
4 S-Net Microsoft AI and Research [Tan et al. '17] June 2017 0.452 0.438
5 R-Net Microsoft AI and Research [Wei et al. '16] May 2017 0.429 0.422
6 HieAttnNet Akaitsuki March 26th, 2018 0.423 0.448
7 BiAttentionFlow+ ShanghaiTech University GeekPie_HPC team March 11th, 2018 0.415 0.381
8 ReasoNet Microsoft AI and Research [Shen et al. '16] April 28th, 2017 0.388 0.399
9 Prediction Singapore Management University [Wang et al. '16] March 2017 0.373 0.407
10 FastQA_Ext DFKI German Research Center for AI [Weissenborn et al. '17] March 2017 0.337 0.339
11 FastQA DFKI German Research Center for AI [Weissenborn et al. '17] March 2017 0.321 0.340
12 Flypaper Model ZhengZhou University March 14th, 2018 0.317 0.342
13 DCNMarcoNet Flying Riddlers @ Carnegie Mellon University March 31st, 2018 0.313 0.238
14 BiDaF Baseline for V2 (Implemented By MSMARCO Team) Allen Institute for AI & University of Washington [Seo et al. '16] April 23rd, 2018 0.268 0.129
15 ReasoNet Baseline Trained on SQuAD, Microsoft AI & Research [Shen et al. '16] December 2016 0.192 0.148