Getting started with the dataset and participating in the challenge is easy with the resources we have built for you.
Download the train and dev sets in order to train your model and iterate on it as you desire.
We have also made the official evaluation script, along with a sample output file on the dev set, available for download so that you can evaluate your models. [Download the evaluation scripts] The evaluation script takes as input a reference file and a candidate output file. You can run it as follows:
./run.sh <path to reference json file> <path to candidate json file>
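To illustrate what the scoring step does, here is a minimal sketch of an exact-match evaluator over the two JSON files. This is not the official script: the file schema shown (a flat mapping from question ID to answer string) and the `evaluate` function name are assumptions for illustration only.

```python
import json

def evaluate(reference_path, candidate_path):
    """Score candidate answers against reference answers by exact match.

    Assumed (hypothetical) schema for both files: a JSON object mapping
    question IDs to answer strings, e.g. {"q1": "answer text", ...}.
    The official evaluation script may use a different format and metric.
    """
    with open(reference_path) as f:
        reference = json.load(f)
    with open(candidate_path) as f:
        candidate = json.load(f)

    # Count predictions that match the reference after light normalization.
    correct = sum(
        1 for qid, answer in reference.items()
        if candidate.get(qid, "").strip().lower() == answer.strip().lower()
    )
    return correct / len(reference)
```

Missing predictions simply count as wrong, which mirrors how most official scorers penalize unanswered questions.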
To evaluate on the intermediate task, you will have to convert your output files into the same format as the novice task. There is a script in the GitHub repository that does this for you.
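The conversion step above amounts to reshaping one JSON layout into another. The sketch below is purely illustrative, not the repository's script: both the input layout (a list of records with `id` and `prediction` fields) and the output layout (a flat ID-to-answer mapping) are hypothetical, so use the official conversion script for real submissions.

```python
import json

def to_novice_format(intermediate_path, output_path):
    """Flatten intermediate-task predictions into a novice-task-style file.

    Hypothetical formats, for illustration only:
      assumed input:  [{"id": "q1", "prediction": "..."}, ...]
      assumed output: {"q1": "...", ...}
    """
    with open(intermediate_path) as f:
        records = json.load(f)

    # Re-key each prediction by its question ID.
    novice = {rec["id"]: rec["prediction"] for rec in records}

    with open(output_path, "w") as f:
        json.dump(novice, f)
    return novice
```

Once converted, the resulting file can be scored with the same evaluation command used for the novice task.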
Once you have built a model that meets your expectations on the dev set, you can submit your test results for official evaluation on the test set. To ensure the integrity of the official results, we do not release the correct answers for the test set to the public. To submit your model for official evaluation on the test set, follow the steps below:
To avoid "p-hacking," we discourage too many submissions from the same group in a short period of time. Because submissions do not require the final trained model, we also retain the right to request a model in order to validate the submitted results.