A gold standard methodology for evaluating accuracy in data-to-text systems C Thomson, E Reiter arXiv preprint arXiv:2011.03992, 2020 | 51 | 2020 |
Underreporting of errors in NLG output, and what to do about it E Van Miltenburg, MA Clinciu, O Dušek, D Gkatzia, S Inglis, L Leppänen, ... arXiv preprint arXiv:2108.01182, 2021 | 34 | 2021 |
Missing information, unresponsive authors, experimental flaws: The impossibility of assessing the reproducibility of previous human evaluations in NLP A Belz, C Thomson, E Reiter, G Abercrombie, JM Alonso-Moral, M Arvan, ... arXiv preprint arXiv:2305.01633, 2023 | 31 | 2023 |
SportSett:Basketball - A robust and maintainable data-set for natural language generation C Thomson, E Reiter, S Sripada Proceedings of the Workshop on Intelligent Information Processing and …, 2020 | 27 | 2020 |
Generation challenges: Results of the accuracy evaluation shared task C Thomson, E Reiter arXiv preprint arXiv:2108.05644, 2021 | 19 | 2021 |
GEMv2: Multilingual NLG benchmarking in a single line of code S Gehrmann, A Bhattacharjee, A Mahendiran, A Wang, A Papangelis, ... arXiv preprint arXiv:2206.11249, 2022 | 15 | 2022 |
The 2024 ReproNLP shared task on reproducibility of evaluations in NLP: Overview and results A Belz, C Thomson Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems …, 2024 | 13 | 2024 |
Non-repeatable experiments and non-reproducible results: The reproducibility crisis in human evaluation in NLP A Belz, C Thomson, E Reiter, S Mille Findings of the Association for Computational Linguistics: ACL 2023, 3676-3687, 2023 | 13 | 2023 |
Evaluating factual accuracy in complex data-to-text C Thomson, E Reiter, B Sundararajan Computer Speech & Language 80, 101482, 2023 | 13 | 2023 |
Shared task on evaluating accuracy E Reiter, CA Thomson | 11 | 2020 |
The 2023 WebNLG shared task on low resource languages. Overview and evaluation results (WebNLG 2023) L Cripwell, A Belz, C Gardent, A Gatt, C Borg, M Borg, J Judge, M Lorandi, ... Proceedings of the Workshop on Multimodal, Multilingual Natural Language …, 2023 | 7 | 2023 |
Barriers and enabling factors for error analysis in NLG research E Van Miltenburg, M Clinciu, O Dušek, D Gkatzia, S Inglis, L Leppänen, ... Northern European Journal of Language Technology 9 (1), 2023 | 7 | 2023 |
Comprehension driven document planning in natural language generation systems C Thomson, E Reiter, S Sripada Proceedings of The 11th International Natural Language Generation Conference, 2018 | 5 | 2018 |
Common Flaws in Running Human Evaluation Experiments in NLP C Thomson, E Reiter, A Belz Computational Linguistics, 1-11, 2024 | 4 | 2024 |
Studying the impact of filling information gaps on the output quality of neural data-to-text CA Thomson, Z Zhao, SG Sripada | 4 | 2020 |
Enhancing factualness and controllability of data-to-text generation via data views and constraints C Thomson, C Rebuffel, E Reiter, L Soulier, S Sripada, P Gallinari Proceedings of the 16th International Natural Language Generation Conference …, 2023 | 1 | 2023 |
The accuracy evaluation shared task as a retrospective reproduction study C Thomson, E Reiter Proceedings of the 15th International Conference on Natural Language …, 2022 | 1 | 2022 |
Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems (HumEval) @ LREC-COLING 2024 S Balloccu, A Belz, R Huidrom, E Reiter, J Sedoc, C Thomson Proceedings of the Fourth Workshop on Human Evaluation of NLP Systems …, 2024 | | 2024 |
Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems A Belz, M Popović, E Reiter, C Thomson, J Sedoc Proceedings of the 3rd Workshop on Human Evaluation of NLP Systems, 2023 | | 2023 |