Doctoral researcher in Computational Linguistics
Campus C7.4, Saarland University, 66123, Germany
dongqi.me [AT] gmail.com
The SciNews dataset is designed to facilitate the development and evaluation of models that generate scientific news reports from scholarly articles. This dataset aims to bridge the gap between complex scientific research and the general public by simplifying and summarizing academic content into accessible narratives. It supports tasks like text summarization, simplification, and the automated generation of scientific news, providing a valuable resource for enhancing public engagement with science and technology.
The data were collected from the Science X platform, an open-access hub for science, technology, and medical research news. Extraction was performed with web scraping tools such as Selenium and BeautifulSoup.
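To illustrate the kind of extraction step described above, here is a minimal sketch of pulling a headline and body text out of a news page. It is not the authors' pipeline: it swaps in Python's standard-library `html.parser` for BeautifulSoup so that it runs without extra dependencies, and the page layout (`<h1>` for the headline, `<p>` for body paragraphs), the names `ArticleParser` and `parse_news_page`, and the sample HTML are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class ArticleParser(HTMLParser):
    """Collect the <h1> headline and <p> paragraph text from one HTML page.

    Hypothetical helper: the tag layout is an assumed page structure,
    not the actual markup of any particular site.
    """

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._tag = None    # element currently being read ("h1", "p", or None)
        self._buffer = []   # text fragments of the current element

    def handle_starttag(self, tag, attrs):
        # Start capturing only at a top-level <h1> or <p>.
        if tag in ("h1", "p") and self._tag is None:
            self._tag = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <em>) is captured as well.
        if self._tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._tag:
            # Join fragments and collapse runs of whitespace.
            text = " ".join("".join(self._buffer).split())
            if tag == "h1":
                self.title = text
            elif text:
                self.paragraphs.append(text)
            self._tag = None


def parse_news_page(html):
    """Return a dict with the page headline and newline-joined body text."""
    parser = ArticleParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title, "body": "\n".join(parser.paragraphs)}


# Made-up sample page, used only to exercise the parser.
SAMPLE_HTML = """
<html><body>
  <h1>New  study maps ocean currents</h1>
  <p>Researchers report a <em>new</em> model.</p>
  <p>The work appears in a journal.</p>
</body></html>
"""

record = parse_news_page(SAMPLE_HTML)
print(record["title"])
print(record["body"])
```

In a real scraper, the HTML string would come from a browser driver or an HTTP client rather than a literal, and the selectors would need to match the target site's actual markup.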
The dataset does not include additional annotations, as it is a compilation of existing scientific papers and their corresponding news reports. Quality control combined automated and human assessments of the relevance and quality of each news narrative with respect to its original scientific paper.
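The description above does not specify which automated assessments were used. As one possible illustration only, the sketch below scores how much of a news report's vocabulary also appears in the paired paper, a crude lexical proxy for relevance. The function names, the tokenization, and the 0.3 threshold are all hypothetical choices, not values taken from the dataset's construction.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_score(paper_text, news_text):
    """Fraction of distinct news tokens that also occur in the paper."""
    news_tokens = tokenize(news_text)
    if not news_tokens:
        return 0.0
    return len(news_tokens & tokenize(paper_text)) / len(news_tokens)


def is_relevant(paper_text, news_text, threshold=0.3):
    """Flag a pair as relevant when the overlap reaches the threshold.

    The default threshold is an arbitrary illustrative value.
    """
    return overlap_score(paper_text, news_text) >= threshold


# Made-up texts, used only to exercise the functions.
paper = "We model ocean currents with satellite data."
related_news = "Satellite data reveal ocean currents."
unrelated_news = "Local team wins football match."

print(overlap_score(paper, related_news))
print(is_relevant(paper, unrelated_news))
```

A lexical check like this can only catch grossly mismatched pairs; it says nothing about factual faithfulness, which is where human assessment would remain necessary.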
Users of the SciNews dataset should be aware of its limitations and biases, particularly when developing models for scientific news generation. Efforts should be made to address potential biases and ensure that generated narratives accurately and fairly represent the original scientific content.
BibTeX:
@inproceedings{pu-etal-2024-scinews-scholarly,
title = "{S}ci{N}ews: From Scholarly Complexities to Public Narratives {--} a Dataset for Scientific News Report Generation",
author = "Pu, Dongqi and
Wang, Yifan and
Loy, Jia E. and
Demberg, Vera",
editor = "Calzolari, Nicoletta and
Kan, Min-Yen and
Hoste, Veronique and
Lenci, Alessandro and
Sakti, Sakriani and
Xue, Nianwen",
booktitle = "Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)",
month = may,
year = "2024",
address = "Torino, Italy",
publisher = "ELRA and ICCL",
url = "https://aclanthology.org/2024.lrec-main.1258",
pages = "14429--14444",
}
ACL:
Dongqi Pu, Yifan Wang, Jia E. Loy, and Vera Demberg. 2024. SciNews: From Scholarly Complexities to Public Narratives – a Dataset for Scientific News Report Generation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 14429–14444, Torino, Italy. ELRA and ICCL.
For further questions about the SciNews dataset, please contact dongqi.me@gmail.com.