Doctoral researcher in Computational Linguistics
Campus C7.4, Saarland University, 66123, Germany
dongqi.me [AT] gmail.com
The SciNews dataset is designed to facilitate the development and evaluation of models that generate scientific news reports from scholarly articles. This dataset aims to bridge the gap between complex scientific research and the general public by simplifying and summarizing academic content into accessible narratives. It supports tasks like text summarization, simplification, and the automated generation of scientific news, providing a valuable resource for enhancing public engagement with science and technology.
Data was collected from the Science X platform, an open-access hub for science, technology, and medical research news. The content was extracted with web scraping tools such as Selenium and BeautifulSoup.
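To make the extraction step concrete, here is a minimal, dependency-free sketch of pulling paragraph text out of a page. It uses Python's standard `html.parser` as a stand-in for BeautifulSoup, and it is not the authors' actual scraper: the `ParagraphExtractor` class, the `extract_paragraphs` helper, and the sample HTML are all invented for illustration and do not reflect the real Science X page layout.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of every <p> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []   # finished paragraph strings
        self._buffer = None    # text pieces of the open <p>, or None outside <p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_data(self, data):
        # Only keep text that appears inside a paragraph.
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Collapse runs of whitespace and skip empty paragraphs.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None


def extract_paragraphs(html):
    """Return the non-empty paragraph texts found in an HTML string."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


# Hypothetical page content, not taken from any real site.
SAMPLE_HTML = """
<html><body>
  <h1>Hypothetical headline</h1>
  <p>First paragraph of a <b>made-up</b> news report.</p>
  <p>   </p>
  <p>Second paragraph.</p>
</body></html>
"""

paragraphs = extract_paragraphs(SAMPLE_HTML)
print(paragraphs)
```

In a real pipeline, a browser driver such as Selenium would typically supply the rendered HTML, and a parser would then select the article body; the selectors needed for that depend on the site and are not described in this card.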
The dataset does not include additional annotations, as it is a compilation of existing scientific papers and their corresponding news reports. Quality control combined automated and human assessments to ensure the relevance and quality of the news narratives in relation to the original scientific papers.
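The card does not say which automated checks were applied. As one illustrative possibility only, a simple lexical-overlap score can flag news reports that share little vocabulary with their paired paper. The function names, the four-letter word filter, and the example texts below are assumptions made for this sketch, not the authors' procedure.

```python
import re


def content_words(text):
    """Lowercased alphabetic tokens of four or more letters (a crude stopword filter)."""
    return set(re.findall(r"[a-z]{4,}", text.lower()))


def overlap_ratio(news, paper):
    """Fraction of the news report's content words that also occur in the paper."""
    news_words = content_words(news)
    if not news_words:
        return 0.0
    return len(news_words & content_words(paper)) / len(news_words)


# Invented example texts.
paper = "We study protein folding dynamics using neural networks."
related_news = "Researchers used neural networks to study protein folding."
unrelated_news = "The city council approved a new budget."

print(round(overlap_ratio(related_news, paper), 3))
print(overlap_ratio(unrelated_news, paper))
```

A pair scoring below some chosen threshold could be routed to human review rather than discarded automatically; where to set that threshold is a judgment call that would need to be tuned on real data.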
Users of the SciNews dataset should be aware of its limitations and biases, particularly when developing models for scientific news generation. Efforts should be made to address potential biases and ensure that generated narratives accurately and fairly represent the original scientific content.
BibTeX:
@inproceedings{pu2024scinews,
title={SciNews: From Scholarly Complexities to Public Narratives – A Dataset for Scientific News Report Generation},
author={Pu, Dongqi and Wang, Yifan and Loy, Jia and Demberg, Vera},
booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation},
year={2024}
}
APA:
Pu, D., Wang, Y., Loy, J., & Demberg, V. (2024). SciNews: From Scholarly Complexities to Public Narratives – A Dataset for Scientific News Report Generation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation.
This dataset card is based on the paper by Dongqi Pu, Yifan Wang, Jia Loy, and Vera Demberg of Saarland University, Germany.
For further questions about the SciNews dataset, please contact dongqi.me@gmail.com.