Dongqi Pu

Doctoral researcher in Computational Linguistics

Campus C7.4, Saarland University, 66123, Germany

dongqi.me [AT] gmail.com

SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation

The SciNews dataset is designed to facilitate the development and evaluation of models that generate scientific news reports from scholarly articles. This dataset aims to bridge the gap between complex scientific research and the general public by simplifying and summarizing academic content into accessible narratives. It supports tasks like text summarization, simplification, and the automated generation of scientific news, providing a valuable resource for enhancing public engagement with science and technology.

SciNews

Dataset Details

Dataset Description

Dataset Sources

Dataset Creation

Data Collection and Processing

The data were collected from the Science X platform, an open-access hub for science, technology, and medical research news. Extraction relied on web scraping tools such as Selenium and BeautifulSoup.
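To make the extraction step concrete, here is a minimal sketch of pulling a headline, the report text, and the linked paper URL out of one news page. It is not the authors' scraper: it uses Python's standard-library `html.parser` as a dependency-free stand-in for BeautifulSoup, and the sample markup, the `paper-link` class name, and the field names `headline`, `news_text`, and `paper_url` are all assumptions made for illustration. Real Science X pages will likely be structured differently.

```python
from html.parser import HTMLParser

# Hypothetical page snippet; actual Science X markup may differ.
SAMPLE_HTML = """
<html><body>
  <h1>Example headline</h1>
  <div class="article-body">
    <p>First paragraph of the news report.</p>
    <p>Second paragraph.</p>
  </div>
  <a class="paper-link" href="https://example.org/paper">More information</a>
</body></html>
"""


class NewsPageParser(HTMLParser):
    """Collects the headline, body paragraphs, and the linked paper URL."""

    def __init__(self):
        super().__init__()
        self.headline = ""
        self.paragraphs = []
        self.paper_url = None
        self._current = None  # tag whose text is currently being captured

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in ("h1", "p"):
            self._current = tag
            if tag == "p":
                self.paragraphs.append("")
        elif tag == "a" and attrs.get("class") == "paper-link":
            # Assumed convention: the source paper is linked with this class.
            self.paper_url = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == self._current:
            self._current = None

    def handle_data(self, data):
        if self._current == "h1":
            self.headline += data
        elif self._current == "p":
            self.paragraphs[-1] += data


def extract_record(html):
    """Return one paper/news pairing as a plain dict (hypothetical schema)."""
    parser = NewsPageParser()
    parser.feed(html)
    return {
        "headline": parser.headline.strip(),
        "news_text": " ".join(p.strip() for p in parser.paragraphs),
        "paper_url": parser.paper_url,
    }


record = extract_record(SAMPLE_HTML)
print(record)
```

In a real pipeline, a browser driver such as Selenium would typically fetch the rendered page first, and the HTML string it returns would be passed to a parser in the same way.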

Annotations

The dataset does not include additional annotations because it is a compilation of existing scientific papers and their corresponding news reports. Quality control included both automated and human assessments of the relevance and quality of the news narratives relative to the original scientific papers.
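The specific automated checks are not described in this card. As one illustration of what an automated relevance signal could look like, the sketch below measures how much of a news report's vocabulary also appears in the paired paper. The lexical-overlap heuristic, the function names, and the `threshold` value are assumptions for illustration, not the authors' procedure.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def overlap_ratio(news, paper):
    """Fraction of distinct news tokens that also occur in the paper."""
    news_tokens = tokens(news)
    if not news_tokens:
        return 0.0
    return len(news_tokens & tokens(paper)) / len(news_tokens)


def is_relevant(news, paper, threshold=0.3):
    """Flag a pair as plausibly related; the threshold is a hypothetical value."""
    return overlap_ratio(news, paper) >= threshold


ratio = overlap_ratio(
    "Graphene conducts heat",
    "graphene sheets conduct heat well",
)
print(round(ratio, 3))
```

A check of this kind only catches obviously mismatched pairs. It says nothing about factual accuracy, which is why a human assessment step remains necessary.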

Recommendations

Users of the SciNews dataset should be aware of its limitations and biases, particularly when developing models for scientific news generation. Efforts should be made to address potential biases and ensure that generated narratives accurately and fairly represent the original scientific content.

Citation

BibTeX:

@inproceedings{pu2024scinews,
  title={SciNews: From Scholarly Complexities to Public Narratives -- A Dataset for Scientific News Report Generation},
  author={Pu, Dongqi and Wang, Yifan and Loy, Jia and Demberg, Vera},
  booktitle={Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation},
  year={2024}
}

APA:

Pu, D., Wang, Y., Loy, J., & Demberg, V. (2024). SciNews: From Scholarly Complexities to Public Narratives – A Dataset for Scientific News Report Generation. In Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation.

Dataset Card Authors

This dataset card is based on the paper by Dongqi Pu, Yifan Wang, Jia Loy, and Vera Demberg of Saarland University, Germany.

Dataset Card Contact

For questions about the SciNews dataset, please contact dongqi.me@gmail.com.