United-Syn-Med

Abstract

United-MedSyn Dataset

Description

The United-MedSyn dataset is a specialized medical speech dataset designed to evaluate and improve Automatic Speech Recognition (ASR) systems within the healthcare domain. It comprises English medical speech recordings, with a particular focus on medical terminology and clinical conversations. The dataset is suited to several ASR tasks, including speech recognition, transcription, and classification, and supports the development of models tailored to medical contexts.

This dataset supports a broad range of applications, including medical documentation automation, transcription of doctor-patient conversations, and medical knowledge extraction from audio data.

Key Features

  • Language: English (en)
  • Task Categories:
    • Automatic Speech Recognition (ASR)
    • Audio Transcription
    • Speech Classification
  • Dataset Size: 100K < n < 1M
  • License: Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0)

Tags

  • asr-dataset
  • speech-to-text
  • audio-transcription
  • english
  • speech-recognition

Usage

This dataset can be used for a wide range of speech-related tasks, particularly within the healthcare domain:

  • Training and evaluating ASR models specific to medical terminology.
  • Improving speech-to-text accuracy for medical conversations.
  • Medical speech classification tasks.

Example Use Cases:

  1. ASR Model Training: Use the dataset to train models that convert spoken medical language into accurate text transcriptions.
  2. Medical Documentation: Automate the transcription of clinical conversations or medical dictations.
  3. Speech Classification: Classify various medical terms or conversation types from audio data.
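
When evaluating an ASR model on medical speech, a common metric is word error rate (WER): the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch in plain Python, assuming transcripts are available as plain-text strings. The function name and the example sentences are illustrative and are not taken from the dataset.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up example: one misrecognized drug name out of six reference words.
reference = "patient was prescribed metoprolol for hypertension"
hypothesis = "patient was prescribed metropolol for hypertension"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

A single misrecognized drug name counts the same as any other word error here, so for medical use it may be worth also tracking errors on a list of clinical terms separately.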

License


The dataset is released under the CC BY-SA 4.0 License. You are free to:

  • Share — copy and redistribute the material in any medium or format.
  • Adapt — remix, transform, and build upon the material for any purpose, even commercially.

Under the following terms:

  • Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made.
  • ShareAlike — If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Citation

If you use the United-MedSyn dataset in your research, please cite it as follows:

@dataset{united_medsyn,
  title={United-MedSyn: Medical Speech Dataset for ASR},
  author={United We Care},
  year={2024},
  license={CC BY-SA 4.0}
}

For any questions or issues regarding the dataset, please contact us at ayushi@unitedwecare.com.
