When Audioburst's AI-powered system processes new audio, it collects and analyzes a standard set of metadata for each minute of content. This metadata allows us to surface just the right news story, podcast clip, or playlist to listeners.
Below is a list of the audio features that our system identifies for all radio content it processes.
(Information about podcast content can be found here)
Source - This is the original place where the audio was aired (radio) or released (podcasts). For each source, we collect the following details:
- Location - The physical location of the source
- Type - Whether the source is radio-based or podcast-based
- Show Name (if applicable)
- Main Category - The main subject covered in the source's content
- Discussion topics - Whether the source covers only one topic or many topics
- Explicit - Whether or not the content contains strong language or mature themes
Air Time - A timestamp for when the audio originally aired
Title - Each burst has a descriptive sentence, which we call the 'title.' This is usually the title of an existing web article related to the audio, but it can also be a combination of the burst's core entities or keywords that can be used to surface the burst in our library
Transcript - The full, automatically-generated transcription of the audio
Duration - The length of the audio clip
Per-word Timing - Each word in the transcript is tagged with its start time, its duration, and a confidence score for the system's transcription of that word
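To make the per-word timing concrete, here is a minimal sketch of how such a record might be modeled and queried. The field names (`word`, `start`, `duration`, `confidence`), the use of seconds, the 0.0-1.0 confidence scale, and the helper functions are all hypothetical illustrations, not Audioburst's actual schema or API.

```python
from dataclasses import dataclass


@dataclass
class TimedWord:
    """One transcript word with its timing (hypothetical field names)."""
    word: str
    start: float       # seconds from the start of the clip (assumed unit)
    duration: float    # seconds (assumed unit)
    confidence: float  # assumed 0.0-1.0 transcription confidence

    @property
    def end(self) -> float:
        # End time is derived from start time plus duration.
        return self.start + self.duration


def low_confidence_words(words: list[TimedWord], threshold: float = 0.6) -> list[TimedWord]:
    """Return the words whose transcription confidence falls below `threshold`."""
    return [w for w in words if w.confidence < threshold]


def words_between(words: list[TimedWord], t0: float, t1: float) -> list[TimedWord]:
    """Return the words that start within the half-open window [t0, t1)."""
    return [w for w in words if t0 <= w.start < t1]
```

Storing start and duration (rather than start and end) mirrors the description above; the end time is cheap to derive when needed, for example to cut a clip at a word boundary.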
Audio Cues - We time and extract non-spoken information, such as silence, music, hesitation, applause, laughter, or speaker changes
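As a rough illustration of how a timed cue can be derived, the sketch below approximates silence cues from gaps between consecutive words in the per-word timing. This gap-based heuristic, the `min_gap` threshold of 0.5 seconds, and the cue dictionary layout are assumptions for illustration only; a production system would typically classify the audio signal itself, since a gap between words may contain music or applause rather than silence.

```python
def silence_cues(words, min_gap=0.5):
    """Approximate silence cues as gaps between consecutive spoken words.

    `words` is a list of (start, duration) pairs in seconds, sorted by start.
    Heuristic only: a real audio classifier could label the same gap as
    music, applause, or laughter instead of silence.
    """
    cues = []
    for (start1, dur1), (start2, _) in zip(words, words[1:]):
        gap_start = start1 + dur1       # when the earlier word ends
        gap = start2 - gap_start        # time until the next word begins
        if gap >= min_gap:
            cues.append({"type": "silence", "start": gap_start, "duration": gap})
    return cues
```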
Category - The main topic covered in the audio. Our classifiers assign a value from our predefined list of categories, such as U.S. News, NBA, Entertainment, Music, or Advertisements
Keywords - We extract relevant keywords using Natural Language Processing (NLP). Keywords are short phrases from the audio that are central to the clip's content. For each keyword we collect the word itself, its relevance to the content, and the sentiment or emotion the word conveys
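A keyword record with relevance and sentiment lends itself to simple ranking and aggregation. The sketch below is purely illustrative: the `Keyword` fields, the 0.0-1.0 relevance scale, the -1.0 to 1.0 numeric sentiment scale, and the relevance-weighted average are assumptions, not a description of how Audioburst scores or combines keywords.

```python
from dataclasses import dataclass


@dataclass
class Keyword:
    """Hypothetical keyword record."""
    text: str
    relevance: float  # assumed 0.0 (marginal) to 1.0 (central to the clip)
    sentiment: float  # assumed -1.0 (negative) to 1.0 (positive)


def top_keywords(keywords: list[Keyword], n: int = 3) -> list[Keyword]:
    """Return the n most relevant keywords, highest relevance first."""
    return sorted(keywords, key=lambda k: k.relevance, reverse=True)[:n]


def weighted_sentiment(keywords: list[Keyword]) -> float:
    """Average keyword sentiment, weighting each keyword by its relevance.

    Returns 0.0 (neutral) when there are no keywords or all relevances are 0.
    """
    total = sum(k.relevance for k in keywords)
    if total == 0:
        return 0.0
    return sum(k.relevance * k.sentiment for k in keywords) / total
```

Weighting by relevance keeps a passing mention with strong sentiment from dominating the overall tone of a clip; the same pattern would apply to entities, which carry the same relevance and sentiment attributes.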
Entities - We also extract entities from the audio using NLP. An entity is a word or phrase that represents a person, location, company, job title, or another meaningful attribute in the content. For each entity, we collect the word or phrase itself, its relevance to the content, and the sentiment it conveys to the listener
Location - The relationship between the audio and a specific location. For example, the system notes whether a weather report is for New York, whether a new mayor mentioned in the audio is from Boston, or whether the NBA game being discussed is between San Francisco and Los Angeles
Speaker Information - Data about the speakers in a segment of audio, such as their gender, the number of speakers in the clip, the frequency of speaker changes, and, when possible, speaker identification
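Two of the speaker statistics above, the number of speakers and the frequency of speaker changes, can be computed directly once each segment of a clip has a speaker label. The sketch below assumes such labels already exist (for example from a diarization step) and expresses change frequency as changes per minute; both the input format and that unit are assumptions for illustration.

```python
def speaker_stats(segment_speakers: list[str], clip_duration_seconds: float) -> dict:
    """Summarize speaker activity in a clip.

    `segment_speakers` holds one speaker label per consecutive segment,
    in playback order (hypothetical input format).
    """
    # A change is any point where consecutive segments have different speakers.
    changes = sum(
        1 for a, b in zip(segment_speakers, segment_speakers[1:]) if a != b
    )
    minutes = clip_duration_seconds / 60.0
    return {
        "num_speakers": len(set(segment_speakers)),
        "speaker_changes": changes,
        "changes_per_minute": changes / minutes if minutes > 0 else 0.0,
    }
```

A high change rate suggests a conversational format such as an interview or panel, while a rate near zero suggests a single host or a read news bulletin.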
Is this news? - Our system automatically determines whether or not the audio is "newsy," i.e., whether the audio's subject matter is also being reported on in current news bulletins and stories
Objectivity - We assess bias by determining whether the audio content presents an objective (fact-based) or subjective (opinion-based) viewpoint
Tempo - The pace of the content, based on the number of words spoken per minute
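Words per minute follows directly from the per-word timing: count the words and divide by the speaking time in minutes. In the minimal sketch below, speaking time is measured from the start of the first word to the end of the last word, so leading and trailing silence do not lower the result; dividing by the full clip duration instead would be an equally plausible choice. Which span Audioburst actually uses is not stated, so treat this as an assumption.

```python
def tempo_wpm(words: list[tuple[float, float]]) -> float:
    """Estimate tempo in words per minute.

    `words` is a list of (start, duration) pairs in seconds, sorted by start.
    Speaking time runs from the first word's start to the last word's end.
    """
    if not words:
        return 0.0
    last_start, last_duration = words[-1]
    span_seconds = (last_start + last_duration) - words[0][0]
    if span_seconds <= 0:
        return 0.0
    return len(words) / (span_seconds / 60.0)
```

For example, four words spoken over two seconds give 4 / (2 / 60) = 120 words per minute.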