A hands-on collection of short, composable examples covering dialog generation, orchestration, evaluation, analysis, and interpretability in SDialog. Happy experimenting!
Prerequisites
Install the library:
pip install sdialog
Then in your scripts below make sure to set your default LLM backend, model, and/or parameters, for example:
import sdialog
sdialog.config.llm("openai:gpt-4.1", temperature=0.7)
(You may substitute any supported backend string: huggingface:..., ollama:..., amazon:..., google:....)
Generation
Reproducibility & Seeding
SDialog allows you to control randomness to make generations repeatable.
All generation-based components accept a seed argument, and will be effective as long as the selected underlying LLM backend supports it.
Some examples:
Agent-based dialogue (role-play) reproducibility:
# Same seed -> identical dialogue turns (given fixed model + params)
dialog_a = alice.dialog_with(mentor, max_turns=6, seed=12345)
dialog_b = alice.dialog_with(mentor, max_turns=6, seed=12345)
assert [t.text for t in dialog_a.turns] == [t.text for t in dialog_b.turns]
# Different seed -> likely different variation (if no seed, one is chosen randomly)
dialog_c = alice.dialog_with(mentor, max_turns=6)
DialogGenerator seeding:
from sdialog.generators import DialogGenerator
gen = DialogGenerator("Short friendly greeting between two speakers")
d1 = gen.generate(seed=42)
d2 = gen.generate(seed=42)
# d1.turns == d2.turns (stable under same backend + params)
Persona / Context attribute generation seeding:
from sdialog.personas import Doctor
from sdialog.generators import PersonaGenerator
pg = PersonaGenerator(Doctor(specialty="Cardiology"))
pg.set(years_of_experience="{5-10}")
p1 = pg.generate(seed=99)
p2 = pg.generate(seed=99) # deterministic attribute choices
Seed value, as long as model and parameters are always logged and saved as part of the metadata of the generated objects (e.g., dialog.seed, persona.seed), which is very useful when they are exported to JSON.
Basic Persona-to-Persona Dialogue
Let’s start with something fun and straightforward—creating a simple dialogue between two personas! We’ll use Agent to bring these characters to life and watch them have a conversation.
from sdialog.personas import Persona
from sdialog.agents import Agent
alice = Agent(persona=Persona(name="Alice", role="curious student"), first_utterance="Hi!")
mentor = Agent(persona=Persona(name="Mentor", role="helpful tutor"))
dialog = alice.dialog_with(mentor, max_turns=6)
dialog.print()
Individual agents can be served and exposed as a OpenAI compatible API endpoint with the serve() method (e.g. mentor.serve(port=1333)), see here for more details.
Few-Shot Learning with Example Dialogs
Now let’s explore one of SDialog’s most powerful features! We can guide our dialogues by providing examples that show the system what style, structure, or format we want. This technique, called few-shot learning, works by supplying example_dialogs to generation components. These exemplar dialogs are injected into the system prompt to steer tone, task format, and conversation flow.
Agent Role-Play with Exemplars
Here’s how we can influence the conversation style by providing example dialogues:
from sdialog.personas import Persona
from sdialog.agents import Agent
# The example dialogues, e.g. real dialogues or handcrafted samples
my_example_dialogs = [...]
student = Agent(persona=Persona(name="Learner", role="math student"))
tutor = Agent(persona=Persona(name="Guide", role="math tutor"))
# The exemplar style (concise, explanatory) biases responses
fewshot_dialog = student.dialog_with(tutor, example_dialogs=my_example_dialogs)
fewshot_dialog.print()
DialogGenerator with Exemplars
Let’s see how we can use example dialogues to guide the generation of entirely new conversations:
from sdialog.generators import DialogGenerator
# We can use the from_file() function to load all the dialogues in a folder
my_example_dialogs = Dialog.from_file("path/to/reference_dialogs/")
gen = DialogGenerator(
"Provide a short educational exchange about black holes.",
example_dialogs=my_example_dialogs # alternatively, can be also passed in `generate()`
)
generated = gen.generate()
generated.print()
PersonaDialogGenerator Few-Shot
Now let’s combine personas with few-shot learning to create highly targeted conversations:
from sdialog.personas import Persona
from sdialog.generators import PersonaDialogGenerator
# Alternatively, we can use the type argument to only load specific file types
my_example_dialogs = Dialog.from_file("path/to/reference_dialogs/",
type="csv")
p1 = Persona(name="Coach", role="productivity mentor")
p2 = Persona(name="Client", role="knowledge worker")
pd_gen = PersonaDialogGenerator(p1, p2,
dialogue_details="Exchange exactly three tips about deep work.",
example_dialogs=my_example_dialogs)
fewshot_pd = pd_gen.generate()
fewshot_pd.print()
Multi-Agent Orchestration (Reflex + Length Control)
Ready to add some intelligence to our conversations? Let’s explore how to use sdialog.orchestrators to dynamically steer dialogue turns. Think of orchestrators as conversation directors—they watch what’s happening and can give instructions to guide the flow. The best part? We can combine multiple orchestrators using the pipe operator!
from sdialog.orchestrators import SimpleReflexOrchestrator, LengthOrchestrator
from sdialog.personas import Persona
from sdialog.agents import Agent
# Reflex: trigger extra instruction if keyword appears
reflex = SimpleReflexOrchestrator(
condition=lambda utt: "deadline" in utt.lower(),
instruction="Acknowledge the deadline and ask for specifics.")
# Encourage at least 8, wrap by 12
length_ctrl = LengthOrchestrator(min=8, max=12)
planner = Agent(persona=Persona(name="Planner", role="project manager"))
dev = Agent(persona=Persona(name="Dev", role="engineer"), first_utterance="Any updates?")
planner = planner | reflex | length_ctrl
dialog = dev.dialog_with(planner)
dialog.print(orchestration=True)
Advanced Orchestrator: LLM Judges + Persistent Instructions
Now let’s build something really sophisticated! In this advanced example, we’ll create a persistent orchestrator that demonstrates the full power of SDialog’s orchestration capabilities. Our orchestrator will:
Inspect the entire accumulated dialogue (not just the last utterance)
Use LLM judges to detect different conversation contexts
Emit different persistent instructions for distinct detected conditions
Avoid re-emitting the same instruction thanks to internal state management
We’ll use two LLMJudgeYesNo judges (LLM-based yes/no classifiers) to detect interesting patterns in our ongoing dialogue:
expertise_judge- fires when the other speaker has likely demonstrated professional / domain expertise (mentions of role, advanced methods, implementation specifics, etc.).sensitive_context_judge- fires when recent dialogue turns suggest emotionally sensitive or celebratory life events that warrant tone adaptation.
Both judges are only invoked once (each) thanks to internal flags. After an instruction is emitted it persists automatically (the orchestrator returns None thereafter), this helps control latency / cost.
from sdialog.orchestrators import BasePersistentOrchestrator
from sdialog.evaluation import LLMJudgeYesNo
class ConversationContextOrchestrator(BasePersistentOrchestrator):
"""Advanced persistent orchestrator using LLM judges.
Emits at most two persistent instructions:
1. Domain expertise adaptation (peer-level engagement)
2. Emotional / situational sensitivity adaptation (empathy or celebration)
"""
def __init__(self, model: str | None = None):
super().__init__()
self.context_set = False
self.expertise_activated = False
# Judge templates kept short so they remain inexpensive; they receive the current dialog.
self.expertise_judge = LLMJudgeYesNo(
"Has the speaker demonstrated professional domain expertise "
"(e.g., explicitly stating their job, years of experience, or "
"discussing implementation/technical methodology) in the dialogue so far?",
model=model
)
self.sensitive_context_judge = LLMJudgeYesNo(
"Do the recent turns indicate either "
"(a) a sensitive emotional situation (e.g., loss, illness, personal hardship) or "
"(b) clearly positive celebratory news (promotion, graduation, birth)?",
model=model
)
def instruct(self, dialog, utterance):
# Keep only the other speaker's turns in the dialog
other_speaker_name = [speaker for speaker in dialog.get_speakers()
if speaker != self.agent.name]
dialog = dialog.filter(other_speaker_name)
# If no utterances from the other speaker, nothing to do
if len(dialog) <= 0:
return None
# 1) Domain expertise detection
if not self.expertise_activated:
result = self.expertise_judge.judge(dialog)
if result.positive:
self.expertise_activated = True
return (
"The interlocutor has demonstrated domain expertise. "
"From now on: engage at a peer level, skip basic explanations, "
"focus on advanced concepts, and invite their professional insights."
)
# 2) Emotional / situational sensitivity.
if not self.context_set:
result_ctx = self.sensitive_context_judge.judge(dialog)
if result_ctx.positive:
self.context_set = True
return (
"Recent dialogue content suggests notable emotional "
"context (sensitive or celebratory). Acknowledge it explicitly, "
"mirror appropriate tone (empathy or enthusiasm), and balance emotional "
"support with any informational guidance."
)
return None
def reset(self):
super().reset()
self.context_set = False
self.expertise_activated = False
# Example usage where our orchestrator is instantiated with OpenAI's GPT-4o-mini
ctx_orch = ConversationContextOrchestrator(model="openai:gpt-4o-mini")
agent = agent | ctx_orch
Explanation: The first time a condition is met we return the instruction; SDialog stores it as a persistent system instruction. Subsequent calls return None because once injected the instruction remains in effect automatically.
Attribute Generation (Personas & Contexts)
Let’s dive into creating diverse characters and settings! We’ll see how to use PersonaGenerator and ContextGenerator with a hybrid approach that combines rules with LLM intelligence for flexible content generation.
from sdialog.personas import Teacher, Student
from sdialog.generators import PersonaGenerator, ContextGenerator
from sdialog import Context
# Teacher and Student are real persona classes
teacher_gen = PersonaGenerator(Teacher(subject="mathematics"))
student_gen = PersonaGenerator(Student(interests="algebra"))
# Apply simple attribute rules (random range & list choices)
teacher_gen.set(years_experience="{5-15}", politeness=["polite", "neutral", "strict"])
student_gen.set(age="{15-20}")
teacher = teacher_gen.generate()
student = student_gen.generate()
ctx_base = Context(location="classroom")
ctx_gen = ContextGenerator(ctx_base)
ctx_gen.set(topics=["algebra", "problem solving", "study tips"],
goals="{llm:State one succinct learning goal}")
context = ctx_gen.generate()
teacher.print(); student.print(); context.print()
Paraphrasing an Existing Dialog
Sometimes we have a good dialogue but want to refine or adapt it for different audiences or styles. Let’s see how to apply Paraphraser to rephrase conversations, with the option to target specific speakers only.
from sdialog.generators import Paraphraser
# Assume `dialog` produced earlier
paraphraser = Paraphraser(extra_instructions="Lightly simplify wording", target_speaker="Bob")
dialog_paraphrased = paraphraser(dialog)
dialog_paraphrased.print()
Evaluation and Analysis
Linguistic Feature Metrics
Let’s start measuring the quality of our dialogues! We’ll compute readability and style indicators using LinguisticFeatureScore to understand how our generated conversations compare to natural human speech.
from sdialog.evaluation import LinguisticFeatureScore
feat_all = LinguisticFeatureScore() # all features
hes_rate = LinguisticFeatureScore(feature="hesitation-rate")
print(feat_all(dialog)) # dict of metrics
print(hes_rate(dialog)) # single float
Flow-Based Scores (Perplexity & Likelihood)
Now let’s evaluate how well our dialogues follow natural conversation patterns! We can use existing dialogues as a reference to assess structural fit and flow.
from sdialog.evaluation import DialogFlowPPL, DialogFlowScore
reference_dialogs = [...] # normally a larger corpus
flow_ppl = DialogFlowPPL(reference_dialogs)
flow_score = DialogFlowScore(reference_dialogs)
print("Flow PPL:", flow_ppl(candidate_dialog))
print("Flow Score:", flow_score(candidate_dialog))
Embedding + Centroid Similarity
Let’s explore semantic similarity! We can compare our candidate dialogues to a reference centroid using embeddings to understand how semantically close they are to our target conversations.
from sdialog.evaluation import SentenceTransformerDialogEmbedder, ReferenceCentroidEmbeddingEvaluator
embedder = SentenceTransformerDialogEmbedder(model_name="sentence-transformers/LaBSE")
centroid_eval = ReferenceCentroidEmbeddingEvaluator(embedder, reference_dialogs)
print("Centroid similarity:", centroid_eval([dialog]))
LLM Judges: Realism + Persona Adherence
Now let’s bring in the big guns—LLM judges! These are sophisticated evaluators that can assess dialogue realism and check whether characters stay true to their personas throughout the conversation.
from sdialog.evaluation import LLMJudgeRealDialogLikertScore, LLMJudgePersonaAttributes
from sdialog.personas import Persona
realism_judge = LLMJudgeRealDialogLikertScore(reason=True)
persona_ref = Persona(name="Mentor", role="helpful tutor")
persona_judge = LLMJudgePersonaAttributes(persona=persona_ref, speaker="Mentor", reason=True)
realism_result = realism_judge.judge(dialog)
persona_result = persona_judge.judge(dialog)
print("Realism score:", realism_result.score, realism_result.reason)
print("Persona match:", persona_result.positive, persona_result.reason)
Custom LLM Judges: Document Relevance
Ready to build your own evaluation logic? In this example, we’ll explore how to create custom LLM judges that can evaluate document relevance in the context of a dialogue. This is particularly useful when you want to assess whether retrieved information or generated content actually matches what users are discussing.
Let’s start by setting up our scenario with a sample dialogue and two documents (one relevant, one not):
from sdialog import Dialog
# Example dialogue where the user is looking for a laptop
dialog = Dialog.from_str("""
AI: Hello! How can I assist you today?
Robert: Hi! I'm looking for a new laptop. Can you help me find one?
AI: Of course! What are your main requirements for the laptop?
Robert: I need it for gaming and graphic design, so it should have a powerful GPU and a high-resolution display.
AI: Got it. Do you have a preferred brand or budget in mind?""")
# Document with relevant information matching user needs
good_document = """
The latest gaming laptops come with powerful GPUs and high-resolution displays.
They are designed to handle demanding tasks like gaming and graphic design with ease.
Many models also offer customizable options to fit your specific needs and budget.
"""
# Document with irrelevant information not matching user needs (e.g., about cooking)
bad_document = """
Cooking is an essential skill that everyone should learn.
It allows you to prepare healthy meals at home and can be a fun hobby.
Many people enjoy experimenting with new recipes and ingredients.
"""
Example 1: Yes/No relevance judgment with reasoning
Let’s start with a binary approach! We’ll create a judge using LLMJudgeYesNo that determines whether a document is relevant to what the user was discussing in the dialogue. The cool thing about SDialog is that we can add any placeholder in our judge template and then pass their values when calling judge(). In this example, we’ll use {{ document }} as a placeholder to pass a document to be evaluated (you’re free to add as many placeholders with whatever names as needed).
By setting reason=True, we enable reasoning, which provides detailed explanations along with the binary verdict.
from sdialog.evaluation import LLMJudgeYesNo
# Let's create our custom yes/no judge
doc_rel = LLMJudgeYesNo(
"Does this document relate to what the user was looking for?\n\n"
"Document:\n{{ document }}",
reason=True # Enable explanations
)
# Let's use our judge to evaluate our example documents
good_result = doc_rel.judge(dialog, document=good_document)
bad_result = doc_rel.judge(dialog, document=bad_document)
print("Good document verdict:", good_result.positive)
print("Good document reason:", good_result.reason)
print("---")
print("Bad document verdict:", bad_result.positive)
print("Bad document reason:", bad_result.reason)
Output:
Good document verdict: True
Good document reason: The document directly addresses the user's stated needs - a laptop with a powerful GPU and high-resolution display for gaming and graphic design - as expressed in the dialogue.
---
Bad document verdict: False
Bad document reason: The document discusses cooking, while the user is looking for information about laptops. These topics are unrelated.
Example 2: Likert-style relevance score with reasoning
Now let’s explore a more nuanced approach! Instead of binary yes/no decisions, we can use numerical scores to evaluate relevance on a scale using LLMJudgeScore. In this example, we’ll implement a 1-5 Likert scale judgment that provides a more granular assessment of document relevance.
from sdialog.evaluation import LLMJudgeScore
# Let's create now our custom score judge
doc_rel = LLMJudgeScore(
"From 1 to 5, how closely does the following document match user needs?\n\n"
"Document:\n{{ document }}",
reason=True
)
# Again, let's use our judge to evaluate our example documents
good_result = doc_rel.judge(dialog, document=good_document)
bad_result = doc_rel.judge(dialog, document=bad_document)
print("Good document score:", good_result.score)
print("Good document reason:", good_result.reason)
print("---")
print("Bad document score:", bad_result.score)
print("Bad document reason:", bad_result.reason)
Output:
Good document score: 5
Good document reason: The document directly addresses the user's needs as expressed in the dialogue. Robert specifically states he needs a laptop for gaming and graphic design with a powerful GPU and high-resolution display, and the document highlights these exact features in gaming laptops. It's a perfect match for the user's stated requirements.
---
Bad document score: 1
Bad document reason: The provided document discusses cooking, while the user dialogue is about finding a laptop. There is absolutely no overlap in topic or user need fulfillment; therefore, the match is extremely poor.
Sometimes we only need the numerical score without the detailed explanation. In such cases, we can call the judge object directly to return just the score value:
print("Good document score:", doc_rel(dialog, document=good_document))
print("Bad document score:", doc_rel(dialog, document=bad_document))
Dataset-Level Comparison (Frequency + Mean)
When working with multiple datasets or comparing different models, we need comprehensive evaluation tools! Let’s see how we can aggregate metrics using DatasetComparator to get a bird’s-eye view of our results.
from sdialog.evaluation import FrequencyEvaluator, MeanEvaluator, DatasetComparator, LLMJudgeRealDialog, LinguisticFeatureScore
judge_real = LLMJudgeRealDialog()
flesch = LinguisticFeatureScore(feature="flesch-reading-ease")
comparator = DatasetComparator([
FrequencyEvaluator(judge_real, name="Realistic rate"),
MeanEvaluator(flesch, name="Avg Flesch")
])
results = comparator({"modelA": [dialog], "modelB": [dialog_paraphrased]})
comparator.plot(show=False) # generate plots silently
Distribution Divergence (KDE / Frechet)
Let’s get statistical! To compare score distributions between different dialogue sets, we can use sophisticated statistical evaluators that measure how different our generated content is from reference material.
from sdialog.evaluation import KDEDistanceEvaluator, FrechetDistanceEvaluator
turn_len_score = LinguisticFeatureScore(feature="mean-turn-length")
kde_eval = KDEDistanceEvaluator(dialog_score=turn_len_score, reference_dialogues=[dialog])
frechet_eval = FrechetDistanceEvaluator(dialog_score=turn_len_score, reference_dialogues=[dialog])
print("KDE divergence:", kde_eval([dialog_paraphrased]))
print("Frechet distance:", frechet_eval([dialog_paraphrased]))
Interpretability
Capturing Activations with an Inspector
Ready to peek inside the mind of our language model? Let’s start by exploring how to attach Inspector to an agent to record token-level activations. This gives us a window into what the model is “thinking” as it generates responses!
from sdialog.interpretability import Inspector
from sdialog.agents import Agent
from sdialog.personas import Persona
thinker = Agent(persona=Persona(name="Analyzer", role="critic"))
insp = Inspector(target='model.layers.2.post_attention_layernorm')
thinker = thinker | insp
thinker("Summarize the project goals in one sentence.")
thinker("Refine it further.")
print("Responses captured:", len(insp))
first_token_act = insp[-1][0].act # last response, first token activation
Steering with a Direction Vector
Now for the exciting part—actually influencing the model’s behavior! We can use DirectionSteerer to nudge (add) or ablate (subtract) semantic directions in the model’s internal representations.
import torch
from sdialog.interpretability import DirectionSteerer
# Assume inspector already attached as `insp`
direction = torch.randn(first_token_act.shape[-1]) # dummy direction
steer = DirectionSteerer(direction)
# Push activations along direction
insp = insp + steer
thinker("Provide a concise optimistic remark.")
# Remove (ablate) the same direction
insp = insp - steer
thinker("Provide a concise neutral remark.")
Finding Injected Instructions
Let’s trace what’s happening under the hood! We can inspect dynamic system instructions that were added during a dialogue to understand how orchestrators influenced the conversation.
instruct_events = insp.find_instructs(verbose=False)
for e in instruct_events:
print(e["index"], e["content"])
Custom Orchestrator + Interpretability
Let’s bring it all together! We’ll combine concepts by defining a custom orchestrator to encourage elaboration, while observing its effects with an inspector. This demonstrates how orchestration and interpretability work hand-in-hand.
from sdialog.orchestrators import BaseOrchestrator
class EncourageDetailOrchestrator(BaseOrchestrator):
def instruct(self, dialog, utterance):
if utterance and len(utterance.split()) < 5:
return "Add one more concrete detail in your reply."
return None
detail_orch = EncourageDetailOrchestrator()
explainer_agent = Agent(persona=Persona(name="Explainer", role="assistant"))
inspector_detail = Inspector(target='model.layers.1.post_attention_layernorm')
verbose_agent = explainer_agent | detail_orch | inspector_detail
verbose_dialog = verbose_agent.dialog_with(thinker, max_turns=6)
verbose_dialog.print(orchestration=True)
Multiple Inspectors (Layer Comparison)
For our final trick, let’s attach multiple inspectors to compare what’s happening at different depths in the model! This can reveal how information flows and transforms through the neural network layers.
insp_early = Inspector(target='model.layers.0.post_attention_layernorm')
insp_late = Inspector(target='model.layers.10.post_attention_layernorm')
probe_agent = Agent(persona=Persona(name="Probe", role="analyzer")) | insp_early | insp_late
probe_agent("Explain the purpose of orchestration briefly.")
print(len(insp_early[-1]), len(insp_late[-1])) # token counts captured
Audio Generation
Quick Audio Generation
Let’s start with the simplest way to generate audio from your dialogues! SDialog provides convenient one-function audio generation that handles everything automatically.
from sdialog.audio.pipeline import to_audio
from sdialog import Dialog
# Load an existing dialogue
dialog = Dialog.from_file("path/to/your/dialog.json")
# Generate complete audio in one call
audio_dialog = to_audio(
dialog,
perform_room_acoustics=True,
audio_file_format="mp3" # or "wav", "flac"
)
# Access generated files
print(f"Combined audio: {audio_dialog.audio_step_1_filepath}")
print(f"Timeline audio: {audio_dialog.audio_step_2_filepath}")
print(f"Room acoustics: {audio_dialog.audio_step_3_filepaths}")
Using Dialog’s built-in method:
# Convert dialog directly to audio using the built-in method
audio_dialog = dialog.to_audio(
perform_room_acoustics=True
)
Advanced Audio Pipeline
For more control over the audio generation process, let’s use the full AudioPipeline with custom configurations!
Complete Audio Pipeline with Room Acoustics:
from sdialog.audio import AudioDialog, KokoroTTS, HuggingfaceVoiceDatabase
from sdialog.audio.pipeline import AudioPipeline
from sdialog.audio.room import DirectivityType
from sdialog.audio.utils import SourceVolume, SourceType, Role
from sdialog.audio.jsalt import MedicalRoomGenerator, RoomRole
from sdialog.personas import Persona
from sdialog.agents import Agent
# 1. Create a base text dialogue
doctor = Persona(name="Dr. Smith", role="doctor", age=40, gender="male", language="english")
patient = Persona(name="John", role="patient", age=45, gender="male", language="english")
doctor_agent = Agent(persona=doctor)
patient_agent = Agent(persona=patient, first_utterance="Hello doctor, I have chest pain.")
dialog = patient_agent.dialog_with(doctor_agent, max_turns=6)
# 2. Convert to audio dialogue
audio_dialog = AudioDialog.from_dialog(dialog)
# 3. Configure TTS engine and voice database
tts_engine = KokoroTTS(lang_code="a") # American English
voice_database = HuggingfaceVoiceDatabase("sdialog/voices-kokoro")
# 4. Setup audio pipeline
audio_pipeline = AudioPipeline(
voice_database=voice_database,
tts_engine=tts_engine,
dir_audio="./audio_outputs"
)
# 5. Generate a medical examination room
room = MedicalRoomGenerator().generate(args={"room_type": RoomRole.EXAMINATION})
# 6. Position speakers around furniture in the room
room.place_speaker_around_furniture(
speaker_name=Role.SPEAKER_1,
furniture_name="desk",
max_distance=1.0
)
room.place_speaker_around_furniture(
speaker_name=Role.SPEAKER_2,
furniture_name="desk",
max_distance=1.0
)
# 7. Set microphone directivity
room.set_directivity(direction=DirectivityType.OMNIDIRECTIONAL)
# 8. Run the complete audio pipeline
audio_dialog = audio_pipeline.inference(
audio_dialog,
environment={
"room": room,
"background_effect": "white_noise",
"foreground_effect": "ac_noise_minimal",
"source_volumes": {
SourceType.ROOM: SourceVolume.HIGH,
SourceType.BACKGROUND: SourceVolume.VERY_LOW
},
"kwargs_pyroom": {
"ray_tracing": True,
"air_absorption": True
}
},
perform_room_acoustics=True,
dialog_dir_name="medical_consultation",
room_name="examination_room"
)
# 9. Access the generated audio files
print(f"Combined utterances: {audio_dialog.audio_step_1_filepath}")
print(f"DScaper timeline: {audio_dialog.audio_step_2_filepath}")
print(f"Room acoustics simulation: {audio_dialog.audio_step_3_filepaths}")
Room Generation and Configuration
SDialog provides powerful room generation capabilities for creating realistic acoustic environments. Let’s explore different room types and configurations!
Medical Room Generator - Create specialized medical environments:
from sdialog.audio.jsalt import MedicalRoomGenerator, RoomRole
# Generate different types of medical rooms
generator = MedicalRoomGenerator()
# Various medical room types
consultation_room = generator.generate({"room_type": RoomRole.CONSULTATION})
examination_room = generator.generate({"room_type": RoomRole.EXAMINATION})
# ... other room types available: TREATMENT, PATIENT_ROOM, SURGERY, etc.
# Get room properties
print(f"Room area: {examination_room.get_square_meters():.1f} m²")
print(f"Room volume: {examination_room.get_volume():.1f} m³")
Basic Room Generator - Create simple rectangular rooms:
from sdialog.audio.room_generator import BasicRoomGenerator
# Generate rooms with different sizes
generator = BasicRoomGenerator(seed=123) # For reproducible results
small_room = generator.generate({"room_size": 8}) # 8 m²
large_room = generator.generate({"room_size": 20}) # 20 m²
print(f"Small room: {small_room.get_square_meters():.1f} m²")
print(f"Large room: {large_room.get_square_meters():.1f} m²")
Custom Room Generator - Create your own specialized room types:
from sdialog.audio.room import Room
from sdialog.audio.utils import Furniture, RGBAColor
from sdialog.audio.room_generator import RoomGenerator, Dimensions3D
import random
import time
class WarehouseRoomGenerator(RoomGenerator):
def __init__(self):
super().__init__()
self.ROOM_SIZES = {
"big_warehouse": ([1000, 2500], 0.47, "big_warehouse"),
"small_warehouse": ([100, 200, 300], 0.75, "small_warehouse"),
}
def generate(self, args):
warehouse_type = args["warehouse_type"]
floor_area, reverberation_ratio, name = self.ROOM_SIZES[warehouse_type]
# Calculate dimensions
dims = Dimensions3D(width=20, length=25, height=10)
room = Room(
name=f"Warehouse: {name}",
dimensions=dims,
reverberation_time_ratio=reverberation_ratio,
furnitures={
"door": Furniture(
name="door",
x=0.10, y=0.10,
width=0.70, height=2.10, depth=0.5
)
}
)
return room
# Use custom generator
warehouse_gen = WarehouseRoomGenerator()
warehouse = warehouse_gen.generate({"warehouse_type": "big_warehouse"})
print(f"Warehouse area: {warehouse.get_square_meters():.1f} m²")
print(f"Warehouse volume: {warehouse.get_volume():.1f} m³")
Room Visualization - Visualize room layouts and configurations:
# Generate and visualize a room
room = MedicalRoomGenerator().generate({"room_type": RoomRole.EXAMINATION})
# Create detailed visualization
img = room.to_image(
show_anchors=True,
show_walls=True,
show_furnitures=True,
show_speakers=True,
show_microphones=True
)
# Display or save the image
img.show() # Display in notebook
img.save("room_layout.png") # Save to file
Microphone Positioning - Configure microphone placement:
from sdialog.audio.room import Room, MicrophonePosition, Position3D, Dimensions3D
# Different microphone positions
room = Room(
name="Demo Room",
dimensions=Dimensions3D(width=10, length=10, height=3),
mic_position=MicrophonePosition.CHEST_POCKET_SPEAKER_1
)
# Position microphone on desk
room_with_desk = Room(
name="Office Room",
dimensions=Dimensions3D(width=5, length=4, height=3),
mic_position=MicrophonePosition.DESK_SMARTPHONE,
furnitures={
"desk": Furniture(
name="desk",
x=2.0, y=2.0,
width=1.5, height=0.8, depth=1.0
)
}
)
# Custom 3D position
room_custom = Room(
name="Custom Mic Room",
dimensions=Dimensions3D(width=8, length=6, height=3),
mic_position=MicrophonePosition.CUSTOM,
mic_position_3d=Position3D(x=4.0, y=3.0, z=1.5)
)
Speaker Placement - Position speakers around furniture in a room with multiple furniture:
from sdialog.audio.utils import SpeakerSide, Role
from sdialog.audio.room import Room, Dimensions3D, MicrophonePosition
room = Room(
name="Demo Room with Speakers and Furniture",
dimensions=Dimensions3D(width=10, length=10, height=3),
mic_position=MicrophonePosition.CEILING_CENTERED
)
# Add furniture to room
room.add_furnitures({
"lamp": Furniture(
name="lamp",
x=6.5, y=1.5,
width=0.72, height=1.3, depth=0.72
),
"chair": Furniture(
name="chair",
x=2.5, y=4.5,
width=0.2, height=1.3, depth=0.2
)
})
# Position speakers around furniture
room.place_speaker_around_furniture(
speaker_name=Role.SPEAKER_1,
furniture_name="lamp"
)
room.place_speaker_around_furniture(
speaker_name=Role.SPEAKER_2,
furniture_name="chair",
max_distance=2.0,
side=SpeakerSide.BACK
)
# Calculate distances
distances = room.get_speaker_distances_to_microphone(dimensions=2)
print(f"Speaker 2D distances to the microphone: {distances}")
Voice Database Management
SDialog supports multiple voice database types for flexible voice selection. Let’s explore how to work with different voice sources!
HuggingFace Voice Databases - Use pre-trained voice collections:
from sdialog.audio.voice_database import HuggingfaceVoiceDatabase
# LibriTTS voices
voices_libritts = HuggingfaceVoiceDatabase("sdialog/voices-libritts")
# Kokoro voices
voices_kokoro = HuggingfaceVoiceDatabase("sdialog/voices-kokoro")
# Get voice statistics
print(voices_kokoro.get_statistics(pretty=True))
# Select voices based on characteristics
female_voice = voices_libritts.get_voice(gender="female", age=25, seed=42)
# Prevent voice reuse
male_voice = voices_libritts.get_voice(gender="male", age=30, keep_duplicate=False)
# Reset used voices for reuse
voices_libritts.reset_used_voices()
Local Voice Databases - Use your own voice files:
from sdialog.audio.voice_database import LocalVoiceDatabase
# Create database from local files with CSV metadata
voice_database = LocalVoiceDatabase(
directory_audios="./my_custom_voices/",
metadata_file="./my_custom_voices/metadata.csv"
)
# Add custom voices programmatically
voice_database.add_voice(
gender="female",
age=42,
identifier="french_female_42",
voice="./my_custom_voices/french_female_42.wav",
lang="french",
language_code="f"
)
# Get voice by language and prevent voice reuse
french_voice = voice_database.get_voice(gender="female", age=20, lang="french", keep_duplicate=False)
# Get statistics
print(voice_database.get_statistics(pretty=True))
Quick Voice Database - Create databases from dictionaries:
from sdialog.audio.voice_database import VoiceDatabase
# Create database from predefined voice list
quick_voices = VoiceDatabase(
data=[
{
"voice": "am_echo",
"language": "english",
"language_code": "a",
"identifier": "am_echo",
"gender": "male",
"age": 20
},
{
"voice": "af_heart",
"language": "english",
"language_code": "a",
"identifier": "af_heart",
"gender": "female",
"age": 25
}
]
)
# Use the voices
male_voice = quick_voices.get_voice(gender="male", age=20)
female_voice = quick_voices.get_voice(gender="female", age=25)
# Unavailable voice for this language (an error will be raised)
try:
female_voice_spanish = quick_voices.get_voice(gender="female", age=25, lang="spanish")
except ValueError as e:
print("Expected error:", e)
Microphone Effects
SDialog allows you to simulate different microphone effects by convolving audio with impulse responses from an impulse response database.
Apply Microphone Effect from a Local Impulse Response Database - Apply a microphone effect to an audio file by convolving it with an impulse response from a local database:
from sdialog.audio.processing import AudioProcessor
from sdialog.audio.impulse_response_database import LocalImpulseResponseDatabase, RecordingDevice
import soundfile as sf
import numpy as np
# Create a dummy metadata file and audio file for the example
with open("metadata.csv", "w") as f:
f.write("identifier,file_name\\n")
f.write("my_ir,my_ir.wav\\n")
sf.write("my_ir.wav", np.random.randn(16000), 16000)
# Initialize the database
impulse_response_database = LocalImpulseResponseDatabase(
metadata_file="metadata.csv",
directory="."
)
# Assume input.wav exists
sf.write("input.wav", np.random.randn(16000 * 3), 16000)
AudioProcessor.apply_microphone_effect(
input_audio_path="input.wav",
output_audio_path="output_mic_effect.wav",
device="my_ir", # or RecordingDevice.SHURE_SM57 for built-in devices
impulse_response_database=impulse_response_database
)
Using a HuggingFace Impulse Response Database:
from sdialog.audio.impulse_response_database import HuggingFaceImpulseResponseDatabase
# This requires the 'datasets' library
hf_db = HuggingFaceImpulseResponseDatabase(repo_id="your_username/your_ir_dataset")
ir_path = hf_db.get_ir("some_ir_identifier")
Multilingual Audio Generation
SDialog supports multilingual audio generation. You can use a compatible model from the Hugging Face Hub via HuggingFaceTTS or create your own custom TTS engine for more advanced use cases.
Using ``HuggingFaceTTS`` - Use any compatible model from the Hugging Face Hub:
from sdialog.audio.tts import HuggingFaceTTS
from sdialog.audio.pipeline import AudioPipeline
# Use HuggingFaceTTS for any compatible model from the Hub
# For example, with facebook/mms-tts-fra
hf_tts = HuggingFaceTTS(model_id="facebook/mms-tts-fra")
# Create an audio pipeline with the HuggingFaceTTS engine
# A voice_database is not specified, so a default one will be used.
# Note that the default voice database might not contain voices
# compatible with the selected HuggingFaceTTS model.
audio_pipeline = AudioPipeline(
tts_engine=hf_tts,
dir_audio="./hf_audio_outputs"
)
# Generate audio for the dialogue (assuming 'dialog' is an existing Dialog object)
# For multilingual models, you might need to pass language information.
# This can be done via tts_pipeline_kwargs. For example:
# tts_pipeline_kwargs={"speaker_embeddings": speaker_embedding, "language": "es"}
audio_dialog = audio_pipeline.inference(dialog)
Custom TTS Engine - Create your own TTS implementation for more advanced use cases (e.g. for Spanish):
import torch
import numpy as np
from sdialog.audio.tts import BaseTTS
from sdialog.audio.dialog import AudioDialog
from sdialog.audio.pipeline import AudioPipeline
from sdialog.audio.voice_database import LocalVoiceDatabase
class XTTSEngine(BaseTTS):
def __init__(self, lang_code: str = "en", model="xtts_v2"):
from TTS.api import TTS
self.lang_code = lang_code
self.pipeline = TTS(model).to("cuda" if torch.cuda.is_available() else "cpu")
def generate(self, text: str, speaker_voice: str, tts_pipeline_kwargs: dict = {}) -> tuple[np.ndarray, int]:
wav_data = self.pipeline.tts(
text=text,
speaker_wav=speaker_voice,
language=self.lang_code
)
return (wav_data, 24000)
# Use custom TTS for Spanish
spanish_tts = XTTSEngine(lang_code="es")
# Create spanish voice database
spanish_voices = LocalVoiceDatabase(
directory_audios="./spanish_voices/",
metadata_file="./spanish_voices/metadata.csv"
)
# Generate Spanish audio
audio_pipeline = AudioPipeline(
voice_database=spanish_voices,
tts_engine=spanish_tts,
dir_audio="./spanish_audio_outputs"
)
spanish_dialog = AudioDialog.from_dialog(dialog)
spanish_audio = audio_pipeline.inference(
spanish_dialog,
perform_room_acoustics=True,
dialog_dir_name="spanish_dialogue"
)
Language-specific Voice Assignment:
from sdialog.audio.utils import Role
# Assign specific voices from your voice database for different languages
spanish_voices = {
Role.SPEAKER_1: ("spanish_male_1", "spanish"),
Role.SPEAKER_2: ("spanish_female_1", "spanish")
}
spanish_audio = audio_pipeline.inference(
spanish_dialog,
voices=spanish_voices
)