Serena Labs
A Three-Layer Blueprint for Insurance Enrollment Software

The binary that loses

Most digital insurance enrollment platforms force the customer through one of two end-to-end interface types.

Path A: the form-first platform. The customer lands on a multi-page form. Fields for demographics, dependents, employer, prior coverage. Plan comparison table. Sort and filter controls. Some inline help. Submit, sign, done. Operationally efficient. Conversion-weak on customers who do not already know what they want.

Path B: the chatbot-first platform. The customer lands on a conversational interface. A generative agent asks questions, listens to answers, recommends plans. Modern-feeling. Operationally appealing for low-touch markets. Conversion-weak on customers who do not know the right questions to ask.

Both fail on a substantial fraction of the customer base. Path A fails the customer with low Health Insurance Literacy (HIL), who cannot navigate the comparison unaided. Path B fails the customer facing a high-stakes decision, who needs structured comparison and gets unstructured dialogue. The literature on each failure mode is consistent and was reviewed in our companion pieces on the preference–performance paradox and the HIL crisis.

The reason both fail is structural. Enrollment is not one task. It is three. Forcing a single interface type across the three produces predictable failure modes regardless of which interface you pick.


The three task types

Decompose a typical individual health insurance enrollment journey into its functional steps and a clear taxonomy emerges.

Task Type 1, low complexity, high variability. FAQ, eligibility lookups, coverage verification, life-event triggers, "I just got a new job, what do I need to know?" Decision stakes per interaction are low. The space of acceptable responses is large. Speed and flexibility matter more than completeness.

Task Type 2, high complexity, low variability. Plan selection, contract finalization, beneficiary configuration, suitability assessment under the EU Insurance Distribution Directive (IDD), multi-attribute comparison. Decision stakes are high (long-term financial and health consequences). The space of acceptable responses is constrained (the user must complete a specific structured decision; partial answers are not viable). Completeness, ordering, and choice architecture matter more than naturalness.

Task Type 3, recovery and re-engagement. Abandonment events, post-enrollment confusion, mid-cycle changes, plan-switching at open enrollment. Decision stakes vary. Emotional context matters. Flexibility and empathy in interaction are valuable; deterministic flow control less so.

The published evidence is clear that each task type maps to a different optimal interface type, and that the same interface for all three under-performs.


The three-layer architecture

What the evidence supports is a hybrid architecture with three layers, each optimized for its task type. The diagram below is the structural model we developed in the underlying research and use in production at Serena Labs.


Figure: Hybrid architecture for health insurance enrollment. Layer 1 (Qualification & Triage) uses conversational AI for low-complexity tasks. Layer 2 (Configuration & Selection) is the anchor of the architecture and uses a Structured Interface (classical wizard or Structurally Guided LLM Interface) with four embedded Design Requirements. Layer 3 (Abandonment Recovery) uses asynchronous conversational AI. The re-entry path connects recovery back to Layer 2.

Layer 1: Qualification and Triage

Interface type: Open-ended or semi-structured conversational AI.

Role: First contact with the prospect. Establishes basic eligibility (age, employment, family composition, jurisdiction), captures life context (recent job change, new dependent, qualifying life event), assesses coverage priorities (continuity of care, network requirements, budget constraints), and routes the user to the appropriate wizard configuration.

Why conversational works here: The task is variable (no two prospects have identical situations), low-stakes per interaction (the consequence of a misroute is recoverable), and benefits from naturalness (prospects who are not yet committed engage more readily with conversational interfaces than with forms). Conversational AI's known strengths (naturalness, flexibility, availability) are aligned with the task demands.

Operational considerations: Compute footprint can be moderate; latency tolerance is high (prospects are evaluating, not transacting); deterministic completeness is not required (incomplete qualification data can be filled in Layer 2). LLM-based agents are well-suited; rules-based chatbots can also work.
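A minimal sketch of the hand-off from Layer 1 to Layer 2, assuming triage has filled a simple profile object. `TriageProfile`, `WizardRoute`, `route`, and the configuration names (`individual`, `family`, `special_enrollment`) are hypothetical. What it illustrates is the point above: incomplete qualification data is passed forward for Layer 2 to collect rather than blocking the prospect in Layer 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TriageProfile:
    # Fields captured in Layer 1 dialogue; any of them may still be unknown (None).
    age: Optional[int] = None
    jurisdiction: Optional[str] = None
    dependents: Optional[int] = None
    qualifying_life_event: Optional[str] = None  # e.g. "new_job", "new_dependent"


@dataclass
class WizardRoute:
    configuration: str                                  # hypothetical wizard variant
    prefill: dict = field(default_factory=dict)         # answers carried into Layer 2
    missing: List[str] = field(default_factory=list)    # to be collected in Layer 2


def route(profile: TriageProfile) -> WizardRoute:
    """Pick a wizard configuration from whatever triage captured.

    Unknown fields are not an error: they are passed on as `missing` so the
    Layer 2 wizard asks for them instead of Layer 1 blocking on them."""
    captured = vars(profile)
    prefill = {k: v for k, v in captured.items() if v is not None}
    missing = [k for k, v in captured.items() if v is None]
    if profile.qualifying_life_event is not None:
        configuration = "special_enrollment"
    elif profile.dependents:          # None or 0 both fall through to "individual"
        configuration = "family"
    else:
        configuration = "individual"
    return WizardRoute(configuration, prefill, missing)


# Example: a prospect who mentioned two children but no life event yet.
r = route(TriageProfile(age=34, jurisdiction="CA", dependents=2))
print(r.configuration, r.missing)   # family ['qualifying_life_event']
```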

Layer 2: Configuration and Selection (the anchor)

Interface type: Structured Interface with Choice Architecture. Implemented as either a classical wizard or a Structurally Guided LLM Interface (SGLI). The choice between the two is a secondary engineering decision; the structural scaffolding is non-negotiable.

Role: All high-complexity, high-stakes enrollment decisions. Plan configuration, multi-attribute comparison, beneficiary setup, contract finalization, regulatory disclosure flows (IDD suitability, ANS adequacy, ACA SBC).

Four design requirements (embedded by construction):

DR1, Structural Scaffolding. Sequential steps respecting working-memory limits. Cowan's estimate of approximately four elements of active working memory is more conservative than Miller's classical 7±2 (Miller 1956; Sweller, Ayres, & Kalyuga 2011) and is the appropriate ceiling for high-stakes consumer decisions. Plan-attribute presentation is decomposed into 3–5 sequential steps with summary checkpoints.
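The decomposition in DR1 can be sketched as a function that splits an attribute list into 3–5 sequential steps of at most four items, each closed by a summary checkpoint. The name `decompose`, the even-spread rule, and the checkpoint wording are illustrative assumptions, not a prescribed algorithm; the ceiling of four follows the Cowan estimate cited in DR1.

```python
import math
from typing import List


def decompose(attributes: List[str], cap: int = 4,
              min_steps: int = 3, max_steps: int = 5) -> List[dict]:
    """Split plan attributes into sequential steps of at most `cap` items,
    each followed by a summary checkpoint (hypothetical helper)."""
    n = len(attributes)
    # Enough steps to respect the per-step cap, and at least `min_steps` when possible.
    n_steps = max(math.ceil(n / cap), min(min_steps, n))
    if n_steps > max_steps:
        raise ValueError(f"{n} attributes need {n_steps} steps; group or prune first")
    steps, start = [], 0
    for i in range(n_steps):
        # Spread attributes as evenly as possible; never exceeds `cap`
        # because n_steps >= ceil(n / cap).
        size = n // n_steps + (1 if i < n % n_steps else 0)
        chunk = attributes[start:start + size]
        start += size
        steps.append({"step": i + 1,
                      "attributes": chunk,
                      "checkpoint": "Summary: " + ", ".join(chunk)})
    return steps
```

With eight attributes, for example, the sketch yields three steps of 3, 3, and 2 items; a list that would need more than five steps is rejected so it can be grouped or pruned before it reaches the user.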

DR2, Choice Architecture Embedding. Plan ordering by estimated profile fit, partitioning into salient subgroups, intelligent defaults. Dellaert et al. (2024) in the Journal of Marketing documented that combining ordering and partitioning produces economically significant consumer welfare improvements (N = 3,866 across one field study and three experiments). Johnson et al. (2013) in PLOS ONE documented that choice-architecture interventions reduced consumer error from $533 (control) to $77 (defaults + calculator) per person in an ACA-Marketplace context.
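A sketch of ordering, partitioning, and a default applied together, under stated assumptions: profile fit is approximated by expected annual cost from a simplified deductible, coinsurance, and out-of-pocket-maximum model, and the salient partition is whether a plan keeps the user's preferred provider (a continuity-of-care priority captured in Layer 1). `Plan`, `expected_annual_cost`, and `arrange` are hypothetical names, and this is not the specific intervention tested by Dellaert et al. or Johnson et al.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Plan:
    name: str
    monthly_premium: float
    deductible: float
    coinsurance: float      # member's share after the deductible, 0..1
    oop_max: float          # annual out-of-pocket maximum
    network: frozenset      # provider IDs in network


def expected_annual_cost(plan: Plan, expected_claims: float) -> float:
    # Simplified cost model (an assumption): deductible first, then coinsurance,
    # with out-of-pocket spending capped at the plan's maximum.
    below = min(expected_claims, plan.deductible)
    above = plan.coinsurance * max(0.0, expected_claims - plan.deductible)
    return 12 * plan.monthly_premium + min(below + above, plan.oop_max)


def arrange(plans: List[Plan], expected_claims: float,
            preferred_provider: str) -> Dict[str, object]:
    """Partition by whether the plan keeps the user's provider, order each
    partition by expected cost, and default to the best-fitting plan."""
    def cost(p: Plan) -> float:
        return expected_annual_cost(p, expected_claims)

    keeps = sorted((p for p in plans if preferred_provider in p.network), key=cost)
    others = sorted((p for p in plans if preferred_provider not in p.network), key=cost)
    default = keeps[0] if keeps else (others[0] if others else None)
    return {"keeps_your_doctor": keeps, "other_plans": others, "default": default}
```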

DR3, Literacy-Adaptive Scaffolding. Detect or infer HIL level through interaction signals or upfront screener. Adapt scaffolding intensity in real time. Low-HIL users receive more decomposition, more inline explanation, more guided comparison. High-HIL users receive streamlined flows. The same interface accommodates both populations without imposing a unified-but-poorly-fit experience.
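One way to sketch DR3: seed a literacy estimate from an upfront screener, revise it from interaction signals during the session, and map it to a scaffolding setting. The 0–1 scale, the 0.6 threshold, the step sizes, the signals used (opening help, navigating back), and the `LiteracyAdapter` name are placeholder assumptions, not validated HIL cut-offs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scaffolding:
    max_items_per_step: int
    inline_explanations: bool
    guided_comparison: bool


# Placeholder settings for the two ends of the range; values are assumptions.
LOW_HIL = Scaffolding(max_items_per_step=3, inline_explanations=True, guided_comparison=True)
HIGH_HIL = Scaffolding(max_items_per_step=4, inline_explanations=False, guided_comparison=False)


class LiteracyAdapter:
    """Start from a screener score, then revise it as the session proceeds."""

    def __init__(self, screener_score: float, threshold: float = 0.6):
        self.estimate = screener_score   # 0.0 (low HIL) .. 1.0 (high HIL)
        self.threshold = threshold       # placeholder cut-off, not a validated value

    def observe(self, help_opened: bool, went_back: bool) -> None:
        # A confusion signal nudges the estimate down; a clean step nudges it up.
        if help_opened or went_back:
            self.estimate = max(0.0, self.estimate - 0.1)
        else:
            self.estimate = min(1.0, self.estimate + 0.05)

    def scaffolding(self) -> Scaffolding:
        # Re-evaluated at every step, so intensity can change mid-flow.
        return HIGH_HIL if self.estimate >= self.threshold else LOW_HIL
```

Because the setting is recomputed at every step, a user who starts on the streamlined flow but repeatedly opens help is moved to the more scaffolded one without restarting.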

DR4, Regulatory Compliance by Design. Mandatory disclosure flows are first-class architectural elements, not optional add-ons. IDD suitability assessment, ANS information adequacy norms, ACA Summary of Benefits and Coverage. The interface enforces completion of these flows before allowing contract finalization. This is the requirement that distinguishes regulated-product enrollment from generic consumer onboarding and is not delegated to the LLM.
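DR4 can be sketched as a hard gate in deterministic code: each regime defines required disclosure flows, and finalization raises an error until all of them are complete. The regime keys, flow identifiers, and the `Enrollment` class are hypothetical; actual requirements depend on jurisdiction and product, and this mapping is not a statement of what each regulation requires.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical regime -> required disclosure flows. Identifiers are illustrative only.
REQUIRED_FLOWS: Dict[str, FrozenSet[str]] = {
    "EU_IDD": frozenset({"suitability_assessment"}),
    "BR_ANS": frozenset({"information_adequacy"}),
    "US_ACA": frozenset({"summary_of_benefits_and_coverage"}),
}


class DisclosureIncomplete(Exception):
    """Raised when finalization is attempted before every required flow is done."""


class Enrollment:
    def __init__(self, regime: str):
        self.required = REQUIRED_FLOWS[regime]
        self.completed: Set[str] = set()
        self.finalized = False

    def complete_flow(self, flow: str) -> None:
        if flow not in self.required:
            raise ValueError(f"flow not required in this regime: {flow}")
        self.completed.add(flow)

    def finalize(self) -> None:
        # The gate lives here, in deterministic code; no model output can bypass it.
        missing = self.required - self.completed
        if missing:
            raise DisclosureIncomplete(sorted(missing))
        self.finalized = True
```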

Why structure works here: The task is multi-attribute and high-stakes. The user (typically low-to-moderate HIL) cannot self-scaffold the comparison. Open-ended dialogue exceeds working-memory capacity and does not enforce the regulatory completeness the product requires. Structured interfaces have a measurable advantage on completion (67–86% in adjacent benchmarks per Baymard 2025, against 30–40% for open chatbots), on decision quality (Bhargava et al. 2017; Dellaert et al. 2024), and on regulatory defensibility.

Layer 3: Abandonment Recovery and Support

Interface type: Asynchronous conversational AI (SMS, WhatsApp, email, in-app push).

Role: Re-engagement of users who abandoned Layer 2. Empathic, flexible interaction at evidence-based timing windows. The goal is not to complete the decision in Layer 3; it is to bring the user back to Layer 2 in a state that allows successful completion.

Why conversational works here: Recovery is empathic and emotionally textured. The customer who abandoned needs to be met where they are, not pushed through a structured flow that already failed them once. Chatbot strengths are an asset; structural scaffolding would be counterproductive.

Re-entry pattern: The dashed return arrow from Layer 3 to Layer 2 in the diagram is operationally significant. When the recovery interaction succeeds, the user re-enters Layer 2 with preserved state: the wizard remembers where the user left off, what they already answered, what they were evaluating. Returning to a blank form after a recovery message is a friction step that defeats the recovery's purpose.
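A minimal sketch of state-preserved re-entry, assuming the Layer 2 wizard state can be stored and keyed by an unguessable token that the Layer 3 message carries. `WizardState`, `StateStore`, `recovery_message`, and the example.com URL are hypothetical; `secrets.token_urlsafe` is the Python standard-library call for generating such tokens. A production store would persist to a database and expire tokens.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class WizardState:
    step: int = 1                                          # where the user left off
    answers: Dict[str, object] = field(default_factory=dict)   # what they already answered
    shortlisted_plans: List[str] = field(default_factory=list)  # what they were evaluating


class StateStore:
    """In-memory stand-in for a persistent session store (an assumption)."""

    def __init__(self) -> None:
        self._by_token: Dict[str, WizardState] = {}

    def save(self, state: WizardState) -> str:
        token = secrets.token_urlsafe(16)   # unguessable resume token
        self._by_token[token] = state
        return token

    def resume(self, token: str) -> Optional[WizardState]:
        return self._by_token.get(token)    # None if unknown: fall back to a fresh flow


def recovery_message(token: str,
                     base_url: str = "https://example.com/enroll/resume") -> str:
    # The Layer 3 message carries the token, so re-entry lands on the saved step.
    return f"Pick up where you left off: {base_url}?t={token}"
```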


Why the binary fails

With the three-layer architecture in view, the failure modes of the form-first and chatbot-first platforms become specific.

Form-first fails Layer 1. A prospect who is not yet committed is not going to fill out a 30-field form to find out whether they should consider your product. They will leave. Conversational triage in Layer 1 captures the same information through low-friction dialogue. Form-first platforms either lose these prospects entirely or accept a high cost per acquisition to compensate.

Chatbot-first fails Layer 2. The customer who has decided to enroll and needs to compare three plans across eight attributes is not served by an open-ended dialogue. The chatbot cannot enforce comparison completeness; cannot apply ordering and partitioning consistently; cannot enforce regulatory disclosure; cannot adapt scaffolding to HIL level without a structural decision tree underneath. The chatbot can wrap the wizard in conversational language (this is the SGLI pattern), but it cannot replace the wizard.
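The SGLI distinction can be sketched as a deterministic step sequence with a pluggable phrasing layer: the phrasing function (a stand-in for an LLM call, assumed here rather than implemented) may reword prompts, but step order, allowed answers, and validation stay in code. `Step`, `GuidedFlow`, and `phrase` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Step:
    key: str
    prompt: str
    options: Tuple[str, ...]   # allowed answers are defined by the structure, not the model


class GuidedFlow:
    """Deterministic wizard with a pluggable phrasing layer.

    `phrase` is a placeholder for an LLM call that rewords a prompt. It only
    receives and returns text, so it cannot add, skip, or reorder steps."""

    def __init__(self, steps: List[Step], phrase: Callable[[str], str] = lambda s: s):
        self.steps = steps
        self.phrase = phrase
        self.answers: Dict[str, str] = {}
        self.index = 0

    def done(self) -> bool:
        return self.index == len(self.steps)

    def current_prompt(self) -> Optional[str]:
        if self.done():
            return None
        step = self.steps[self.index]
        return f"{self.phrase(step.prompt)} Options: {', '.join(step.options)}"

    def answer(self, value: str) -> bool:
        if self.done():
            raise RuntimeError("flow already complete")
        step = self.steps[self.index]
        if value not in step.options:   # validation is structural, not generative
            return False                # stay on the same step
        self.answers[step.key] = value
        self.index += 1
        return True
```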

Both fail Layer 3. A platform that excludes asynchronous recovery is treating abandonment as terminal. In a category with 70%+ baseline abandonment rates for non-optimized funnels (Baymard 2025), abandonment recovery is not optional. Synchronous in-session retry is not a substitute for asynchronous channel reach.

The three-layer architecture is not a compromise between form and chatbot. It is a contingency-driven allocation that fits each task with the interface the evidence supports.
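Read as configuration, the allocation amounts to a lookup from task type to layer rather than one interface for every task. The enum, `LayerSpec`, `allocate`, and the field names are an illustrative encoding of the taxonomy and architecture described in this piece, not a published schema.

```python
from dataclasses import dataclass
from enum import Enum


class TaskType(Enum):
    QUALIFICATION = 1    # Task Type 1: low complexity, high variability
    CONFIGURATION = 2    # Task Type 2: high complexity, low variability
    RECOVERY = 3         # Task Type 3: recovery and re-engagement


@dataclass(frozen=True)
class LayerSpec:
    layer: int
    interface: str
    enforces_completeness: bool   # does the layer gate progress on required steps?


# One interface per task type, instead of one interface for all three.
ALLOCATION = {
    TaskType.QUALIFICATION: LayerSpec(1, "conversational AI", False),
    TaskType.CONFIGURATION: LayerSpec(2, "structured interface (wizard or SGLI)", True),
    TaskType.RECOVERY: LayerSpec(3, "asynchronous conversational AI", False),
}


def allocate(task: TaskType) -> LayerSpec:
    return ALLOCATION[task]
```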


How to apply this to your platform

For a product or operations leader evaluating their current architecture, three diagnostic questions.

1. What is the average duration and abandonment rate per layer in your current funnel? If Layer 1 abandonment is the dominant loss, you are likely running a form-first platform that turns off prospects before they convert. If Layer 2 completion quality is the dominant loss (high completion but high subsequent service-desk load, high plan-switching, high dominated-choice rate), you are likely running a chatbot-first platform that produces completion without decision quality. If Layer 3 does not exist as a tracked surface, you are leaving recovery on the table.

2. Are the four design requirements in Layer 2 explicit? Walk through your platform's plan-selection flow. Is the comparison sequence ordered by profile fit (DR2)? Is the working-memory load per step bounded (DR1)? Is the scaffolding intensity adjustable to user signal (DR3)? Are regulatory disclosures structurally enforced (DR4)? Vendors and internal teams who cannot answer all four clearly are running an implicit version of the layer that does not benefit from the design knowledge the literature provides.

3. Is Layer 3 connected to Layer 2 with state preservation? Returning users from an SMS recovery to a fresh-state Layer 2 form is a recovery that does not recover. The state-preservation requirement is mundane engineering and is consistently the difference between "we have recovery" and "we recover."
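For the first diagnostic question, a minimal sketch of a per-layer report, assuming funnel analytics can emit one record per layer visit with its duration and whether the user advanced. The record shape and `layer_report` are hypothetical. A Layer 3 row with zero entries is flagged as untracked; decision-quality signals for Layer 2 (service-desk load, plan switching, dominated choices) need separate data and are not covered here.

```python
from collections import defaultdict
from typing import Dict, Iterable


def layer_report(visits: Iterable[dict]) -> Dict[int, dict]:
    """Per-layer entries, abandonment rate, and mean duration.

    Each visit is a hypothetical record {"layer": 1|2|3, "seconds": float,
    "advanced": bool}, where `advanced` means the user moved on
    (Layer 1 -> Layer 2, Layer 2 -> enrolled, Layer 3 -> back into Layer 2)."""
    entries = defaultdict(int)
    abandoned = defaultdict(int)
    total_seconds = defaultdict(float)
    for v in visits:
        layer = v["layer"]
        entries[layer] += 1
        total_seconds[layer] += v["seconds"]
        if not v["advanced"]:
            abandoned[layer] += 1

    report = {}
    for layer in (1, 2, 3):
        n = entries[layer]
        report[layer] = {
            "entries": n,
            "abandonment_rate": abandoned[layer] / n if n else None,
            "mean_seconds": total_seconds[layer] / n if n else None,
            "tracked": n > 0,   # zero Layer 3 entries: recovery is not a tracked surface
        }
    return report
```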


What Serena Labs does

The three-layer architecture is the production blueprint for Serena Labs' customer engagement platform. We deploy Layer 1 as conversational AI for qualification and triage; Layer 2 as a Structurally Guided LLM Interface with DR1–DR4 embedded by construction; Layer 3 as asynchronous recovery with state-preserved re-entry to Layer 2. We benchmark each layer on layer-specific metrics: Layer 1 on qualification yield and qualification quality; Layer 2 on completion rate, plan–profile adequacy, and HIL-disaggregated outcomes; Layer 3 on recovery rate and post-recovery completion-rate parity.

If you operate a digital enrollment platform and the three-layer decomposition is not yet how you reason about it, book a walkthrough. We can map your current funnel to the three-layer model and identify which layer is bearing the largest unaddressed loss.


Read next

This piece is part of a series on evidence-based healthcare customer engagement. The pillar overview, "Beyond AI Chatbot Hype: An Evidence-Based Framework for Healthcare Customer Engagement", lays out the full contingency framework. See also the companion pieces on the preference–performance paradox, the $36 billion question, the HIL crisis, and structure versus technology.

Key references:

  • Baymard Institute (2025). E-commerce checkout usability: An original research study. baymard.com/research/checkout-usability

  • Bhargava, S., Loewenstein, G., & Sydnor, J. (2017). Choose to lose: Health plan choices from a menu with dominated options. The Quarterly Journal of Economics, 132(3), 1319–1372. doi.org/10.1093/qje/qjx011

  • Dellaert, B. G. C., Johnson, E. J., Duncan, S., & Baker, T. (2024). Choice architecture for healthier insurance decisions: Ordering and partitioning together can improve consumer choice. Journal of Marketing, 88(1), 15–30. doi.org/10.1177/00222429221119086

  • Fras, M., Pauch, D., Walczak, D., & Bera, A. (2024). Determinants of the behaviour of entities on the insurance market in the light of changes introduced by the IDD Directive. Journal of Consumer Policy, 47, 533–566. doi.org/10.1007/s10603-024-09572-z

  • Hanmante, S., Patil, S., & Shahade, A. K. (2025). A multi-module AI system for intelligent health insurance support using retrieval-augmented generation. Scientific Reports. doi.org/10.1038/s41598-025-31038-6

  • Johnson, E. J., Hassin, R., Baker, T., Bajger, A. T., & Treuer, G. (2013). Can consumers make affordable care affordable? The value of choice architecture. PLOS ONE, 8(12), e81521. doi.org/10.1371/journal.pone.0081521

  • Miller, G. A. (1956). The magical number seven, plus or minus two: Some limits on our capacity for processing information. Psychological Review, 63(2), 81–97. doi.org/10.1037/h0043158

  • Sweller, J., Ayres, P., & Kalyuga, S. (2011). Cognitive load theory. Springer. doi.org/10.1007/978-1-4419-8126-4
