
Last month, Youth Impact had the opportunity to present our work at the AI Impact Summit in India—a landmark convening organized by the Government of India and described as a technological turning point for the country. The Summit brought together representatives from over 80 nations, including heads of state, global technology leaders, philanthropies, and social sector organizations—like us—that are exploring how artificial intelligence can accelerate development outcomes.
As part of The Agency Fund’s Demo Day, our India team shared our AI pilot work automating data collection and analysis within Ganitha Ganaka, the evidence-based phone tutoring program implemented in partnership with the Government of Karnataka.

Ganitha Ganaka: The problem we are solving

Across India, as in many countries, school attendance does not automatically translate into learning. Large class sizes and competing demands on teachers mean that children who fall behind can remain invisible and fail to catch back up.
Ganitha Ganaka addresses this through targeted one-on-one tutoring in foundational mathematics for students in Grades 3 to 5, delivered over basic phones that nearly all families already have. The model is built on randomized evaluations conducted across multiple countries, demonstrating learning gains of up to 0.35 standard deviations—meaning that children learn in weeks what might otherwise take months.

In Karnataka, the program has expanded rapidly. Following strong first-year results in select districts, the state government is scaling implementation statewide. In the current academic year, more than 50,000 teachers tutored students over the phone as part of the Ganitha Ganaka initiative. But scaling to this level raises operational challenges.

Each six-week tutoring cycle ends with an assessment call, where teachers evaluate student progress and record learning levels. At scale, this represents hundreds of thousands of assessment calls. While essential for tracking progress, these calls require time that could otherwise be spent teaching.
For an organization committed to supporting teachers to reach more students with quality education, this presented an opportunity to rethink how teacher time is used, and whether AI can help.
Can AI give teachers more time to teach?
Over the past eight months, we have developed and piloted an AI assessment model that automates these endline calls.

The objective is to shift routine assessment tasks to the AI system, freeing teachers to focus more on teaching and supporting student learning. Automating endline assessments can reduce teacher workload per cycle by approximately twenty percent. At statewide scale, this efficiency could enable teachers to reach hundreds of thousands of additional students within the same academic year. An added benefit is a reduction in assessment bias, since every automated call follows the same standardized protocol.
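The scale of the time savings can be sketched with a back-of-envelope calculation. Only the teacher count (more than 50,000) and the roughly twenty percent workload reduction come from the program; the students-per-teacher and minutes-per-call figures below are illustrative assumptions, not program data.

```python
# Back-of-envelope estimate of teacher time freed by automating endline calls.
# TEACHERS reflects the current academic year; the other two constants are
# illustrative assumptions for the sketch, not actual program figures.

TEACHERS = 50_000              # teachers tutoring this academic year (from the program)
STUDENTS_PER_TEACHER = 10      # assumption: students assessed per teacher per cycle
MINUTES_PER_CALL = 10          # assumption: length of one endline assessment call

assessment_calls = TEACHERS * STUDENTS_PER_TEACHER
assessment_minutes = assessment_calls * MINUTES_PER_CALL
hours_freed = assessment_minutes / 60

print(f"Assessment calls per cycle: {assessment_calls:,}")
print(f"Teacher-hours freed per cycle if automated: {hours_freed:,.0f}")
```

Even under these conservative assumptions, automation removes hundreds of thousands of calls per cycle from teachers' workload, which is where the "hundreds of thousands of additional students" figure becomes plausible.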
Technology will never replace teachers—but technology in the hands of teachers can be transformational.
What we are learning
We tested the model with approximately 500 households; engagement was encouraging. A majority of households answered the calls, most caregivers understood the purpose of the assessment, and nearly all expressed openness to future AI-led interactions.

The technical journey has been complex, shaped by a set of constraints that continue to influence how the system evolves.
1. Working in a low-resource language
Kannada remains underrepresented in the AI ecosystem. Limited training data affects how well models handle vocabulary, pronunciation, and conversational flow, especially in real-world use cases.
2. Child voice recognition
Capturing and interpreting children’s voices reliably remains a central challenge. Performance varies by age, with younger students’ responses harder to detect and transcribe consistently. This affects the completeness of assessments, particularly in early grades.
3. Real-world call environments
Assessments take place in home settings where conditions are unpredictable. Background conversations, television noise, shared devices, and intermittent connectivity all influence how responses are captured and processed at scale.
4. Sustained user engagement
Early pilots showed a gap between call pickup and meaningful participation. Caregivers and students needed clearer context on the purpose of the calls and what was expected of them. Small changes in call design and communication significantly influenced completion rates.
Through multiple iterations—refining call scripts, adjusting logic flows, enabling parent-supported responses where needed, and conducting back-check calls—the system has become significantly more stable. These adaptations continue to be tested as the system moves toward larger-scale deployment.
Evaluation framework
To address these challenges systematically, we are evaluating the system at multiple levels as it develops. This work is carried out in partnership with The Agency Fund and the Center for Global Development (CGD), using a structured evaluation approach that guides how the system is tested and improved over time.
At the model level, we assess how the system performs on core tasks, including accuracy, consistency, and its ability to respond appropriately to user inputs in a controlled environment. At the product level, we track how the system is used in practice, including call pickup rates, completion rates, and points of drop-off. We are also beginning to examine user-level indicators, such as whether caregivers understand the purpose of the calls and whether children are able to participate meaningfully. Finally, we are evaluating the model's effectiveness by comparing human and AI assessments of learners' learning levels. This approach allows for continuous iteration, with each round of testing informing the next set of improvements before moving toward larger-scale evaluation.
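Two of the measures described above can be made concrete in a few lines: the product-level call funnel (pickup and completion rates) and the human-versus-AI comparison, for which Cohen's kappa is a standard agreement statistic. All numbers and labels below are made up for illustration; they are not pilot results.

```python
# Sketch of two evaluation metrics: a call funnel and human-vs-AI agreement.
# Every value here is hypothetical, for illustration only.
from collections import Counter

# Hypothetical call outcomes for one pilot batch
calls_placed, calls_picked_up, calls_completed = 500, 410, 340
pickup_rate = calls_picked_up / calls_placed
completion_rate = calls_completed / calls_picked_up  # completion among pickups

def cohens_kappa(a, b):
    """Agreement between two raters beyond chance, for categorical labels."""
    assert len(a) == len(b)
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical learning-level labels assigned to the same six students
human = ["beginner", "addition", "addition", "subtraction", "division", "addition"]
ai    = ["beginner", "addition", "subtraction", "subtraction", "division", "addition"]

print(f"Pickup rate: {pickup_rate:.0%}, completion rate: {completion_rate:.0%}")
print(f"Human-vs-AI agreement (kappa): {cohens_kappa(human, ai):.2f}")
```

Kappa is preferable to raw percent agreement here because learning levels are unevenly distributed, so two raters can agree often by chance alone; kappa discounts that chance agreement.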
Adopting this approach transforms abstract discussions about AI reliability, engagement, and impact into concrete, empirical evaluations that can inform public policy.


Themes from the AI Impact Summit
Several recurring themes emerged across sessions, discussions, and side events. These insights are directly relevant to our work and help inform how we think about using AI within education systems.
1. Infrastructure is no longer the primary constraint
Data shared during the Summit, including findings from the BaSE 2025 survey, suggested high smartphone and internet penetration across Indian households. Many families already use digital tools such as YouTube and WhatsApp to support learning.
This shifts the focus from access to implementation quality. The constraint is less about connectivity and more about designing tools that are evidence-based, contextually appropriate, and integrated into existing systems.
2. Reducing teacher workload is a top priority
At the CSF convening and other education-focused sessions, reducing teacher administrative burden emerged as a central concern. There was strong agreement that foundational literacy and numeracy will not improve unless teachers are given more time to focus on instruction.
This aligns closely with our own exploration of AI-led assessment. The question is not whether AI should be used in education in the abstract, but whether it can meaningfully free up teacher time while preserving instructional quality. Our pilot is one attempt to answer that question.
3. Shifting from pilots to system integration
Across conversations—particularly during The Agency Fund’s LaunchPad sessions—there was a shared recognition that building a promising pilot is only the beginning. The harder work lies in refining the model within government systems, co-designing evaluations, navigating administrative turnover, and eventually transferring ownership.
Teams emphasized the importance of involving government research bodies early, even if that slows initial rollout. Co-design builds institutional ownership and increases the likelihood that evaluation findings are accepted and acted upon. This reinforced a direction we have been pursuing: embedding both implementation and learning within the system rather than operating parallel to it.
Looking ahead
We left with a sense that the next phase of AI in education will depend less on isolated innovation and more on careful integration: co-designing with governments, strengthening evaluation processes, building reliable speech models for local languages, and ensuring that cost structures remain sustainable at scale.
For Youth Impact, our focus is now:
Continue improving the accuracy of our assessment model
Deepen collaboration with partners working on child voice datasets
Strengthen the evidence base as the program scales
Ensure that the technology meaningfully reduces teacher workload rather than adding complexity
Progress has come through repeated testing, small adjustments, and close attention to how the system performs in real contexts. Each iteration adds clarity on what holds under real-world conditions and what requires further refinement. If AI is to contribute to public education systems, it will do so through steady iteration, collaboration across organizations, and alignment with government priorities.
Acknowledgments
Our work in AI has been made possible through the AI4GD accelerator program, which provides advisory support from solution engineers at OpenAI; software engineering, data science, and behavioral science expertise from The Agency Fund and Project Tech4Dev; and policy advisory from the Center for Global Development.
