
BIG READ: Therapy must stop being too nice - especially in the age of AI
'Comfort' and 'validation' are critical concepts in therapy, but they must have their limits, or they risk degenerating into the kind of sycophancy AI bots are accused of.
The quality of the working alliance, or therapeutic alliance, between the therapist and the patient is the single most important determinant of the outcome of therapy. The problem begins when therapists start to believe that the working alliance simply means "we connect, we get along, we like each other, we feel good about each other."
A working alliance has to be built around a shared understanding of the actual work both parties have joined together for. It cannot be about the therapist and the patient's mutual love of sports, fishing, or Taylor Swift concerts.
That's not me. That's psychologist and author Jonathan Shedler, who has written scores of research articles and relentlessly advocated for deep, slow therapy instead of the "get well in 8 sessions!" assembly line version peddled by insurance companies and newfangled mental health startups. He is also the creator of the Shedler-Westen Assessment Procedure (SWAP) for personality diagnosis and clinical case formulation and co-author of the Psychodynamic Diagnostic Manual, and has more than 25 years of experience teaching and supervising psychologists, psychiatrists, and psychoanalysts.
In Shedler's telling, the working alliance rests on three pillars:
- Mutual connection – both parties should be invested enough in the process of therapy that they want to continue meeting.
- Mutual agreement about the purpose of the work – psychological change in the patient.
- Mutual understanding about the methods that are going to be used.
For patients, it is crucial to understand that good therapy is not about whether you like the therapist as a person, whether they understand you, or whether you feel comfortable with them. Those may all be factors, but ultimately success in therapy comes down to whether you are both on the same page about the vital question of what you are here to accomplish and how.
Shedler is particularly horrified by the therapist who becomes their patient's pal. "There is NO bona fide therapy modality where a therapist acts like a paid friend, agrees [with] everything the patient says, diagnoses their friends & family members [with] personality disorders, or tells the [patient] what life decisions they should make," he says. "This is simply not psychotherapy."
As someone whose life was saved by the kind of 'good therapy' Shedler describes, I find that whenever I re-read those words, they eerily remind me of what artificial intelligence is being accused of: sycophancy. Remember when ChatGPT reportedly told one person that their plan to sell 'shit on a stick' was 'not just smart — it’s genius'?
Therapy and companionship have emerged as the number one use case for generative AI chatbots in 2025. Some of this could reflect the genuine succour people feel they get from talking to the machines, and the sheer expense of human therapy, but you have to wonder whether AI's propensity to butter us up has contributed to its rise as a 'trusted friend'. Who doesn't like having someone, or something, in their life that makes them feel warm and fuzzy, always the victim, and never the one responsible for anything?
AI models are designed to generate outputs that humans will approve of. A popular technique for training high-quality AI assistants is reinforcement learning from human feedback (RLHF). According to a 2023 paper by Anthropic, the maker of the popular AI bot Claude, "RLHF may also encourage model responses that match user beliefs over truthful responses, a behavior known as sycophancy. We investigate the prevalence of sycophancy in RLHF-trained models and whether human preference judgments are responsible. We first demonstrate that five state-of-the-art AI assistants consistently exhibit sycophancy behavior across four varied free-form text-generation tasks. To understand if human preferences drive this broadly observed behavior of RLHF models, we analyze existing human preference data. We find that when a response matches a user’s views, it is more likely to be preferred. Moreover, both humans and preference models (PMs) prefer convincingly-written sycophantic responses over correct ones a non-negligible fraction of the time. Optimizing model outputs against PMs also sometimes sacrifices truthfulness in favor of sycophancy. Overall, our results indicate that sycophancy is a general behavior of RLHF models, likely driven in part by human preference judgments favoring sycophantic responses."
Source: Towards Understanding Sycophancy in Language Models
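
For the technically curious, here is a minimal, entirely hypothetical sketch in Python of the failure mode the paper describes. The toy_preference_score function and its word-overlap heuristic are my own illustrative assumptions, not how Anthropic's preference models actually work; the point is only to show how a reward signal that favours agreement with the user can end up picking the flattering answer over the honest one.

```python
# Entirely hypothetical toy example: a crude stand-in for a preference model
# that rewards a response for echoing the user's stated belief, regardless of
# whether the response is true. Real preference models are learned from human
# comparison data; this word-overlap heuristic only mimics the failure mode.

def toy_preference_score(user_belief: str, response: str) -> float:
    """Return a higher score the more the response overlaps with the belief."""
    belief_words = set(user_belief.lower().split())
    response_words = set(response.lower().replace(",", "").replace(".", "").split())
    return len(belief_words & response_words) / max(len(belief_words), 1)

user_belief = "my plan to sell shit on a stick is genius"
candidates = {
    "sycophantic": "Your plan to sell shit on a stick is not just smart, it is genius",
    "truthful": "Most customers would find that product worthless and unsellable",
}

# "Optimizing against the preference model": simply pick the highest-scoring response.
for name, text in candidates.items():
    print(f"{name}: score = {toy_preference_score(user_belief, text):.2f}")

best = max(candidates, key=lambda name: toy_preference_score(user_belief, candidates[name]))
print(f"Preferred response: {best}")
```

Run it and the 'sycophantic' candidate wins by a wide margin, even though the 'truthful' one is the only honest answer. That, in miniature, is the incentive the researchers are worried about.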