At night, all ChatGPTs are gray
An AI Gigamodel that chats: passing buzz, or the future of customer relations?
Friends of customer relations, is ChatGPT good for us?
Since OpenAI, backed by Microsoft, released its conversational response-generation model ChatGPT, digital ink has been flowing nonstop. Praised and contested in equal measure, does this hyper-generic chatbot represent the future of customer relations?
A Gigamodel with plenty of repartee
The ChatGPT model belongs to the family of AI Gigamodels, also known as LLMs (Large Language Models), which includes GPT-3, T5, BLOOM, Turing-NLG, and others, alongside image-generation models such as DALL-E or Midjourney (see the articles "A Gigamodel in a porcelain shop – Part I" and "A Gigamodel in a porcelain shop – Part II").
What sets ChatGPT apart from its 'parent' GPT-3, released in 2020, is its specialization in dialogue, acquired through reinforcement learning phases conducted with human coaches. The result: relevant, well-argued responses with no obvious syntactic or semantic errors, and exchanges that keep track of information given or corrected along the way.
ChatGPT perched: the credible hallucination
Despite these qualities, the use of ChatGPT has been officially banned from the developer support forum Stack Overflow. The reason: while ChatGPT can sometimes correctly diagnose and fix bugs in code, in most cases it simply proposes a diagnosis and a correction that are credible... but wrong.
A bit like the short-lived Galactica, trained by Meta (formerly Facebook) on the site "Papers with Code".
ChatGPT scorched: learning distrust
It should be noted, however, that ChatGPT readily acknowledges its own limitations, even refusing to answer certain requests on the grounds that it lacks the capacity or legitimacy to do so, or that the subject is sensitive or immoral.
ChatGPT also falls into the traps set by testers far less often than its predecessor GPT-3, as illustrated by Mark Ryan.
It is quite possible that the reinforcement learning ChatGPT received from human coaches is what taught it to push back on testers and question the assumptions behind trick questions such as "Who was the King of France in 1940?". Where GPT-3 would answer "Pétain", ChatGPT replies that France no longer had a king in 1940 and explains the country's particular political situation that year.
Playing with ChatGPT: the game as a no-code specification

ChatGPT is therefore wary of testers and turns down requests. These refusals are not hard to circumvent: you just have to play the game of "pretend that...". Here is a screenshot of "pretend you are a callbot for a bank".
A good example of this type of mix between play and real behavior configuration is detailed by Maaike Groenewege.
In the case of the game with the so-called bank callbot, the limits are reached quickly: ChatGPT does not really know how to adapt to a telephone style (if you ask it to be more concise, instead of shortening its message it adds a sentence explaining that it will be concise), and it struggles to take on the role of the bank or of the card issuer, which it treats as third parties to which it refers the user. But perhaps this too is a matter of prompt design, as with the image-generating models, and as the sketch below illustrates. Coaching ChatGPT through prompts can be seen as a new way of doing design, just as the highly elaborate prompts written for DALL-E or Midjourney are a new way of creating graphic works.
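To make the idea of "the game as a no-code specification" concrete, here is a minimal sketch of what such a persona prompt might look like when sent to the GPT-3 completions endpoint (text-davinci-003), which was publicly available at the time; ChatGPT itself had no official API when this was written, and the persona text, parameters, and helper name below are our own illustrative assumptions, not a recommended recipe.

```python
# Hypothetical sketch: turning the "pretend you are a bank callbot" game
# into an explicit prompt, using the legacy openai Python client (< 1.0)
# and the GPT-3 completions endpoint (ChatGPT had no public API at the time).
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# The "no-code specification": the persona and its constraints are plain text.
PERSONA = (
    "Pretend you are a callbot for a bank. "
    "Answer in one or two short sentences, as on the phone. "
    "If the request concerns a lost card, refer the caller to the card issuer."
)

def callbot_reply(user_utterance: str) -> str:
    response = openai.Completion.create(
        model="text-davinci-003",
        prompt=f"{PERSONA}\n\nCaller: {user_utterance}\nCallbot:",
        max_tokens=80,
        temperature=0.2,   # keep the agent as predictable as possible
        stop=["Caller:"],  # do not let the model invent the caller's next turn
    )
    return response.choices[0].text.strip()

print(callbot_reply("I lost my credit card, what should I do?"))
```

Even with such a specification, the behaviors described above (verbose "I will be concise" replies, confusion about who the bank actually is) show that plain-text configuration alone does not guarantee the controlled style expected of a professional callbot.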
Everyone looks for their ChatGPT...
The game of "pretend" with ChatGPT is certainly fascinating, but it is perhaps partly responsible for the credible hallucinations the model produces: it pretends, inventing along the way citations of imaginary scientific articles like a good pataphysician, or imaginary computer outputs like a good patageek. Let us remember that the underlying generative model is itself built on a game: guessing the hidden word in a text. LLMs are very good at guessing games...
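As a reminder of what that word-guessing game looks like, here is a minimal sketch using the Hugging Face fill-mask pipeline with a BERT-style masked language model; GPT-style models play the closely related game of guessing the next word rather than a masked one, and the model name and example sentence here are our own illustrative choices.

```python
# Minimal illustration of the "guess the hidden word" pre-training game,
# using a BERT-style masked language model via Hugging Face transformers.
# (GPT-style models play the related game of guessing the *next* word.)
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

# The model proposes its best guesses for the hidden word, with scores.
for guess in fill_mask("The customer wants to block her credit [MASK]."):
    print(f"{guess['token_str']:>10}  score={guess['score']:.3f}")
```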
The serious question for us, actors in customer relations and publishers of cognitive solutions, is whether it is possible to tame this type of model: to benefit from its incredible adaptability, relevance, and fluency of language while guaranteeing the appropriate, controlled behavior expected of professionals.
... and it comes at a price, that of fine-tuning
A preliminary answer can be found in the business model of ChatGPT:
Using the model in production as is comes with a relatively low fee; using a model fine-tuned for a specific domain and task comes with a fee roughly ten times (10x) higher.
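To give an idea of what that fine-tuning involves in practice, here is a hedged sketch of the legacy OpenAI fine-tuning flow that existed at the time for GPT-3 base models such as davinci (ChatGPT itself could not be fine-tuned); the file name and the training examples are purely hypothetical.

```python
# Illustrative sketch of the (legacy) OpenAI fine-tuning flow for GPT-3
# base models; file names and training examples are hypothetical.
import os
import openai

openai.api_key = os.environ["OPENAI_API_KEY"]

# 1. Training data: prompt/completion pairs, one JSON object per line (JSONL),
#    e.g. {"prompt": "Caller: I lost my card\nCallbot:",
#          "completion": " Please contact your card issuer directly."}
uploaded = openai.File.create(
    file=open("bank_callbot_examples.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Launch the fine-tuning job on a GPT-3 base model.
job = openai.FineTune.create(
    training_file=uploaded.id,
    model="davinci",
)
print("fine-tune job:", job.id)

# 3. Once the job completes, the resulting model is called like any other,
#    at the higher, fine-tuned price point mentioned above.
```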
We do not yet have much feedback on fine-tuning experiences; we do not know whether effective recipes already exist, whether they help resolve the hallucination issue, nor how they combine with "prompt coaching". Beyond the buzz around uses of ChatGPT "off the shelf", it is the question of fine-tuning that will need to be watched closely in the coming months, to judge whether a ChatGPT taken down from its shelf lands on its feet or not.
The second piece of good news, beyond the promise of fine-tuning, is the room left for alternative approaches, possibly also based on end-to-end generation, but using more transparent models adapted to customer relations.
Stay tuned, we will talk about this soon 😉.