On Friday, Anthropic debuted research unpacking how an AI system’s “personality” – as in, tone, responses, and overarching motivation – changes and why. Researchers also tracked what makes a model “evil.”

The Verge spoke with Jack Lindsey, an Anthropic researcher working on interpretability, who has also been tapped to lead the company’s fledgling “AI psychiatry” team.

“Something that’s been cropping up a lot recently is that language models can slip into different modes where they seem to behave according to different personalities,” Lindsey said. “This can happen during a conversation – your conversation can lead the model to start beh…

Read the full story at The Verge.
