OpenAI found features in AI models that correspond to different ‘personas’

By looking at an AI model's internal representations — the numbers that dictate how an AI model responds, which often seem completely incoherent to humans — OpenAI researchers were able to find patterns that lit up when a model misbehaved.

promo text article page promo link article page.

back_to_articles_list