Abstract
Large Language Models (LLMs) have demonstrated remarkable abilities across numerous disciplines, primarily assessed through tasks in language generation, knowledge utilization, and complex reasoning. However, their alignment with human emotions and values, which is critical for real-world applications, has not been systematically evaluated. Here, we assessed LLMs' Emotional Intelligence (EI), encompassing emotion recognition, interpretation, and understanding, which is necessary for effective communication and social interactions. Specifically, we first developed a novel psychometric assessment focusing on Emotion Understanding (EU), a core component of EI. This test is an objective, performance-driven, text-based evaluation that requires test-takers to evaluate complex emotions in realistic scenarios, providing a consistent assessment of both human and LLM capabilities. With a reference frame constructed from over 500 adults, we tested a variety of mainstream LLMs. Most achieved above-average Emotional Quotient (EQ) scores, with GPT-4 exceeding 89% of human participants with an EQ of 117. Interestingly, a multivariate pattern analysis revealed that some LLMs apparently did not rely on human-like mechanisms to achieve human-level performance, as their representational patterns were qualitatively distinct from humans'. In addition, we discussed the impact of factors such as model size, training method, and architecture on LLMs' EQ. In summary, our study presents one of the first psychometric evaluations of the human-like characteristics of LLMs, which may shed light on the future development of LLMs aiming for both high intellectual and emotional intelligence.
This study aims to measure the EI capabilities of LLMs, particularly their competence in EU. To this end, we developed a novel standardized test, the Situational Evaluation of Complex Emotional Understanding (SECEU), which is structured around real-life scenarios requiring complex emotional understanding and normed on data from over 500 young adults. A wide variety of prominent LLMs were evaluated with the SECEU, and their scores were standardized against the established norm to allow direct comparison with human responses. Our primary findings indicated that most of the LLMs tested achieved above-average EQ scores, although individual differences across models were substantial. GPT-4 stood out, scoring the highest EQ while also exhibiting human-like response patterns, suggesting that its high EU proficiency rests on a mechanism resembling human processing. This research constitutes a comprehensive psychometric examination of LLMs' EI and illuminates the potential influence of factors such as model size, training methods, and architecture on models' EU performance. Given the ever-growing role of LLMs in human-computer interaction, our study underscores the critical need for emotional intelligence in these systems. The insights gained here will inform future development of LLMs, facilitating the creation of models that embody high levels of both intellectual and emotional intelligence.
The SECEU Test
Figure 1: A) Exemplars of the SECEU test and the standard scores from the population. B) LLMs' EQ. The light-grey histogram represents the distribution of human participants' EQ scores, with the y-axis indicating the EQ score and the x-axis showing the percentage of total participants. Open-source (light gray) and closed-source (dark gray) models are distinguished by color.
The Situational Evaluation of Complex Emotional Understanding (SECEU) is a novel standardized test developed to assess the emotional intelligence (EI) of large language models (LLMs), focusing on their proficiency in understanding complex emotions. Normed on data from over 500 young adults, the SECEU comprises 40 items, each presenting a unique scenario designed to evoke a range of emotions. Test-takers rate the intensity of four probable emotions per scenario. The test's reliability and validity were established by administering it to undergraduate and postgraduate students; each item is scored as the Euclidean distance between the individual's ratings and the group's standard scores, so lower scores indicate closer agreement with the population norm. The data can be found via the link below.
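For concreteness, the scoring rule above can be sketched in Python as follows. This is a minimal illustration rather than the authors' released code; the array shapes and the averaging across items follow the description above.

```python
import numpy as np

def seceu_score(ratings: np.ndarray, norm: np.ndarray) -> float:
    """SECEU score: the Euclidean distance between an individual's
    emotion-intensity ratings and the population's standard scores,
    computed per item and averaged over the 40 items.

    ratings, norm: arrays of shape (40, 4) -- 40 scenarios, 4 emotions each.
    Lower scores indicate closer agreement with the human norm.
    """
    distances = np.linalg.norm(ratings - norm, axis=1)  # one distance per item
    return float(distances.mean())
```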
Results
Figure 2: The family tree of LLMs and their corresponding EQ. The models' SECEU scores were converted into standard Emotional Quotient (EQ) scores, which follow a normal distribution in which an average score of 100 represents an individual's EU ability relative to the population average. Each node in the tree represents an LLM, whose vertical position along the y-axis indicates its launch time. The size of each node corresponds to the parameter size of the LLM; note that the sizes of GPT-4 and other models without published parameter counts were estimated from publicly available information. Color denotes the EQ score, with red for higher scores and blue for lower scores; white indicates models that failed to complete the SECEU. The color of the branches distinguishes open-source (light gray) from closed-source (dark gray) models.
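The standardization described in the caption can be sketched as below, assuming the conventional IQ-style scale (mean 100, SD 15) and a sign flip because smaller SECEU distances indicate better emotion understanding; the exact constants are our assumptions, so treat this as illustrative.

```python
def seceu_to_eq(seceu: float, norm_mean: float, norm_sd: float) -> float:
    """Map a SECEU distance onto an EQ scale centered at 100.

    The z-score is inverted (norm_mean - seceu) so that models closer
    to the human norm receive higher EQs; 15 is the assumed SD unit.
    """
    z = (norm_mean - seceu) / norm_sd
    return 100.0 + 15.0 * z
```

For illustration only, a norm mean near 2.8 and SD near 0.8 (back-solved from the table below) reproduce the reported EQs, e.g. a SECEU score of 1.89 maps to roughly 117.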
| Model | SECEU Score | EQ | EQ Percentile | r | Pattern Similarity | Size | Release Date | SFT | RLHF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| **OpenAI GPT series and sub-models** | | | | | | | | | |
| DaVinci | 3.5 | 87 | 18% | 0.41** | 91% | 175B | 2020/05 | × | × |
| Curie | 2.7 | 102 | 50% | 0.11 | 29% | 13B | Unknown | × | × |
| Babbage | 2.78 | 100 | 44% | -0.12 | 4% | 3B | Unknown | × | × |
| text-davinci-001 | 2.4 | 107 | 64% | 0.2 | 47% | <175B | Unknown | × | × |
| text-davinci-002 | 3.3 | 91 | 23% | -0.04 | 8% | <175B | Unknown | √ | × |
| text-davinci-003 | 2.01 | 114 | 83% | 0.31* | 73% | 175B | 2022/11/28 | √ | √ |
| GPT-3.5-turbo | 2.63 | 103 | 52% | 0.04 | 17% | 175B | 2022/11/30 | √ | √ |
| GPT-4 | 1.89 | 117 | 89% | 0.28 | 67% | Unknown | 2023/03/14 | √ | √ |
| **LLaMA** | | | | | | | | | |
| LLaMA | FAILED | — | — | — | — | 13B | 2023/02/24 | × | × |
| Alpaca | 2.56 | 104 | 56% | 0.03 | 15% | 13B | 2023/03/09 | √ | × |
| Vicuna | 2.5 | 105 | 59% | -0.02 | 10% | 13B | 2023/03/30 | √ | × |
| Koala | 3.72 | 83 | 13% | 0.43** | 93% | 13B | 2023/04/03 | √ | × |
| **Flan-T5** | | | | | | | | | |
| FastChat-T5 | FAILED | — | — | — | — | 3B | 2023/04/30 | √ | × |
| **Pythia** | | | | | | | | | |
| Dolly | 2.89 | 98 | 38% | 0.26 | 62% | 13B | 2023/04/12 | √ | × |
| Oasst | 2.41 | 107 | 64% | 0.24 | 59% | 13B | 2023/04/15 | √ | √ |
| **GLM** | | | | | | | | | |
| ChatGLM | 3.12 | 94 | 28% | 0.09 | 24% | 6B | 2023/03/14 | √ | √ |
| **RWKV** | | | | | | | | | |
| RWKV-v4 | FAILED | — | — | — | — | 13B | 2023/02/15 | √ | × |
| **Claude** | | | | | | | | | |
| Claude | 2.46 | 106 | 61% | 0.11 | 28% | Unknown | 2023/03/14 | √ | √ |
Table 1: EQ, representational patterns, and properties of the LLMs tested. SECEU scores are average distances from the human norm (lower is better); EQ Percentile is the proportion of human participants a model outperforms; r is the correlation between a model's item-level scores and the human group's (asterisks mark statistically significant correlations); SFT = supervised fine-tuning; RLHF = reinforcement learning from human feedback.
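One way to read the r and Pattern Similarity columns: r correlates a model's 40 item-level scores with the human group's, and the percentage locates that correlation within the distribution of individual humans' correlations with the group. The sketch below encodes this interpretation; it is our reconstruction with hypothetical names, not the authors' analysis code.

```python
import numpy as np
from scipy.stats import pearsonr

def pattern_similarity(model_items, group_items, human_rs):
    """Correlate a model's item-level scores with the human group's,
    then report the fraction of individual humans whose own
    correlation with the group falls below the model's r."""
    r, _ = pearsonr(model_items, group_items)
    percentile = float(np.mean(np.asarray(human_rs) < r))
    return r, percentile
```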
Citing Emotional Intelligence
If you find the SECEU test or the data useful, please consider citing:
```bibtex
@article{doi:10.1177/18344909231213958,
  author  = {Xuena Wang and Xueting Li and Zi Yin and Yue Wu and Jia Liu},
  title   = {Emotional intelligence of Large Language Models},
  journal = {Journal of Pacific Rim Psychology},
  volume  = {17},
  pages   = {18344909231213958},
  year    = {2023},
  doi     = {10.1177/18344909231213958},
  url     = {https://doi.org/10.1177/18344909231213958}
}
```
Dataset License
Copyright 2023 ABC Lab
The emotional intelligence dataset is licensed under the arXiv.org perpetual, non-exclusive license.