Research & Application

Introduction

Music is widely recognised as having a powerful effect on human emotions. Over the course of a classical movement, a composer can raise or lower the emotions of an audience at will, and with the musical expression of an idea, an artist can inspire the will of the masses. As demonstrated by the Baltic Singing Revolution, the power of a song can unify a country under a common goal (Smidchens, 2014), while, as discussed in “The Auditory Culture Reader”, the repetition of a theme can be exploited as a form of torture (Bull and Back, 2020). Whether the source is symbolism or mere exposure, the emotional effect music has on people is observably powerful.

Despite this, it is arguably uncommon for people to deeply consider or investigate their emotional responses to a piece of music. This is reflected in the ways music is visually represented: the focus lies on the qualities of the sound or on the musical notation, rather than on the emotions the music invokes in the listener. The absence of such an illustrative medium leaves a conceptual gap in the way people visualise the subjective qualities of music. The project discussed in this report aims to bridge this gap by creating both an interface and a medium in which emotional responses to music are captured and visually represented, in a similar light to traditional forms such as sheet music, graphical scores, and audio-reactive visualisations.

In order to facilitate a medium in which emotional responses are visualised, an interface that can capture these responses must first be constructed. This is far from trivial; not only can expensive recording equipment be required, but an appropriate method of data classification is also needed (Mauss and Robinson, 2009). However, recent advancements in the commercial availability of consumer electroencephalography (EEG) equipment have made this problem significantly more tractable. This project therefore proposes that, by using advanced machine learning techniques, it is possible to train and implement artificial intelligence (AI) models that can recognise patterns and features in EEG data that describe emotional responses.

Through the use of these emerging creative technologies, the project hopes to contribute to the discussion of AI as a creative platform, while also exploring new forms and methods of music visualisation and interaction. The research generated by this project is built on the question: “How can we model the relationship between music and emotions using electroencephalography and machine learning?”. It is also the hope of the investigator to create a new interface for human-computer interaction, drawing inspiration from the work and discussions of the “Creative AI Lab” (Bunz and Jäger, 2021).

Research

The task of measuring emotional responses is by no means an easy one; there are many intricate elements to consider when conducting such experiments, and any endeavour should be well informed. As described in “Measurement of emotions” (Kaplan, Dalal and Luchman, 2013), there are many techniques by which an investigator may measure identifiers of emotional states, each entailing its own advantages and disadvantages. While methods such as self-reporting and body-language observation may provide seemingly satisfactory results, they are often accompanied by some degree of unintentional bias (Kaplan, Dalal and Luchman, 2013). This is where more transparent, quantitative psychophysiological methods become effective (Kaplan, Dalal and Luchman, 2013).

While emotions are typically described as one or more discrete events (e.g., happy and sad, or just happy), many studies suggest they are in fact a convergence of various central nervous system (CNS) signals (Mauss and Robinson, 2009). It is recommended in “Measures of emotion: A review” that, when approaching emotion measurement, an investigator should base their analysis in the dimensions of valence and activation, rather than attempting to capture emotions as discrete states (Mauss and Robinson, 2009). This concept is further explored in “The circumplex model of affect: An integrative approach…”, where it is outlined that valence and activation arise from two distinct systems of the CNS (Posner, Russell and Peterson, 2005). The article further suggests that the intensities of these two systems can be plotted to create a two-dimensional model of affective experience (Posner, Russell and Peterson, 2005). The existence of this valence-activation relationship enables an investigator to ground their measurements in two scalar values, rather than examining arbitrary components and features of individual emotional states.
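To make this two-dimensional model concrete, the minimal Python sketch below locates a pair of valence and activation values within one of the four quadrants of the circumplex plane. The assumption that both values are normalised to the range [-1, 1], and the example emotion words, are illustrative choices rather than something taken from the cited literature.

```python
# Minimal sketch: locating a reading on the circumplex model of affect.
# Assumes valence and activation have been normalised to the range [-1, 1];
# the example emotion words are illustrative only.

def circumplex_quadrant(valence: float, activation: float) -> str:
    """Return the quadrant of the valence-activation plane a reading falls in."""
    if valence >= 0 and activation >= 0:
        return "high-activation positive (e.g. excited)"
    if valence < 0 and activation >= 0:
        return "high-activation negative (e.g. distressed)"
    if valence < 0:
        return "low-activation negative (e.g. depressed)"
    return "low-activation positive (e.g. calm)"


print(circumplex_quadrant(0.6, -0.3))  # -> low-activation positive (e.g. calm)
```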

The recognition and adoption of the valence-activation model is prevalent throughout recent studies and practical examples, where it has been used to evaluate measurements from a variety of sources. Moreover, the psychophysiological recording method of electroencephalography (EEG) is frequently used and referenced as a viable source of data for the retrieval of valence-activation information. In a project report by eNTERFACE, titled “Emotion Detection in the Loop from Brain Signals and Facial Images”, the investigators used this technology to measure valence-activation values (Savran et al., 2006). While the study provides affirming results for the use of EEG as a valence-activation measurement device, it rightly acknowledges the problems associated with EEG-based measurements, namely electronic interference or “noise” (Savran et al., 2006). Noise in EEG data is relatively commonplace, as the electrodes used are extremely sensitive to the voltage fluctuations caused by muscle movement; actions such as jaw clenching, blinking, and smiling can therefore cause abnormalities and spikes in the data (Savran et al., 2006). Although no attempt is made to filter the recordings in that example, cleaning incoming data is generally necessary to prevent inaccurate or contaminated results (Jiang, Bian and Tian, 2019). In the case of the eNTERFACE project, this was instead addressed in the methodology, where steps were put in place to reduce the impact of such artefacts.
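As an illustration of the kind of clean-up step this implies, the sketch below applies a simple band-pass filter to multi-channel EEG using SciPy. The sample rate, cut-off frequencies, and filter order are assumptions made for the example and are not taken from the eNTERFACE study.

```python
# Illustrative sketch of a basic EEG clean-up step: a 1-45 Hz band-pass filter
# that suppresses slow drift and high-frequency muscle/mains noise.
# The 256 Hz sample rate, cut-offs, and filter order are assumptions.
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass(eeg: np.ndarray, fs: float = 256.0,
             low: float = 1.0, high: float = 45.0, order: int = 4) -> np.ndarray:
    """Zero-phase Butterworth band-pass filter applied along the time axis."""
    b, a = butter(order, [low, high], btype="band", fs=fs)
    return filtfilt(b, a, eeg, axis=-1)

raw = np.random.randn(4, 256 * 10)   # 4 channels, 10 seconds of placeholder data
clean = bandpass(raw)
```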

Arguably, the most significant challenge when attempting to measure emotion-state identifiers is extracting the relevant information from the raw or filtered data. With regard to EEG data, there are many theories and suggestions on how to do this, but the emerging consensus appears to focus on machine learning, specifically neural networks. As explained in “Neural Networks and Deep Learning”, neural networks possess an immense ability to learn the rules underlying a training dataset, especially when the training dataset is given identifying labels (Aggarwal, 2018). This “black-box” learning process makes them ideal for situations where manually implementing the rules of a relationship is not feasible (Aggarwal, 2018), such as stock market prediction and self-driving cars. Furthermore, the node-based structure of neural networks makes them extremely versatile and robust in their trained application (Aggarwal, 2018). These factors make neural networks a strong candidate for a task such as valence-activation retrieval from EEG data.
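As a simple illustration of this labelled, “black-box” learning process, the sketch below trains a small fully-connected network on placeholder feature vectors and labels using Keras. The feature size, four-class label scheme, and layer sizes are arbitrary assumptions for illustration and are not drawn from the cited text.

```python
# Minimal sketch of supervised learning from a labelled dataset with Keras.
# Placeholder features, the four-class label scheme, and the layer sizes are
# arbitrary assumptions for illustration only.
import numpy as np
import tensorflow as tf

X = np.random.randn(1000, 64).astype("float32")   # placeholder feature vectors
y = np.random.randint(0, 4, size=1000)            # placeholder class labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(64,)),
    tf.keras.layers.Dense(4, activation="softmax"),   # one output per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)   # the network learns the label "rules"
```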

While standard feed-forward neural networks are excellent at learning the rules and relationships within individual samples, they lack the ability to analyse temporal and spatial patterns. This is where more complex structures such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks become necessary (Aggarwal, 2018). This is a relevant issue in the retrieval of valence-activation values from EEG data, as the network is required to assess signal patterns over time, rather than just the values reported each time the headset updates. This problem was addressed in “A Study on Mental State Classification using EEG-based Brain-Machine Interface”, in which the investigators applied a short-time windowing technique to the data for time-series statistical feature extraction (Bird et al., 2018). This method gives the network a sense of temporal context and allows it to make more general and reasonable predictions of mental state over a given time.
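A simplified sketch of this kind of short-time windowing is shown below. The window length, hop size, and chosen statistics are illustrative assumptions and do not reproduce Bird et al.’s exact feature set.

```python
# Simplified sketch of short-time windowing for time-series feature extraction,
# in the spirit of Bird et al. (2018). The window length, overlap, and chosen
# statistics are illustrative assumptions.
import numpy as np

def window_features(signal: np.ndarray, fs: float = 256.0,
                    win_s: float = 1.0, hop_s: float = 0.5) -> np.ndarray:
    """Slide a window over a single-channel signal and compute summary statistics."""
    win, hop = int(win_s * fs), int(hop_s * fs)
    feats = []
    for start in range(0, len(signal) - win + 1, hop):
        w = signal[start:start + win]
        feats.append([w.mean(), w.std(), w.min(), w.max(),
                      np.mean(np.abs(np.diff(w)))])   # mean absolute first difference
    return np.asarray(feats)                          # shape: (n_windows, n_features)

features = window_features(np.random.randn(256 * 60))  # one minute of placeholder EEG
```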

Related Work

As stated in the introduction, this project aimed to create a new medium in which emotional responses to music are visualised. In the realisation of this goal, the work of Javier Casadidio has been a particular source of inspiration. Using tools such as TouchDesigner, Casadidio produces generative artworks and posts them online (Casadidio, 2021). A number of these artworks are dynamic posters that appear as still images until interacted with, at which point they play as videos (Casadidio, 2021). This is an interesting way of displaying art, as it allows the artist to produce a still image while also showing the inner workings of the generative process.

As a secondary goal, this project aims to create a new interface for human-computer interaction: one that can capture human emotions, using machine learning, and output them to other creative applications. This goal was developed in response to a panel discussion titled “Aesthetics of New AI Interfaces” (Bunz et al., 2021). In this discussion, the speakers commented on how developers could use AI to create new modes of human-computer interaction and the forms in which such interfaces could manifest (Bunz et al., 2021). It was concluded that while one might assume artists are not interested in breaking open the “black box” of machine learning, they regularly rise to the challenge and bend existing programs to create new forms of art (Bunz et al., 2021). This is an important concept to internalise when designing an artistic interface: the more data made available to the user, the more artistic freedom they have.

In a commissioned work funded by the Saatchi & Saatchi Wellness agency, the studio Random Quark led a project named “The Art of Feeling” (Random Quark, 2017). In this project, the creators used EEG to capture the emotions of participants as they recalled a particularly emotional moment in their lives (Random Quark, 2017). These emotions were painted to a canvas, using a particle system, and displayed in a gallery (Random Quark, 2017). While the project was highly successful, the method used to calculate emotions was arguably somewhat misleading. According to the source code, the data collector calculated a valence value from the lateralisation of alpha-band amplitude between hemispheres, and an activation value from the average amplitude of the alpha and gamma bands (Random Quark, 2017). While these calculations have some basis in scientific reasoning, the implementation is likely too rudimentary and rigid to accurately reflect the emotional state of the participant (Altenmüller, 2002). However, given the artistic aims of the project, this was likely of minor concern, as it did not have a negative effect on the visual or conceptual output.
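For illustration, the sketch below approximates the kind of calculation described above. It is not Random Quark’s actual code; the left/right frontal channel pairing, frequency-band edges, and sample rate are all assumptions made for the example.

```python
# Rough, illustrative approximation of the calculation described above; this
# is NOT Random Quark's actual code. The left/right frontal channel pairing,
# band edges, and sample rate are assumptions made for the example.
import numpy as np
from scipy.signal import welch

def band_power(x, fs, lo, hi):
    """Mean spectral power of a single-channel signal within a frequency band."""
    f, pxx = welch(x, fs=fs, nperseg=int(fs * 2))
    return pxx[(f >= lo) & (f <= hi)].mean()

def valence_activation(left_frontal, right_frontal, fs=256.0):
    alpha_l = band_power(left_frontal, fs, 8, 12)    # alpha power, left frontal
    alpha_r = band_power(right_frontal, fs, 8, 12)   # alpha power, right frontal
    gamma = np.mean([band_power(left_frontal, fs, 30, 45),
                     band_power(right_frontal, fs, 30, 45)])
    valence = (alpha_r - alpha_l) / (alpha_r + alpha_l)   # crude lateralisation index
    activation = (alpha_l + alpha_r) / 2 + gamma          # crude overall band amplitude
    return valence, activation

v, a = valence_activation(np.random.randn(256 * 10), np.random.randn(256 * 10))
```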

Application

In achieving its goal, the project made use of the technologies discussed here, namely electroencephalography, machine learning, and creative coding. Through research, important factors and concepts for EEG emotion-identifier retrieval were learned, such as brain-activation spatiality, digital signal processing, and advanced deep learning techniques. For example, the sliding-window approach suggested by Bird et al. (2018) is present in this project in the form of spectrogram-like images classified with a convolutional neural network; instead of employing a purely data-centric method for time-series feature extraction, the project developed an art-centric one. Moreover, the approach to emotion-state classification was heavily influenced by many of the papers and reports explored in the literature review: the project uses the valence-activation approach, rather than classifying singular discrete emotion states. Lastly, drawing on Bunz and Jäger’s discussions on new AI interfaces, the project was repackaged as a user-friendly application that could serve as an interface for other projects (for example Lundheim, an affective-experience-based video game developed by the investigator for another assessment; McIntosh, 2021). The investigator plans to continue developing this program after the project is complete, to make it ready for public release.
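To indicate the general shape of this approach, the sketch below converts a one-second window of EEG into a spectrogram-like image and defines a small convolutional classifier over valence-activation quadrants. The window length, image dimensions, layer sizes, and label scheme are illustrative assumptions and do not describe the project’s actual architecture.

```python
# Illustrative sketch of the general approach: EEG windows are rendered as
# spectrogram-like images and classified with a small CNN. Window length,
# image size, layer sizes, and the four-quadrant label scheme are assumptions,
# not the project's actual architecture.
import numpy as np
import tensorflow as tf
from scipy.signal import spectrogram

def eeg_to_image(window: np.ndarray, fs: float = 256.0) -> np.ndarray:
    """Convert a one-second EEG window into a fixed-size time-frequency image."""
    _, _, sxx = spectrogram(window, fs=fs, nperseg=64, noverlap=32)
    img = np.log1p(sxx)                        # compress the dynamic range
    return (img / img.max())[..., np.newaxis]  # normalise and add a channel axis

example = eeg_to_image(np.random.randn(256))   # one second of placeholder EEG

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=example.shape),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),   # valence-activation quadrants
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```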

The investigator also met with two academics from the Artificial Intelligence and Music (AIM) Centre for Doctoral Training (CDT) at Queen Mary University of London, to discuss the use of machine learning for EEG-emotion classification in music perception. After explaining the internal tooling of the Neural Scores application, both academics agreed that it was an interesting approach that would be worth investigating at a higher level. The investigator was also invited to apply for the AIM CDT next year, for a related project under their supervision.

The appearance and presentation of the final output of the project were heavily inspired by the related works. For example, Casadidio’s dynamic posters prompted the investigator to explore multimodal presentation techniques such as the virtual gallery and its QR-code-based physical-digital exhibition counterpart. Furthermore, the project’s use of affective experience as an element in visual art and graphic design was influenced by Random Quark’s use of emotion in The Art of Feeling.

The project presented here arguably makes meaningful contributions to the research fields it inhabits. The approach to emotion classification explored by the project is, to the best of the investigator’s knowledge, unique and original in nature. Furthermore, the use of affective experience in the visual representation of music is an area of art that has remained largely unexplored. This project therefore offers a unique insight into the possible applications and forms of this new visual medium.


References

Aggarwal, C., 2018. Neural Networks and Deep Learning. Springer International Publishing.

Altenmüller, E., 2002. Hits to the left, flops to the right: different emotions during listening to music are reflected in cortical lateralisation patterns. Neuropsychologia, 40(13), pp.2242-2256.

Bird, J., Manso, L., Ribeiro, E., Ekart, A. and Faria, D., 2018. A Study on Mental State Classification using EEG-based Brain-Machine Interface. 2018 International Conference on Intelligent Systems (IS), pp.795-800.

Bull, M. and Back, L., 2020. The Auditory Culture Reader. Taylor & Francis.

Bunz, M. and Jäger, E., 2021. Creative AI Lab. [online] creative-ai.org. Available at: [Accessed 28 February 2021].

Bunz, M., Jäger, E., Fiebrink, R., Andersen, C. and Cameron, A., 2021. Aesthetics of New AI Interfaces. Serpentine Gallery. [Panel Discussion]. Available at: [Accessed 12 February 2021].

Casadidio, J., 2021. yop3rro. [Instagram]. Available at: [Accessed 3 March 2021].

Jiang, X., Bian, G. and Tian, Z., 2019. Removal of Artifacts from EEG Signals: A Review. Sensors, 19(5), p.987.

Kaplan, S., Dalal, R. and Luchman, J., 2013. Measurement of emotions. In: L. Tetrick, M. Wang and R. Sinclair, eds., Research Methods in Occupational Health Psychology, 2nd ed. Routledge, pp.61-75.

Mauss, I. and Robinson, M., 2009. Measures of emotion: A review. Cognition and Emotion, 23(2), pp.209-237.

McIntosh, T., 2021. Lundheim. Windows [Game]. London.

Posner, J., Russell, J. and Peterson, B., 2005. The circumplex model of affect: An integrative approach to affective neuroscience, cognitive development, and psychopathology. Development and Psychopathology, 17(3).

Random Quark, 2017. The Art of Feeling. [online] Randomquark.com. Available at: [Accessed 2 March 2021].

Savran, A., Ciftci, K., Chanel, G., Mota, J., Viet, L., Sankur, B., Akarun, L., Caplier, A. and Rombaut, M., 2006. Emotion Detection in the Loop from Brain Signals and Facial Images. Project report, eNTERFACE’06 Workshop, Dubrovnik.

Smidchens, G., 2014. The Power of Song: Nonviolent National Culture in the Baltic Singing Revolution. University of Washington Press.