Development of a Machine Learning Model to Predict Creativity Ratings for the Video Game New Pokémon Snap
Darian Stapleton
Advisor: Thalia R Goldstein, PhD, Department of Psychology
Committee Members: Philseok Lee, Seth Hudson
Online Location, Online
April 11, 2025, 10:00 AM to 12:00 PM
Abstract:
The measurement of creative outputs and processes is critical to understanding how and when creativity occurs. Research on creativity measurement spans a variety of tasks, from basic cognitive exercises to the evaluation of art pieces and performances. Along with this variety, however, comes inconsistency in scoring criteria between studies. Measurement is further complicated when working with child samples, as traditional creativity tasks may not be sufficient to motivate children or evoke their true creative potential. Video games are a creative domain with limited research, but they hold promise as an effective way to evaluate children’s (and adults’) creativity given their highly motivating design, familiarity, and built-in instruction that reduces task variation. In the present study, a video game-based creativity measure using the consensual assessment technique (CAT) was implemented for the game New Pokémon Snap. A total of 1,000 images from the game’s online interface were gathered and rated by experts. Additionally, a subset of the images was rated by children ages 10-13 to determine whether children can reliably rate creativity in a domain (video games) in which they have ample experience.
To further support replication of this video game-based creativity measure, a vision transformer and two convolutional neural network models were built to predict creativity scores from the New Pokémon Snap game images, so that new raters are not needed in future uses of this game for creativity measurement. In addition, a random forest model and a support vector machine model were built to predict creativity ratings from other forms of game data and to evaluate feature importance, indicating which aspects of the video game may be most relevant to player creativity. Both the human ratings and the model-predicted ratings were compared with ChatGPT’s performance when it was prompted to rate the creativity of the images using zero-shot and few-shot prompts. The best-performing model was the vision transformer, which used the images as its only features. Adult experts achieved an acceptable level of reliability using the CAT. Child raters, while still not reliable enough for use in the CAT, were much more reliable than expected and did not differ by gaming experience. Implications of these findings, including the potential for training children’s creativity judgment and the utility of the vision transformer model in future studies, are discussed.
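The abstract reports rater reliability for the CAT without naming the statistic used. A common choice in CAT studies is Cronbach's alpha computed across judges, treating each judge as an "item" and each image as a case. The following is a minimal sketch assuming that statistic; the function name `cronbach_alpha` and the example ratings are hypothetical illustrations, not the study's method or data.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Cronbach's alpha across judges (assumed reliability statistic).

    ratings: list of rows, one row per image, one column per judge.
    Uses sample variances throughout.
    """
    k = len(ratings[0])  # number of judges
    if k < 2:
        raise ValueError("need at least two judges")
    # Variance of each judge's scores across all images
    judge_vars = [variance([row[j] for row in ratings]) for j in range(k)]
    # Variance of the summed score each image received from all judges
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("summed scores have zero variance")
    return (k / (k - 1)) * (1 - sum(judge_vars) / total_var)


# Hypothetical 1-5 ratings: 4 images (rows) x 3 judges (columns)
example = [[1, 2, 1], [3, 3, 4], [4, 5, 4], [2, 2, 3]]
print(round(cronbach_alpha(example), 2))  # → 0.93
```

Values near 1 indicate that judges rank the images consistently; what counts as "acceptable" is a convention of the field (values of about .70 or higher are often cited), not something this sketch decides.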