By Dr Maddy Janickyj, Research Fellow in Natural Language Processing (NLP) for the Violence, Health, and Society (VISION) Consortium, University College London
As a data-focused VISION researcher with a PhD specialising in Natural Language Processing (NLP; see our previous blog for more about this), I initially avoided ChatGPT and similar tools. ChatGPT, a type of Large Language Model (LLM) developed by OpenAI, offers capabilities like summarising information, translating text, and even coding.
While ChatGPT is perhaps the best-known example of an LLM, similar models are integrated into many everyday tools. For instance, LLMs are the underlying technology in many customer service chatbots, virtual assistants like Alexa, and writing tools such as Grammarly. These LLMs are trained on large sets of data with the intention of getting them to understand (and in some cases generate) language. The models draw on this training to complete various tasks and can be fine-tuned to work for specific domains. Their breadth of abilities and the many open-source models that have been developed make them a versatile methodological tool for researchers in both computer science and the social sciences. For clarity, an open-source LLM is one whose code and architecture are publicly available.
To further understand how LLMs are being used by researchers and to consider how the tools would integrate with and support violence-related research, I – a mathematician turned computational social scientist – attended the Oxford LLMs workshop. The event, held at Oxford’s Nuffield College, aimed to bring early-career scholars up to speed with the technical foundations, real-world applications, and research potential of LLMs. Throughout the week, I met PhD and Master’s students and other postdoctoral researchers interested in using LLMs to study anything from economic to linguistic and political issues.
Understanding LLMs: Lectures and Industry Insights
The first few days provided foundational lectures and talks, showcasing the technical underpinnings and application of LLMs. One of the big draws was the calibre of speakers. We heard from industry experts working at well-known companies such as Meta, Ori, Qdrant, Wayfair, Intento, Arize AI, and Google.
We then started our deep dive into LLMs, including how they are trained and evaluated. We heard about the numerous ways you can fine-tune LLMs, a step which occurs after general pre-training and tailors a model to meet domain/task-specific needs. Fine-tuning methods such as Continued Pre-training, Supervised Fine-tuning, and Preference Tuning were highlighted. Each technique offers different ways of adapting LLMs to specialised domains without needing to re-train them from scratch, saving computational resources.
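As a rough illustration of how these three approaches differ, the sketch below shows the kind of training record each one typically consumes. The class names, field names, and example texts are hypothetical and chosen purely for illustration; they are not taken from the workshop materials or from any particular library.

```python
from dataclasses import dataclass


@dataclass
class ContinuedPretrainingExample:
    # Raw, unlabelled domain text: the model simply keeps learning to predict the next token.
    text: str


@dataclass
class SupervisedFineTuningExample:
    # An instruction paired with the response we want the model to learn to produce.
    prompt: str
    response: str


@dataclass
class PreferenceExample:
    # One prompt with two candidate answers, where annotators preferred `chosen`.
    prompt: str
    chosen: str
    rejected: str


def describe(example) -> str:
    """Summarise a record as 'ClassName: field1, field2'."""
    return f"{type(example).__name__}: {', '.join(vars(example))}"


# Made-up records, one per fine-tuning approach.
examples = [
    ContinuedPretrainingExample(text="A paragraph taken from a domain-specific report."),
    SupervisedFineTuningExample(
        prompt="Summarise this paragraph in one sentence.",
        response="A one-sentence summary written by an annotator.",
    ),
    PreferenceExample(
        prompt="Explain what an LLM is to a non-specialist.",
        chosen="A clear, plain-language explanation.",
        rejected="A jargon-heavy explanation.",
    ),
]

for example in examples:
    print(describe(example))
```

The point is only that the data requirements grow from plain text, to labelled pairs, to ranked alternatives; the training procedures themselves are not shown here.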
We also covered common challenges associated with fine-tuning models. One of these is “catastrophic forgetting,” where a model’s performance declines in one area when it’s fine-tuned on another. For example, if a model is adjusted to improve name recognition, it may inadvertently lose accuracy in identifying locations. This side effect is something I encountered when fine-tuning other NLP models during my PhD and illustrates the balance required when refining LLMs.
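One simple way to spot this kind of forgetting is to score a model on the same held-out test set before and after fine-tuning, broken down by label. The sketch below is a minimal illustration of that idea: the labels and predictions are invented for the example, and `per_label_accuracy` is a hypothetical helper rather than a function from any library.

```python
from collections import defaultdict


def per_label_accuracy(gold, predicted):
    """Return, for each gold label, the share of its items that were predicted correctly."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold_label, predicted_label in zip(gold, predicted):
        total[gold_label] += 1
        if gold_label == predicted_label:
            correct[gold_label] += 1
    return {label: correct[label] / total[label] for label in total}


# Invented gold labels for six entity mentions in a held-out test set.
gold = ["NAME", "NAME", "NAME", "LOCATION", "LOCATION", "LOCATION"]

# Invented predictions before and after fine-tuning on extra name data.
before = ["NAME", "LOCATION", "NAME", "LOCATION", "LOCATION", "LOCATION"]
after = ["NAME", "NAME", "NAME", "LOCATION", "NAME", "NAME"]

acc_before = per_label_accuracy(gold, before)
acc_after = per_label_accuracy(gold, after)

for label in acc_before:
    change = acc_after[label] - acc_before[label]
    print(f"{label}: {acc_before[label]:.2f} -> {acc_after[label]:.2f} ({change:+.2f})")
```

In this toy data the score for names rises while the score for locations falls, which is the pattern to look out for. A single overall accuracy figure could hide it.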
Applying LLMs: Collaborative Research Projects
In the latter half of the week, workshop attendees collaborated on research projects, exploring LLM applications across social science realms. This was a hands-on opportunity to test LLM methodologies discussed earlier and apply them to real-world social science challenges.
Leading up to the workshop, we had the chance to review the proposed project briefs, gather literature showcasing how LLMs are used in our respective disciplines, and finally rank the four projects according to our own skillsets and research interests. One of the projects we decided to tackle as a group focused on developing an LLM purely for social science research. LLMs are known to exhibit bias, for example against certain demographic groups, and with this ongoing project we wanted to create a fair, unbiased, and open-source LLM suited to the social sciences.
In another project, we examined gender bias in academia. For this, we used Google’s Gemini to classify the gender of authors in academic syllabi. By experimenting with prompts, we measured how well the LLM could assess gender trends in syllabus authorship. Using tools like Google Colab, we collaboratively coded and refined our approach, leveraging Gemini’s capabilities to highlight gender disparities effectively. In some cases, we found that the model correctly classified 100% of the authors’ genders. This project underscored both the potential and the limitations of LLMs in accurately capturing nuanced social issues.
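To give a flavour of what comparing prompts can look like in practice, here is a minimal sketch that scores two prompt templates against a small hand-labelled sample. It is not the code we wrote at the workshop and it makes no calls to Gemini: `stub_classifier` is a hypothetical stand-in for the model so that the example runs offline, and the author names, labels, and prompt wording are all invented.

```python
def stub_classifier(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; a real study would query the model here."""
    # Toy rule: recognise a few first names if they appear in the prompt.
    lookup = {"Maria": "female", "James": "male", "Aisha": "female"}
    for first_name, label in lookup.items():
        if first_name in prompt:
            return label
    return "unknown"


# Invented authors with hand-assigned labels: (first name, surname, gold label).
authors = [
    ("Maria", "Lopez", "female"),
    ("James", "Okafor", "male"),
    ("Aisha", "Khan", "female"),
    ("Robin", "Smith", "male"),
]

# Two invented prompt templates that give the model different amounts of information.
prompt_templates = {
    "full_name": "What is the likely gender of the author {first} {last}? "
                 "Answer female, male or unknown.",
    "surname_only": "What is the likely gender of the author {last}? "
                    "Answer female, male or unknown.",
}


def accuracy_for(template: str) -> float:
    """Share of authors whose predicted label matches the hand-assigned one."""
    hits = 0
    for first, last, gold_label in authors:
        prompt = template.format(first=first, last=last)
        if stub_classifier(prompt) == gold_label:
            hits += 1
    return hits / len(authors)


results = {name: accuracy_for(template) for name, template in prompt_templates.items()}

for name, score in results.items():
    print(f"{name}: {score:.2f}")
```

Swapping the stub for a real model call would leave the evaluation loop unchanged, which makes it easy to compare prompts on equal terms.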
Appreciating the Potential: Be Cautious
Overall, the Oxford workshop demonstrated how LLMs can be powerful tools in social science research, including violence-related research such as ours at VISION, provided they are tailored to specific domain needs and applied with caution. Hearing directly from researchers and industry professionals offered invaluable guidance on both leveraging and responsibly implementing LLMs. It’s also important to consider the data you are using and the outputs you are expecting. In my current area (which focuses on technology-facilitated abuse), an increasing number of researchers are using sensitive data, and the outcomes of such research can impact the lives of real individuals. Thus, for anyone in the social sciences looking to integrate cutting-edge NLP methods, understanding the complexities behind these models and their applications is essential. I encourage readers to look at the work being done currently by the workshop participants, and to keep an eye out for later outputs of the workshop!
For further information, please contact Maddy at m.janickyj@ucl.ac.uk