Archives

Natural Language Processing: Interrogating free text in mental healthcare records to capture experiences of violence

    Violence can be categorised in a variety of ways, for example as physical, sexual, emotional, or domestic, but all forms cause significant physical and mental morbidity within general populations. Individuals with a severe mental illness have been found to be significantly more likely to experience domestic, physical, and sexual violence than the general population. For these individuals, experiences of violence are important risk factors; however, this information is not routinely collected by mental health services.

    In general, data on all forms of violence have been inadequately available from healthcare records. This is partly due to the lack of routine enquiry by professionals at points of clinical contact, and partly because instances of violence are difficult to identify in healthcare data in the absence of specific coding systems.

    A general challenge in using health records data for research is that the most valuable and granular information is frequently contained in text fields (e.g., routine case notes, clinical correspondence) rather than in pre-structured fields; this includes mentions of violence, whether experienced as a victim or perpetrated. Capturing violence experiences across mental healthcare settings can be challenging because most instances are likely to be recorded as unstructured text data. Natural language processing (NLP) is therefore increasingly used to extract information on clinical entities automatically from unstructured text in electronic health records, particularly in mental healthcare.

    Dr Ava Mason from King’s College London and VISION researchers Professor Robert Stewart, Dr Angus Roberts, Dr Lifang Li, and Dr Vishal Bhavsar worked with colleagues to apply NLP across different clinical samples to investigate mentions of violence. They ascertained recorded violence victimisation from the records of 60,021 patients receiving care from a large south London NHS mental healthcare provider during 2019. Descriptive and regression analyses were conducted to investigate variation by age, sex, ethnic group, and diagnostic category.

    Results showed that patients with a mood disorder, personality disorder, schizophrenia spectrum disorder or PTSD had a significantly increased likelihood of victimisation compared to those with other mental health diagnoses. Additionally, patients from Black and Asian ethnic groups had a significantly higher likelihood of recorded violence victimisation than those from White groups. Males were significantly less likely than females to have recorded violence victimisation.

    The researchers demonstrated the successful deployment of machine-learning-based NLP algorithms to ascertain important entities for outcome prediction in mental healthcare. The observed distributions highlight which sex, ethnicity and diagnostic groups had more records of violence victimisation. Further development of these algorithms could usefully capture broader experiences, for example by differentiating more efficiently between witnessed, perpetrated and experienced violence, and by identifying other forms of violence such as emotional abuse.

    To download the paper: Frontiers | Applying neural network algorithms to ascertain reported experiences of violence in routine mental healthcare records and distributions of reports by diagnosis

    To cite: Mason AJC, Bhavsar V, Botelle R, Chandran D, Li L, Mascio A, Sanyal J, Kadra-Scalzo G, Roberts A, Williams M, Stewart R. Applying neural network algorithms to ascertain reported experiences of violence in routine mental healthcare records and distributions of reports by diagnosis. Frontiers in Psychiatry 2024 Sep 10. doi:10.3389/fpsyt.2024.1181739

    Illustration from Adobe Photo Stock subscription

    Mental health service responses to violence: VISION symposia at the European Psychiatric Association

      An aim of the VISION programme is to examine the nature and extent of contact that people with experience of violence have with various health and justice services.

      Findings on mental health services were presented in a series of symposia at the European Psychiatric Association’s Section on Epidemiology and Social Psychiatry this year.

      The first brought together six studies on experiences of violence and adversity and implications for mental health service use. These included King’s College London’s Anjuli Kaul presenting on Sexual Violence in Mental Health Service Users and Sian Oram on Mental Health Treatment Experiences of Minoritised Sexual Violence Survivors, with further contributions from Emma Soneson (Oxford), Maryam Ghasemi (Auckland), and Ladan Hashemi and Sally McManus (both City St George’s).

      A second symposium highlighted the value of the Adult Psychiatric Morbidity Survey to violence research, with Sally McManus presenting on Threatening or Obscene Messages from a Partner and Mental Health, Self-harm and Suicidality.

      Finally, a third symposium featured VISION researchers Angus Roberts, Rob Stewart and others, and highlighted how natural language processing can be used with information collected in mental health settings. Sharon Sondh (South London and Maudsley NHS Foundation Trust) presented on classifying experiences of violence in mental healthcare records.

      Natural Language Processing: Improving Data Integrity of Police Recorded Crime

        By Darren Cook, Research Fellow in Natural Language Processing at City, University of London

        Did you know that police recorded crime data for England and Wales are not accredited by the UK’s Office for Statistics Regulation (OSR)? This decision, made by the OSR after an audit in 2014, was due to concerns about the reliability of the underlying data.

        Various factors affect the quality of police-recorded data. Differences in IT systems, personnel decision-making, and a lack of knowledge-sharing all contribute to reduced quality and consistency. Poor data integrity leads to a lack of standardisation across police forces and an increase in inaccurate or missing entries. I recently spoke about this issue at the Behavioural and Social Sciences in Security (BASS) conference at the University of St. Andrews, Scotland.

        Correcting missing values is no small feat. In a dataset of 18,000 police recorded domestic violence incidents, we found over 4,500 (25%) missing entries for a single variable. Let’s assume it takes 30 seconds to find the correct value for this variable – that’s roughly 38 hours of effort, almost a full working week. Given that there could be as many as twenty additional variables, it would take over four months to populate all the missing values in our dataset. Expanding such effort across multiple police forces and multiple types of crime highlights the inefficiency of relying on human effort for this task.
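
        The estimate above can be reproduced with a few lines of arithmetic. The sketch below uses the figures from the text (4,500 missing entries, 30 seconds each, twenty additional variables); the 37.5-hour working week and the idea that every variable has the same number of missing entries are assumptions made here for illustration only.

```python
# Back-of-the-envelope estimate of the manual effort described in the text.
SECONDS_PER_LOOKUP = 30      # from the text: time to find one correct value
MISSING_ENTRIES = 4_500      # from the text: missing entries for one variable
TOTAL_INCIDENTS = 18_000     # from the text: size of the dataset
EXTRA_VARIABLES = 20         # from the text: "as many as twenty additional variables"
HOURS_PER_WORK_WEEK = 37.5   # assumption: a typical full-time working week


def hours_for(entries: int, seconds_each: int = SECONDS_PER_LOOKUP) -> float:
    """Hours needed to look up `entries` values by hand."""
    return entries * seconds_each / 3600


missing_share = MISSING_ENTRIES / TOTAL_INCIDENTS
one_variable_hours = hours_for(MISSING_ENTRIES)
# Assumption: each additional variable has as many missing entries as the first.
all_variables_hours = one_variable_hours * (1 + EXTRA_VARIABLES)
all_variables_weeks = all_variables_hours / HOURS_PER_WORK_WEEK

print(f"{missing_share:.0%} of entries missing")
print(f"{one_variable_hours:.1f} hours for one variable")
print(f"{all_variables_weeks:.0f} working weeks for all variables")
```

        Under these assumptions the total comes to 21 working weeks, which is consistent with the "over four months" quoted above.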

        In my talk, I outlined an automated solution to this problem using Natural Language Processing (NLP) and supervised machine learning (ML). NLP describes the processes and techniques used by machines to understand human language, and supervised ML describes how machines learn to predict an outcome based on previously seen examples. In this case, we sought to predict the relationship between the victim and offender – an important piece of demographic information vital to ensuring victim safety.

        The proposed system would use a text-based crime ‘note’ completed by a police officer to classify the victim-offender relationship as either ‘Ex-Partner’, ‘Partner’, or ‘Family’ – in keeping with the distinction made by Women’s Aid. Crime notes are an often-overlooked source of information in police data, yet we found they consistently referenced the victim-offender relationship. The goal of our system, therefore, was to extract the salient information from the free-form crime notes and populate the corresponding missing value in our structured data fields.
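
        To make the idea of supervised classification concrete, here is a minimal sketch of a text classifier that learns from labelled notes and predicts one of the three relationship labels. It is not the system described in this post: a simple bag-of-words naive Bayes model is used as a stand-in, the class name NaiveBayesNoteClassifier is hypothetical, and the training notes are invented toy sentences rather than real police data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the note and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesNoteClassifier:
    """Toy multinomial naive Bayes with add-alpha smoothing (illustrative only)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter()            # label -> number of notes
        self.vocab = set()

    def fit(self, notes, labels):
        for note, label in zip(notes, labels):
            tokens = tokenize(note)
            self.word_counts[label].update(tokens)
            self.label_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def predict(self, note: str) -> str:
        tokens = [t for t in tokenize(note) if t in self.vocab]
        total_notes = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_notes in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n_notes / total_notes)  # log prior
            for tok in tokens:                        # log likelihood
                score += math.log((counts[tok] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy examples; real crime notes are longer and far messier.
TRAIN = [
    ("victim states her ex-boyfriend turned up at the address uninvited", "Ex-Partner"),
    ("caller reports former husband sending threatening messages", "Ex-Partner"),
    ("victim assaulted by her husband during an argument at home", "Partner"),
    ("boyfriend pushed victim, they live together", "Partner"),
    ("victim reports her son damaged property at the family home", "Family"),
    ("argument between brother and sister, mother called police", "Family"),
]

model = NaiveBayesNoteClassifier().fit(
    [note for note, _ in TRAIN], [label for _, label in TRAIN]
)
print(model.predict("ex-boyfriend sent threatening messages"))
print(model.predict("son and mother arguing at family home"))
```

        The predicted label could then be written into the empty structured field, ideally alongside a flag marking the value as machine-generated so that it can be reviewed.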

        Existing solutions based on keywords and syntax parsing are used by multiple UK police forces. While effective, they require manual effort to create, update, and maintain the dictionaries, and they don’t generalise well. Our supervised ML system, however, can be automatically updated and monitored to maintain accuracy.
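
        For comparison, a keyword-based approach can be sketched as a hand-built dictionary plus pattern matching. The dictionary and the function classify_by_keywords below are hypothetical and are not taken from any police force's system; they only illustrate why such dictionaries need constant upkeep.

```python
import re

# Hypothetical hand-maintained dictionary. Order matters: the more specific
# 'Ex-Partner' terms are checked before the general 'Partner' terms.
KEYWORDS = {
    "Ex-Partner": ["ex-partner", "ex-boyfriend", "ex-girlfriend",
                   "ex-husband", "ex-wife", "former partner"],
    "Partner": ["partner", "boyfriend", "girlfriend", "husband", "wife"],
    "Family": ["mother", "father", "son", "daughter", "brother", "sister"],
}


def classify_by_keywords(note: str):
    """Return the first label whose keyword appears as a whole term, else None."""
    text = note.lower()
    for label, terms in KEYWORDS.items():
        for term in terms:
            # Require that the term is not glued to other letters or hyphens,
            # so 'son' does not match inside 'person'.
            pattern = rf"(?<![\w-]){re.escape(term)}(?![\w-])"
            if re.search(pattern, text):
                return label
    return None


print(classify_by_keywords("Victim's ex-partner attended the address"))
print(classify_by_keywords("Victim's ex partner attended the address"))
print(classify_by_keywords("A person shouted at the victim in the street"))
```

        The second call returns 'Partner' because the note writes "ex partner" without a hyphen, a spelling the dictionary does not list. Every such variant has to be added by hand, which is the maintenance burden described above.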

        When tested, our system achieved 80% accuracy, correctly labelling the relationship type in four out of five cases. In comparison, humans performed this task with approximately 82% accuracy – an arguably negligible difference. Moreover, once trained, our system could classify the entire test set (over 1,000 crime notes) in just sixteen seconds.

        However, we noted some limitations, the biggest of which was a high linguistic overlap in crime notes between ‘Ex-Partner’ and ‘Partner’ that caused several misclassifications. We believe more advanced language representations (e.g., word embeddings) will improve discrimination between these relationships.

        We also discovered a potential prediction bias against minorities. Although victim ethnicity wasn’t included in our training setup, we observed reduced accuracy for Black or Asian victims. The source and extent of this bias are subjects of ongoing research.
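
        One simple way to check for this kind of disparity is to compute accuracy separately for each group in a held-out test set. The sketch below is an illustration only: the function accuracy_by_group is hypothetical, the groups are anonymised as "A" and "B", and the labels are invented rather than drawn from our results.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: share of correct predictions} for each group present."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


# Invented toy labels; the group attribute is used only for evaluation and
# is never given to the classifier.
y_true = ["Partner", "Partner", "Ex-Partner", "Family", "Ex-Partner", "Partner"]
y_pred = ["Partner", "Ex-Partner", "Ex-Partner", "Family", "Partner", "Partner"]
groups = ["A", "A", "A", "A", "B", "B"]

per_group = accuracy_by_group(y_true, y_pred, groups)
for group, acc in sorted(per_group.items()):
    print(f"group {group}: accuracy {acc:.2f}")
```

        A gap between groups in such a table does not explain where the bias comes from, but it shows where to look, and small groups need cautious interpretation because a handful of errors can move the figure considerably.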

        Our findings highlight the promise of automated solutions but serve as a cautionary tale against assuming these systems can be applied carte blanche without careful consideration of their limitations. Several outstanding questions remain. Is a system with 80% accuracy good enough? Is it better to leave missing values rather than predict incorrect ones? Incorrectly identifying a perpetrator as a current partner rather than an ex-partner could significantly impact the victim’s safety. Additionally, a model biased against certain ethnicities risks overlooking the specific needs of minority groups.

        The conference sparked lively and engaging conversation about many of these issues, as well as the role that automation can play within the social sciences more broadly. A research article describing these results in full is the focus of ongoing work, and the presentation slides are available below as a download.

        For further information please contact Darren at darren.cook@city.ac.uk or via LinkedIn @darrencook1986

        Dr Darren Cook, An application of Natural Language Processing (NLP) to free-form Police crime notes

        Photo by Markus Spiske on Unsplash

        Calling all crime analysts: Share your experiences of using text data in analysis

          Are you a crime analyst or researcher? If so, VISION would really like to hear about your experiences of using text data in your analysis.

          We developed a short survey that will take approximately 5 minutes to complete. Qualtrics Survey | Crime Analyst Survey

          This survey is designed to explore your experiences working with free-text data. Your feedback will enable us to evaluate the need for software designed to assist analysts working with large amounts of free text data.

          Participation is voluntary and all responses will be anonymous. Information will be confidential and will not be shared with any other parties, and will be deleted once it is no longer needed.

          The deadline to provide feedback using the link above is 30 June 2024.

          Illustration from licensed Adobe Stock library