Synthetic datasets enable linkage and a longitudinal understanding of experiences of violence and health impacts and consequences

    Violence is a complex social problem and a public health issue, with implications for the health and social care systems, police and justice systems, as well as significant productivity losses for those who experience it. Analysing data collected by these systems can aid understanding of the problem of violence and how to respond to it. In social research, analysing administrative records together with survey data has already enabled better measurements of violence and its costs, capturing experiences of both victim-survivors and perpetrators across multiple points in time and social and economic domains.

    Ideally, data from the same individuals would enable linkage and a longitudinal understanding of experiences of violence and their (health) impacts and consequences. However, most studies in violence-related research analyse data in silos due to difficulties in accessing data and concerns for the safety of those exposed. This is particularly the case for data from third sector specialist support services for victims or perpetrators of violence, which has, to VISION’s knowledge, not been linked or combined with other datasets. These services provide person-centred, trauma-informed care, and there is a risk that information on their service users may be used against them in courts or by immigration authorities. Direct data linkage is therefore not possible and alternatives are needed.

    With this research, VISION researchers Dr Estela Capelas Barbosa, Dr Niels Blom, and Dr Annie Bunce provide a proof-of-concept synthetic dataset by combining data from the Crime Survey for England and Wales (CSEW) and administrative data from Rape Crisis England and Wales (RCEW), pertaining to victim-survivors of sexual violence in adulthood. Intuitively, the idea was to impute missing information from one dataset by borrowing the distribution from the other.

    The researchers borrowed information from CSEW to impute missing data in the RCEW administrative dataset, creating a combined synthetic RCEW-CSEW dataset. Using look-alike modelling principles, they provide an innovative and cost-effective approach to exploring patterns and associations in violence-related research in a multi-sectoral setting.

    Methodologically, they approached data integration as a missing data problem to create a synthetic combined dataset. Multiple imputation with chained equations was employed to combine data from the two sources, imputing the information missing from one using the other. To test whether this procedure was effective, they compared regression analyses for the individual and combined synthetic datasets for a variety of variables.
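
    The general idea of treating data integration as a missing data problem can be sketched in a few lines of Python. This is a minimal illustration only, not the authors' implementation: it uses simulated numbers, a single shared covariate (`x`) and a single survey-only variable (`y`), all of which are hypothetical. It also swaps full chained equations for a simpler bootstrap-based stochastic regression imputation, because with only one variable to impute there is nothing to chain. The repeated imputations are pooled with Rubin's rules.

```python
import random
import statistics


def fit_line(x, y):
    """Least-squares fit of y on x; returns intercept, slope, residual SD."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return intercept, slope, (rss / (len(x) - 2)) ** 0.5


def impute_once(donor_x, donor_y, target_x, rng):
    """One stochastic imputation of the missing variable for the target rows."""
    # Bootstrap the donor rows so each imputation reflects model uncertainty.
    rows = [rng.randrange(len(donor_x)) for _ in donor_x]
    a, b, sd = fit_line([donor_x[i] for i in rows], [donor_y[i] for i in rows])
    # Add residual noise so imputed values keep the donor's spread.
    return [a + b * xi + rng.gauss(0.0, sd) for xi in target_x]


def pool_rubin(estimates, variances):
    """Rubin's rules: pooled estimate, within variance, total variance."""
    m = len(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    return statistics.fmean(estimates), within, within + (1 + 1 / m) * between


# Simulated (hypothetical) data: the "survey" observes x and y,
# the "administrative" dataset observes x only.
rng = random.Random(0)
survey_x = [rng.uniform(20, 60) for _ in range(300)]
survey_y = [2.0 + 0.5 * x + rng.gauss(0.0, 1.0) for x in survey_x]
admin_x = [rng.uniform(20, 60) for _ in range(200)]

# Impute y for the administrative rows 20 times and estimate its mean each time.
estimates, variances = [], []
for _ in range(20):
    imputed_y = impute_once(survey_x, survey_y, admin_x, rng)
    estimates.append(statistics.fmean(imputed_y))
    variances.append(statistics.variance(imputed_y) / len(imputed_y))

pooled_estimate, within_variance, total_variance = pool_rubin(estimates, variances)
print(f"pooled mean: {pooled_estimate:.2f}")
print(f"within-imputation variance: {within_variance:.4f}")
print(f"total variance: {total_variance:.4f}")
```

    In this sketch the total variance is always at least as large as the within-imputation variance, because Rubin's rules add the between-imputation spread on top of it. A real application with several partially observed variables would more likely use an established chained-equations implementation rather than hand-written code.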

    Results show that the effect sizes for the combined dataset reflect those from the dataset used for imputation. The variance is higher, however, resulting in fewer statistically significant estimates. VISION’s approach reinforces the possibility of combining administrative with survey datasets using look-alike methods to overcome existing barriers to data linkage.

    Recommendations

    • Imputing missing information from one dataset by borrowing the distribution from the other should be applicable for costing exercises as it permits micro-costing. 
    • Compared to traditional research, VISION’s proposed approach to data integration offers a cost-effective solution to breaking (data-related) silos in research.

    To download the paper: Look-alike modelling in violence-related research: A missing data approach | PLOS One

    To cite: Barbosa EC, Blom N, Bunce A (2025) Look-alike modelling in violence-related research: A missing data approach. PLoS ONE 20(1): e0301155. https://doi.org/10.1371/journal.pone.0301155

    For further information, please contact Estela at e.capelasbarbosa@bristol.ac.uk

    Illustration from Adobe Photo Stock subscription
