Open Research Online Exploring the core ‘preoccupation’ of social work writing: A corpus-assisted discourse study

The profession of social work has become increasingly writing-intensive in recent decades, yet little empirical research has been carried out on the nature of this writing. This paper describes and explores the one million-word corpus compiled as part of the Writing in Professional Social Work Practice in a Changing Communicative Landscape study, outlining the challenges involved in collecting and anonymising hard-to-reach texts from social workers (n=38) across three UK Local Authorities. Using the methodology of corpus-assisted discourse analysis alongside ethnographic insights and in consultation with expert insiders, the paper focuses on what a keyword analysis reveals about the core focus or ‘preoccupation’ (Baker, 2010) of social work writing. Attention is paid to the three main text categories of writing in social work (casenotes, emails and assessment reports) and to the three social work domains of children’s, adult generic and adult mental health services. Findings include confirmation of the extensive recording of communication exchanges, differences in the ways social workers refer to their own and service users’ views, and the considerable extent to which evaluation is threaded through all social work writing via the use of lexis. We also discuss how keyword analysis can provide a set of ‘candidate professional lexis’ and further examine selected items. The paper concludes by reflecting on aspects of methodology, in particular considering the subjectivity around keyword calculation, the equal treatment of all items in a corpus, and the usefulness of combining keyness analysis with additional data sources.


Introduction
In professional social work, the production of written texts is a high-stakes activity, playing a central role in all decisions about services and simultaneously used to evaluate social workers' professional competence. Social work writing (often referred to as recording or paperwork) is viewed as central to social work practice (Social Work Inspection Agency, 2010; Ofsted, 2017) and is frequently the target of criticism in inspection reviews (Care Quality Commission, 2017; Department of Education, 2011). Despite the criticisms made and the significance of writing in social work practice, little empirical research has been carried out on the nature of this writing and there has as yet been no corpus analysis of social workers' writing. This paper draws on findings from exploration of the one million-word WiSP corpus, compiled as part of the three-year Economic and Social Research Council-funded study Writing in Professional Social Work Practice in a Changing Communicative Landscape (WiSP, 2015) to characterise writing in contemporary social work.
In the paper, we examine findings from the whole WiSP corpus, following the widely-adopted corpus-assisted discourse studies (CaDS) methodology of keyword extraction followed by thematic classification of key items and further exploration of these items through collocates, concordance lines and close reading of whole texts (cf. Leedham, 2015; Partington, Duguid and Taylor, 2013; and studies in Taylor and Marchi, 2018a). CaDS research combines techniques from both corpus linguistics and discourse analysis, thereby offering a blending of quantitative and qualitative text analysis. We additionally drew on ethnographic researcher insights from the multiple WiSP datasets (including 70 interviews with social workers and field notes from 10 observation weeks), and in particular for this paper used detailed comments from 11 expert insiders, who included social workers, social work educators and representatives from professional bodies; bracketed inserts are used throughout to comment on where these insights relate to specific instances. While many CaDS studies draw on such extra-textual datasets and expert informants, these are not a prerequisite for defining a study as CaDS research. However, the incorporation of such elements is seen as a key part of the CaDS approach in this paper as part of the WiSP project's overall aim of providing an in-depth account of the nature of social work writing.
The specific aim of this paper is to use CaDS to unpack the focus or core 'preoccupation' (Baker, 2010, p. 26) of writing in social work. The paper also explores some methodological points around keyness, particularly with relevance to the use of small corpora of hard-to-reach texts, and the benefit of insider perspectives.

Situating social work writing within professional writing in corpus research
Studies of professional writing cover the areas of health, social care, and business, and in recent decades researchers in these fields have increasingly used corpus linguistics methods, sometimes as a starting point for critical discourse analysis (CDA). For example, Parkinson and Howarth (2008) combine corpus linguistics and CDA to explore micro discourses around social enterprise, and O'Halloran (2009) explores newspaper texts to uncover evidence for The Sun newspaper's quasi-campaign on the supposed negative effects of immigration.
More frequently, corpus linguistics is employed in combination with discourse analysis to explore larger datasets than is possible through close reading alone; for example, in business, Crawford (2010) explores discourse connectives in corpora of financial disclosure genres. In the field of healthcare, Kinloch and Jaworska (2020) use large corpora of lay, medical and media accounts to explore and compare discourses around postnatal depression. Many researchers have commented on how corpus analysis helps to reveal linguistic patterns which would otherwise remain hidden if discourse analysis was the sole method. One such study by Hunt and Harvey (2015) uses keyword, collocation and concordance analyses to find quantitatively dominant linguistic features through which people discuss anxiety around eating disorders, exploring these qualitatively through discourse analysis. Corpus analysis has also been combined with genre analysis in analysing accounting narratives (Rutherford, 2005) and as a way in to metaphor analysis in US corporate mission statements (Sun and Jiang, 2014).
Within social care, corpus analysis has been largely confined to interview transcripts (e.g., Bell and Seidel's 2012 study of transcripts from 18 health agency CEOs) and official documentation (e.g., Bell et al.'s 2013 study of national accreditation standards across seven countries). While natural language processing studies of corpora of medical practitioners' casenotes exist (e.g., Perera et al., 2016), to our knowledge there have been no corpora compiled from social workers' casenotes, reports or other texts, although a 0.5 million-word corpus mainly consisting of published textbook and training materials was compiled by Johnson (2017, 2019). Research into social workers' writing to date has been largely small-scale, usually featuring single sites only and with limited attention to the written texts produced. The study of written texts by social workers can therefore be described as part of an emergent field (Sarangi, 2005) in professional discourse studies.
Given that social work involves engagement with people at vulnerable points in their lives, the texts produced are often sensitive and always highly confidential. Texts are consequently hard to access, and fit with Swales' (2004) description of occluded texts, as they are produced and stored within Local Authority settings and not available beyond the confines of social services. Due to the inclusion of practitioners' confidential texts and the extensive need to anonymise (see Section 3.1), the WiSP corpus is unlike most other corpora used in CaDS research which 'privilege certain text types or registers, at the expense of others' (Baker, 2018, p. 283); that is, most research has been conducted on readily-accessible media texts rather than hard-to-access personal documents. The compilation of the WiSP corpus represents an important first step in corpus creation for social work writing, enabling more systematic text-based research into this professional discourse than has previously been possible.

Methodology
This section outlines the process of compiling the corpus, describes the WiSP corpus itself and discusses the tools and procedures used in extracting and exploring key items.

Data collection and preparation
The WiSP corpus is a one-million-word collection of over 4,600 texts produced by 38 social workers within three UK Local Authorities from 2015 to 2017. Following ethical approval and extensive consultation with Local Authority gatekeepers, texts were collected and anonymised on the three sites before the research team were allowed access. The anonymisation process involved substituting potentially identifying details with codes, for example a service user's name was coded as [SU], the name of a service user's husband became [SUH], and an event date was coded as [DATE]. A deduplication process was then carried out to tag the many sections of texts not written by individual social workers (e.g., heading text and questions within assessment forms, datestamps in casenotes and boilerplate disclaimers and signatures in emails) in order to exclude these sections from analysis (resulting in a reduction in analysable data of 375,000 words or 27%). Duplicate text produced by writers themselves was retained (e.g., paragraphs for several children within a family, repeated action points and summaries, and emails copied into casenotes).
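The substitution step described above can be pictured with a small sketch. It is not the procedure or tooling used on the three sites, which the text does not specify: the lookup table, the date pattern, the function name `anonymise` and the sample sentence are all hypothetical, and only the codes [SU], [SUH] and [DATE] come from the description above.

```python
import re

# Hypothetical lookup of identifying strings -> codes; only the codes
# themselves ([SU], [SUH]) are taken from the description above.
NAME_CODES = {
    "Jane Smith": "[SU]",   # invented service user name
    "John Smith": "[SUH]",  # invented name of the service user's husband
}

# Assumed date format (e.g. 03/11/2016); real casenotes vary far more.
DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")


def anonymise(text: str) -> str:
    """Replace listed names and simple dates with anonymisation codes."""
    # Replace longer names first so that a shorter name cannot split them.
    for name in sorted(NAME_CODES, key=len, reverse=True):
        text = re.sub(re.escape(name), NAME_CODES[name], text)
    return DATE_PATTERN.sub("[DATE]", text)


sample = "Visited Jane Smith on 03/11/2016; John Smith was not present."
print(anonymise(sample))
```

A fixed lookup of this kind would miss misspellings, nicknames and indirect identifiers, so a sketch like this could only ever support, not replace, manual checking.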

The corpus
The complete WiSP corpus comprises 4,608 texts and has a total word count of 1,003,089 (excluding non-social worker writing such as email headers and boilerplate disclaimers). The corpus contains three main text categories using labelling from within social work: casenotes, emails and assessment reports. Casenotes are ongoing updates added to each service user's record on a Local Authority IT system; emails are mainly messages to colleagues and other professionals written within an email system; and assessment reports are initial or ongoing reports on a case, also on the IT system. Together, these three text categories constitute 94% of the WiSP corpus, with the remaining 6% comprising miscellaneous text types such as letters, administration forms and finance requests (see Table 1). Texts in the corpus vary from the very short (e.g., a one-word email and a four-word casenote) to the very long (a 10,000-word court report), giving a large standard deviation of 645.18 for a mean text length of 213.79 words. Throughout the study, we remained aware of the lack of homogeneity of WiSP texts, and viewed the corpus as comprised of individual texts rather than a bag of words (Egbert and Schnur, 2018; see Lillis, Leedham and Twiner, 2017, for more discussion of WiSP text categories). An alternative division of the corpus is by social work domain, since WiSP texts are collected from children's, adult generic and adult mental health services (see Table 2).  The dominant social work domain in the WiSP corpus of written texts is children's services. The imbalance across domains is fortuitous, due in large part to the understandable difficulty of securing agreement with Local Authorities around access and therefore the researchers' reliance on the specific permissions of access that Local Authorities were willing to give. 
Local Authorities have a clear duty of care towards service users with legal and ethical responsibility for protecting the personal data of vulnerable people, and we realised early in the project that we could not fulfil Jaworska and Kinloch's idealised scenario of including 'all possible [textual] data produced in a given context' (2018, p. 114). Instead, we adopted an opportunistic sampling frame, whereby we collected all texts available to us within the time frame 2015 to 2017. The resulting corpus has differing numbers of texts by text category, social work domain and individual writer; although these are limitations from a corpus linguistics perspective, the WiSP corpus remains the only corpus of social workers' writing currently available.

Tools and techniques of analysis
Corpus techniques enable researchers to decontextualise data, view it as abstractions and subsequently recontextualise the data (Partington, 2018). Keyword analysis is frequently used as a way into corpus comparison within the CaDS researcher's toolkit, offering a valuable starting point for further iterative corpus-based and qualitative textual analysis. For this study, Wmatrix corpus software (Rayson, 2008) was used to extract key lexical items from the corpus using British English 2006 (BE06), a one-million-word corpus of published general written British English (Baker, 2009), as a reference corpus. Wmatrix was selected for key item extraction as it includes access to BE06, allows calculation of statistical confidence levels using Bayes Factor and %DIFF effect size calculations, and enables the extraction of both individual lexical items and multi-word units, where the latter are pre-listed. The use of Wmatrix was limited to keyword extraction for manual examination across social worker writers and texts by the team.
The %DIFF effect size metric was used to establish keyness based on the size of the difference between the occurrence of items in two corpora (Gabrielatos, 2018). The resulting 'candidate key items' (CKIs; cf. Gabrielatos, 2018) were exported to Excel and filtered to extract all positively key items with a Bayes Factor of at least 2 (LL 16.38) and a minimum frequency of 30 (equivalent to 30 occurrences per million words), and then sorted by decreasing %DIFF. Items dominated by anonymisation codes (e.g., [PERSON] or [LOCATION]) were removed. WordSmith Tools (Scott, 2017a) was used to check that CKIs occur in at least 23 texts (comprising 0.5% of the corpus); while this is a relatively low proportion of the texts, we were aware of the disparate numbers and lengths of texts across text categories and did not want to exclude CKIs from texts which are greater in length yet fewer in number (i.e., assessment reports; see Table 1). Items occurring in texts from fewer than five social workers were excluded as our focus here is on typicality rather than idiosyncrasy (cf. McEnery, 2018).
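The filtering steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of the Wmatrix, Excel or WordSmith workflow: %DIFF is computed as the percentage difference between normalised frequencies, the log-likelihood is the usual two-corpus formulation, and the Bayes Factor is not computed (the LL value of 16.38 given above is used as its stand-in). The function names, the dictionary layout, the reference frequency and the text and writer counts in the example are hypothetical, and BE06 is treated as exactly one million words.

```python
import math


def per_million(freq, corpus_size):
    """Normalised frequency per million words."""
    return freq * 1_000_000 / corpus_size


def pct_diff(freq_study, size_study, freq_ref, size_ref):
    """%DIFF: percentage difference between the two normalised frequencies."""
    nf_study = per_million(freq_study, size_study)
    nf_ref = per_million(freq_ref, size_ref)
    if nf_ref == 0:
        return math.inf  # item absent from the reference corpus
    return (nf_study - nf_ref) * 100 / nf_ref


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-corpus log-likelihood for a single item."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    for observed, expected in ((freq_study, expected_study),
                               (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll


def is_candidate(item, size_study, size_ref,
                 min_ll=16.38, min_freq=30, min_texts=23, min_writers=5):
    """Apply the thresholds described above to one item."""
    args = (item["freq"], size_study, item["ref_freq"], size_ref)
    return (pct_diff(*args) > 0                # positively key only
            and log_likelihood(*args) >= min_ll
            and item["freq"] >= min_freq
            and item["texts"] >= min_texts     # dispersion across texts
            and item["writers"] >= min_writers)  # dispersion across writers


WISP_SIZE = 1_003_089  # word count reported for the WiSP corpus
BE06_SIZE = 1_000_000  # assumed round figure for the reference corpus

# Only the WiSP frequency of "concerns" (997) comes from the paper; the
# reference frequency and the text/writer counts are invented.
items = [
    {"word": "concerns", "freq": 997, "ref_freq": 100, "texts": 400, "writers": 30},
    {"word": "rare_item", "freq": 12, "ref_freq": 1, "texts": 3, "writers": 2},
]
kept = [i for i in items if is_candidate(i, WISP_SIZE, BE06_SIZE)]
kept.sort(key=lambda i: pct_diff(i["freq"], WISP_SIZE, i["ref_freq"], BE06_SIZE),
          reverse=True)
print([i["word"] for i in kept])
```

Separating the significance, frequency and dispersion checks makes it easy to see which threshold removes a given item, which matters when the thresholds themselves are a judgement call.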
Finally, a break in the effect size measure was used to reduce the number of items for further exploration; a relatively high break point was selected as many CKIs are abbreviations of professionals and services within social work and we wished to further populate other thematic categories. The resulting 226 key items were then manually categorised into thematic groups using an iterative process of analysing concordance lines and collocate lists combined with researcher close reading of text extracts (see Table 3). WordSmith Tools was used for this task, as it has greater functionality for examining concordance lines. Each thematic set of items was searched for as a group in WordSmith to check that the items form a cohesive category. In deciding category names and assigning key items, we drew on our awareness of social worker discourse from the wider WiSP project, including interviews with social workers (n=70) and observation weeks (n=10).
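For readers unfamiliar with concordance lines, the following minimal key-word-in-context (KWIC) sketch shows the kind of display on which the categorisation relied. It does not reproduce WordSmith Tools' concordancer; the tokenisation rule (which keeps anonymisation codes such as [SCHOOL] as single tokens) and the function name `kwic` are assumptions, and the sample sentence is Excerpt 1 from Section 4.3.

```python
import re

# Assumed tokenisation: an anonymisation code in square brackets counts
# as one token; otherwise runs of word characters and apostrophes.
TOKEN = re.compile(r"\[\w+\]|[\w']+")


def kwic(text, node, width=4):
    """Return (left context, node, right context) for each hit of `node`."""
    tokens = TOKEN.findall(text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == node.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


sample = ("I contacted [SCHOOL] School who have reported numerous incidents, "
          "some requiring physical intervention.")
for left, node, right in kwic(sample, "incidents", width=3):
    print(f"{left:>30} | {node} | {right}")
```

Aligning the node word in a central column is what makes recurrent patterns to its left and right visible when many lines are read together.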
Following our initial researcher-driven categorisation, we asked 11 expert insiders, comprising social workers, social work educators and representatives from professional bodies, to comment on the thematic groupings. In a group meeting and in follow-up consultation with two social workers, we provided examples of key items used in context, and adapted the groupings according to their insights.

Results
This section details the key items extracted from the WiSP corpus, presenting these within eight broad thematic categories, and then uses concordance lines, clusters and our collaboration with expert insiders to describe the four principal categories. All linguistic examples throughout the paper are drawn from across the three main digital text categories in social work (see Table 1) and also across the three social work domains (see Table 2); any dominance of a lexical item in a single text category or domain is commented on.

Key items in WiSP
The substantive part of the discussion in this paper rests on the manual thematic categorisation of key items as described in Section 3.3. This is supported by use of additional corpus techniques such as searching for frequent 4-word clusters using WordSmith Tools.
Using the clusters feature within the Wordlist tool in WordSmith Tools, we explored 4-word clusters and p-frames for examples of formulaic language within social workers' writing. Here, we use Scott's (2007b) definition of cluster as 'a group of words which are found repeatedly together in each other's company, in sequence'. P-frames or phraseframes are also searchable within WordSmith Tools and follow Fletcher's definition of 'groups of wordgrams identical but for a single word' (2007; here 'wordgrams' refers to clusters). The largest topic area of clusters is that of reporting on communication, explored in Section 4.4. The analysis and discussion are additionally underpinned by automated semantic tagging provided by Wmatrix software. Wmatrix on the whole confirmed our manual analysis as it suggested that dominant semantic categories in WiSP (when compared to BE06) are telecommunications; polite (due to thanks in emails); social actions, states and processes; people; and time. These Wmatrix semantic categories largely map on to our findings from manual categorisation (e.g., the polite category falls within our Interpersonal grouping). Additional semantic categories not revealed through the manual keyword categorisation but highlighted by Wmatrix include safe (with many references to safe, safety and refuge), non-existing (dominated by the words missing and unavailable to refer to either people or documentation) and worry (considered in Section 4.5). Searches for 4-word clusters and use of Wmatrix thus both supported our researcher-driven analysis, providing a form of triangulation, and also highlighted further areas for consideration.
A total of 226 key items (single words and multi-word units) were extracted from the WiSP corpus and manually categorised into thematic groups (see Section 3.3). Some categories are further divided; for example, items within the broad category People, roles, activities and services are broken down into three groups.
The rows in Table 3 are organised in order of each category's appearance in the ensuing discussion; the key items in each table cell are alphabetised. Contact*, care, and regards are each placed in two categories, as each sense of the term accounts for at least 30% of usage in the corpus. Safeguarding and needs are in two categories, as expert insider views and further examination of the corpus suggest they could belong to either category. The Appendix contains a full list of the abbreviations used in the table.
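The two notions quoted above, clusters and p-frames, can be illustrated with a short sketch. This is not WordSmith Tools' implementation: it simply counts contiguous 4-word sequences and then groups those that are identical but for a single word. The function names and the invented sample text are assumptions made for illustration.

```python
from collections import Counter


def clusters(tokens, n=4):
    """Count contiguous n-word sequences (clusters)."""
    return Counter(tuple(tokens[i:i + n])
                   for i in range(len(tokens) - n + 1))


def p_frames(cluster_counts):
    """Group clusters that are identical but for one word.

    Each frame is a tuple with "*" marking the variable slot; its value
    maps every filler word to the frequency of the matching cluster.
    """
    frames = {}
    for cluster, freq in cluster_counts.items():
        for slot in range(len(cluster)):
            frame = cluster[:slot] + ("*",) + cluster[slot + 1:]
            frames.setdefault(frame, Counter())[cluster[slot]] += freq
    # A frame is only of interest if at least two different words fill it.
    return {frame: fillers for frame, fillers in frames.items()
            if len(fillers) > 1}


# Invented sample, loosely modelled on the cluster "there are no concerns".
tokens = "there are no concerns there are some concerns".split()
frames = p_frames(clusters(tokens))
for frame, fillers in frames.items():
    print(" ".join(frame), dict(fillers))
```

On a real corpus the frames would normally be filtered by a minimum frequency before inspection, since most one-off overlaps between clusters are of no interest.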
The key items in the table, taken together, signal the core 'preoccupation' of social work writing since they occur statistically more frequently in social work than in a general reference corpus, and are reasonably well-dispersed across texts and individual writers. The first four categories appear more straightforward and are briefly considered below.
The prevalence of numbers in the first category in Table 3 is due to the frequency with which times, dates and costings are given in social work texts: documenting arrangements and interventions is fundamental to instigating action and pushing a case onto the next phase. The Interpersonal category is populated by greetings and sign-offs from emails. Textual linking refers to items used to connect discourse (this sense of regards is from the multi-word unit with_regards_to). Items in the Miscellaneous category do not fit neatly elsewhere; for example, won't is used in a broader array of meanings than unable, and is thus not placed in the Evaluating group.
Sections 4.2 to 4.5 explore the four main categories which contain most of the key items: People, roles, activities and services; Describing and evaluating; Communication; and Perspectives. While the categories are presented here as discrete entities, there are clear overlaps, for example Perspectives could fit within Evaluating. The overarching aim of the categories is to group and thereby try to articulate and understand the core discourse of social work writing as constituted by lexical items.

People, roles, activities and services
This category aims to offer a broad categorisation of the range of social worker roles, activities and services and is subdivided into Social workers, Other professionals and Family and institutionalised carers. This grouping of key items illustrates the number of people involved in social work cases, and the many activities and services offered. Thus a social worker might arrange aspects of care ranging from unannounced visits, a child protection care order, an assessment to enablement support, undertake care proceedings and liaise with medical and mental health personnel (GP, nursing, CAMHS), as well as counselling, educational and legal professionals. Abbreviations are widely used to refer to different branches of social work and connected professions. The extensive lexis relating to children in this category reflects the greater number of texts from project participants within children's services.
Lexis concerning Family and institutionalised carers is key mainly in connection with arrangements around adoption and fostering placements and guardianship. Kinship terms are primarily those used by social work professionals such as birth mother and unborn (an unborn child is usually referred to as unborn + [family name]).

Describing and evaluating
The social worker's role necessitates the extensive detailing of events, situations, homes and relationships, with descriptions and evaluations based on visits, observations and interviews. Key to social work practice is assessing levels of risk, shown through analysis of the significance of descriptive details and evaluation of the meanings and consequences. While in Table 3 we have mapped lexis where the social worker employs apparently neutral language in the Primarily descriptive category (e.g., invoice, meals) and mapped more obviously evaluative language separately (e.g., misuse, struggles), few of these terms can be straightforwardly characterised as solely descriptive or evaluative. There are many grey areas here: for example, assault could be construed as a factual account taken from a police report (e.g., 'the charge was classed as common assault') or could be used in a potentially evaluative manner (e.g., 'both deny any assault has taken place'). The example of assault illustrates how words are indexical as well as referential, and therefore the extent to which keywords can be categorised as descriptive or evaluative may in part depend on who is using them and in what context. The three researchers' familiarity with the writing context through interviews and observations enables us to begin to explore indexical meanings beyond the key item analysis and cotext available through the corpus (e.g., see toxic in Section 5.1).
Our discussions with expert insiders also allow us to move beyond the corpus, shedding light on lexis as discourse in use. Following these discussions and further examination of concordance lines, a few key items were recategorised (e.g., needs [verb and noun] and routines were moved from their initial classification of Primarily descriptive to Primarily evaluative).
For several evaluative terms, the key item brings with it a layer of criticality through lexical priming (Hoey, 2005) carried from common contexts of use (cf. the notions of semantic prosody and semantic preference). Lexical priming theory states that each lexical item is primed for language users and that each new linguistic encounter with an item adds to or confirms our knowledge of how the item is used. Excerpts 1 and 2 below invoke a negative priming by selecting the word incidents, as these are generally negatively-viewed events which require intervention and comprise part of the language around risk. Additionally, use of incidents appears to index the field of police reports.
(1) I contacted [SCHOOL] School who have reported numerous incidents, some requiring physical intervention. [Children's, Casenote]
(2) Just if you have any concerns or if there have been any incidents or complaints from relatives or residents [Adult generic, Email]
Lexical priming was also prevalent around the word exploitation, underscored by expert insiders (from children's services) who pointed out that this word signals child sexual exploitation (CSE) in this social work domain. Through our investigation, it became clear that evaluation at the level of lexis is not confined to particular sections of casenotes or reports (e.g., those explicitly marked as evaluative, with headings such as Assessment or Analysis) but is threaded throughout texts.

Communication
In addition to the key items in Table 3, frequent 4-word p-frames are considered in this section. The extensive items in the Communication category signal the centrality of all types of communication in social work practice with some lexis indicating spoken interaction has taken place (e.g., call back, discuss, phone call, voicemail) and other key items signalling communication in writing (e.g., chronology, email, paperwork, uploaded). Any oral communication such as phone calls to other service providers still necessitates a written casenote to record that the communication took place, giving rise to the oft-quoted mantra of social work management: 'if it's not written down it didn't happen' (see also Lillis, Leedham and Twiner, 2017). The past tense verbs in both key items and clusters in this category (see Table 4) are generally procedural and relate to communication and arrangements, most often occurring in casenotes (e.g., contacted, informed, let (someone) know, requested).
Notably, examination of extended concordance lines reveals that social workers refer to their own communication differently to that of service users. Thus social workers advised, confirmed, discussed, enquired and requested where service users or their family members stated and, in discussion with social workers, agreed. In particular advised is used where the social worker is providing information to the service user rather than for the prototypical purpose of dispensing advice. Excerpt 3 illustrates the use of advised and stated in reporting a conversation between a social worker and a service user's wife.
(3) [
The use of reporting verbs echoes work in medical discourse, such as Anspach's comments on how these 'account markers' (1988, p. 368) are used differently when reporting the activity of professionals (e.g., observe, note, find) to that of patients (e.g., claim, report, state, deny). The social worker's default reporting verb appears to be advised and signals a neutral offering to the service user. In social work writing, stated and other verbs accorded to the service user or family are sometimes followed by a quotation (direct or indirect) from a service user or family member, providing evidence to support the writer's commentary, showing the speaker's strength of feeling and also perhaps showing some distance between the writer and the words quoted (see Excerpts 4 to 6).
In Excerpt 8, the question and answer are part of an extended dialogue within a mental health assessment in which a social worker checks whether a vulnerable adult can remember personal information.

Perspectives
This category could be subsumed under Evaluating, but given the importance of representing participants' perspectives textually (notably those of the social worker, other professionals, service users and their family members) we wished to highlight the ways in which different viewpoints are reported. The categories in Table 3 are necessarily overlapping, as much social work writing could be regarded as evaluative.
Although the Perspectives category is populated by only six key items, it is an important grouping as it may signal the importance attached by social workers to providing service user perspectives in written texts. All 39 occurrences of the word adamant indicate the strength of feeling of a service user or family member involved in a case, as described by the social worker (see Excerpts 9 and 10).
(9) Her pupils were extremely dilated although she was adamant she had not taken any drugs. [
It is likely that social workers' frequent reference to wishes and feelings is prompted by headings in templated forms (see Excerpt 18), an insight which came about through our close reading of such forms (headings are tagged and excluded from corpus analysis but examined in close reading of texts). In contrast to the more emotive words ascribed to service users and their families, the more detached key item concerns is almost exclusively attributed to professionals and mainly within the domain of children's services, whether the social worker themself, health or educational professionals, the Local Authority, or a general passivised professional voice (see Excerpts 19 and 20). Frequently, writers reference the absence of concerns, with the most frequent cluster around concern being there are no concerns (65 occurrences). In the small number of cases where concerns is attached to a service user, this appears within a comment on their absence of concern.
(24) The issue is that the parents do not share the concerns of the local authority and have been reluctant to engage. [Children's, Other: Casefile audit form]
Wmatrix semantic tagging (see also Section 3.3) reveals the category of worry to be more frequently used in WiSP than in BE06, and our exploration of social workers' and service users' feelings and psychological states was broadened out to consider the wider lexis. The semantic domain of worry includes a total of 2,640 individual lexical items in WiSP, ranging from concerns (997 occurrences) and worried (172) to items occurring only 2 or 3 times (e.g., troubled, nuisance, bothering, on_edge).
In general, worrying is done by a third person he/she/they/[PERSON]. A common question on assessment forms is 'What are you worried about?' but this is followed by either a factual statement or by expressions of concern rather than repetition of the lexical item worried:
It appears that while social workers report their own views formally and objectively, giving evidence through service user quotations, service users' perspectives appear to be reported in terms of their emotions. Thus, the social worker has concerns, whereas the service user and their family express feelings.

Discussion, reflections and conclusion
The principal aim of this paper was to uncover the core preoccupation of social workers' texts as evidenced by corpus-assisted discourse analysis. Exploration of key items in Section 4 has illustrated how social workers are required to liaise with many other professionals across a range of services, and how evaluative language is threaded through texts in the use of lexis. Communication is highly important as social workers have to report what they have observed, said, heard and done at each stage in a case. Different reporting verbs are used for social workers compared to service users and their families, with direct speech from the latter used to provide evidence for social worker evaluations. The perspectives of each group are also reported differently in terms of either concerns or feelings. Writing is how a case is progressed, through completing requests for services and alerting other professionals of requirements.
Although the corpus analysis here, as with much linguistic research, produces some results which may appear obvious, it also provides an empirical corroboration of intuitions (Taylor and Marchi, 2018b). The use of extended concordance lines, reading whole texts and corroboration through expert insiders enables us to dig deeper and go further than the initial obvious findings into the 'non-obvious' (Partington, 2017). For example, while the key items in the communication category initially seemed obvious, further exploration of the co-text reveals differences in the use of reporting verbs, and the use of quotations to provide evidence.
The remainder of Section 5 draws together some overarching aspects of the key items detailed in Section 4, namely, professional discourse (Section 5.1) and evaluative language (Section 5.2). We then reflect on the corpus techniques employed in the investigation in Section 5.3.

Professional discourse
Since the key items considered throughout this paper are those which occur more often in the WiSP corpus when compared to a reference corpus, it could be argued that, taken collectively, they constitute social work professional discourse. However, merely fulfilling the criteria of keyness is insufficient to warrant this status: social workers are more likely to use the words toilet, 11am and thanks than BE06 writers due to the former's work in adult care, focus on arrangements and use of email respectively, yet it would be odd to claim such everyday lexis as part of a professional discourse. Instead, we suggest that it is the use of particular words over other possibilities, as well as lexis constituting the core focus or preoccupation of social work, which together contribute to making up professional discourse. The use of items such as wellbeing (over health) and home environment (over home), alongside the core vocabulary detailing what it is that social workers do as part of their practice (e.g., safeguarding, unannounced visit, chronology), constitutes the professional discourse. We see the key items in Section 3 therefore as constituting what we consider to be 'candidate professional lexis'; that is, they are a provisional set of items which require further examination before any decision is made regarding their status. The discussion below considers examples of professional lexis, selected from several of the key item categories.
Professional discourse is often used in social work to describe service user surroundings. For example, the particular term home_environment (81 occurrences in the corpus) is frequently used where a more commonplace term might be home, and serves to signal a specific social worker lens (see Concordance 1). The aforementioned difficulty in assigning key items to the descriptive or evaluative categories is apparent here (see also Section 4.3).
The premodifiers in a warm/healthy/caring home environment are clearly descriptive but also (positively) evaluative within a context where the overarching goal of the social worker, as indicated in the text, is to judge whether a home is sufficiently well-maintained. (The focus in this paper is on the textual signalling of goals, but we drew on researcher insights from observations and interviews as part of interpreting and classifying these goals; see Lillis, Leedham and Twiner, 2017.) In contrast, the premodifier toxic (see line 5 in Concordance 1) is clearly (negatively) evaluative. This instance is the sole occurrence of toxic in the corpus, but it is a powerful and emotive description which colours the whole text and may cause the social work reader to think of the commonly-used phrase 'toxic trio' of substance misuse, mental ill health and abuse (as suggested by ethnographic observation notes).
Marked examples of professional discourse from the primarily evaluating category include engaging and manage. By marked we refer here to lexis which appears to be used differently in social work discourse from more everyday usage (e.g., the items index, social worker, risk assessment). Commenting on whether and how a service user is engaging with the support offered and whether they manage in day-to-day life is key to social worker risk assessment (Excerpts 27, 28). Judging levels of engagement is difficult, as it could be that the social worker did not engage well with the service user, rather than a failure on the SU's part. Moreover, particularly in children's services, by the time social services are involved any engagement is unlikely to be voluntary (as attested by interview and observational data).
Within professional discourse, social workers' use of particular lexis appears to enact a level of formality. For example, reside/residing is at times used rather than live, ascertain rather than find out, and utilise rather than use. Whereas reside/residing and ascertain occur across both adult and children's services, the key item utilise is almost entirely confined to adult care. Common collocates of utilise are nurse call (system) and wheelchair and, from close examination of the extended co-text, the item appears to be employed to provide a measure of dignity and respect towards the service user.
Many insider terms and abbreviations which may appear obscure to those outside social work appear in the key items list as candidate professional lexis. Controversy exists in social work around the use of what has been termed 'jargon', with calls for the word placement to be replaced by home and so on (e.g., Surviving Safeguarding, 2018).
The issue of social work terminology arose in our consultation with expert insiders where we discussed the usefulness of professional language which succinctly encapsulates a range of meaning (e.g., educational establishment rather than nursery/primary/secondary/specialist schools and colleges) versus the potential for distancing the service user from social services through use of words such as contact rather than family time. The contribution of corpus analysis is to help to make visible the language within this ongoing debate, and we recognise there is a need for further exploration of candidate items through close reading and discussion with expert insiders.

Evaluative language
Throughout the examples used in this paper and the ensuing discussion, a common social work theme has been that of risk, with assessing, documenting and managing risk a core task of social work. Key items directly related to risk include concerns, high_risk, safeguarding, suffer, abusive, allegation, CSE and deteriorated. The notion of risk is also apparent in many of the texts we explored, as is shown through its prevalence in our examples.
The issue of what counts discursively as description or evaluation is an ongoing point of debate within social work, reflecting the complexity of the relationship between discourse and professional positionality and status. In exploring the use of evaluative language, we suggest in Section 4.3 that evaluation is threaded throughout social workers' writing rather than confined to particular sections such as those labelled Analysis within casenotes and assessment forms. Evaluation may also be apparent through the use of different reporting verbs to convey service user and professional voices, and through the different lexis used to convey perspectives. In the written record of social work practice, the analytical and evaluative aspects of social work are backgrounded and often rendered invisible, yet lack of analysis is a constant criticism (e.g., Stevenson, 2017).

Reflections on methodology
In this section we offer reflections on some aspects of methodology, considering the subjectivity around keyword calculation; the equal treatment of all items in a corpus; and finally the usefulness of combining keyness analysis with further data sources. While keyness provides a systematic linguistic analysis of qualitative discourse data, the 'identification of an item as key depends on a multitude of subjective decisions' and it is important that these decisions are 'both principled and explicitly stated' (Gabrielatos, 2018, p. 253). Once a set of candidate key items has been extracted using corpus software, statistical and effect size measures are commonly used alongside frequency thresholds (e.g., of texts and writers) to filter the data to a manageable level. The stipulation here that key items occur a minimum of 30 times, across 23 texts and by five social workers (see Section 4.3) means that lexis occurring mainly in adult services (as opposed to children's) is less likely to be deemed key, as there are far fewer texts. For example, bowels was an initial CKI based on Bayes Factor, %DIFF and frequency, but occurs in just 19 texts and so was excluded. The generally arbitrary cut-offs used thus affect which key items are available for categorisation and later interpretation (cf. the discussion in McEnery, 2018).
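The filtering step described above can be illustrated with a minimal sketch. It is not the procedure or software used in the study: the function and field names are hypothetical, the corpus sizes are assumed round figures, and all counts are invented for illustration except the 19 texts reported for bowels. The sketch applies the three stated cut-offs (30 occurrences, 23 texts, five writers) and computes %DIFF as the percentage difference between normalised frequencies in the study and reference corpora.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    item: str
    freq: int      # occurrences in the study corpus
    texts: int     # number of texts containing the item
    writers: int   # number of writers using the item
    ref_freq: int  # occurrences in the reference corpus


def pct_diff(freq, size, ref_freq, ref_size):
    """%DIFF: percentage difference between normalised frequencies."""
    nf_study = freq / size
    nf_ref = ref_freq / ref_size
    if nf_ref == 0:
        return float("inf")  # item absent from the reference corpus
    return (nf_study - nf_ref) * 100 / nf_ref


def passes_thresholds(c, min_freq=30, min_texts=23, min_writers=5):
    """Apply the frequency, text and writer cut-offs stated in the text."""
    return (c.freq >= min_freq and c.texts >= min_texts
            and c.writers >= min_writers)


# Hypothetical figures: only the 19 texts for 'bowels' comes from the paper.
STUDY_SIZE = 1_000_000  # assumed rounded study corpus size
REF_SIZE = 1_000_000    # assumed rounded reference corpus size
candidates = [
    Candidate("bowels", freq=40, texts=19, writers=6, ref_freq=10),
    Candidate("item_a", freq=120, texts=60, writers=12, ref_freq=30),
]

kept = [c for c in candidates if passes_thresholds(c)]
for c in candidates:
    status = "kept" if c in kept else "excluded"
    diff = pct_diff(c.freq, STUDY_SIZE, c.ref_freq, REF_SIZE)
    print(c.item, round(diff), status)
```

In this sketch both invented items have the same effect size, yet one is excluded on text range alone, which shows how the choice of cut-off, rather than the strength of keyness, can decide whether an item is available for categorisation.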
A point rarely acknowledged in corpus linguistics or CaDS is that in a corpus, unless complex statistical weighting is applied, all texts are treated equally, regardless of the number of readers or the longevity of the texts. Thus, a corpus of general English may include official or far-reaching government letters and accord these the same status as (usually) ephemeral SMS messages to a single reader. In the case of WiSP, a concordance line from a carefully crafted court report read by many has equal status to one from a hastily-written email to a single reader. In the WiSP casenotes subcorpus, critical incident casenotes have the same impact as more mundane accounts of phone calls made. Each occurrence of a word increases the likelihood of that word being deemed frequent in the corpus or key when compared to a reference corpus. Some account can be taken of this point by setting thresholds for the number of texts and writers, through the use of Wmatrix to include infrequent lexical items within semantic groupings, and through close reading to judge the impact of infrequent yet powerful lexis (e.g., toxic).
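The point about equal treatment can be made concrete with a small sketch, offered under stated assumptions rather than as a description of the tools used in the study. The text labels and tokens are invented toy data and the function names are hypothetical. Pooled frequency counts every occurrence identically whatever text it comes from, whereas text range (the number of texts containing an item) distinguishes a word concentrated in one document from a word spread across several.

```python
from collections import Counter


def pooled_frequency(texts):
    """Count every token occurrence, ignoring which text it came from."""
    counts = Counter()
    for tokens in texts.values():
        counts.update(tokens)
    return counts


def text_range(texts):
    """Count the number of texts in which each word occurs at least once."""
    spread = Counter()
    for tokens in texts.values():
        spread.update(set(tokens))
    return spread


# Invented toy data: four 'texts' of different kinds, already tokenised.
texts = {
    "court_report": ["risk", "risk", "risk"],
    "email_1": ["visit"],
    "email_2": ["visit"],
    "casenote_1": ["visit"],
}

freq = pooled_frequency(texts)
spread = text_range(texts)
for word in sorted(freq):
    print(word, freq[word], spread[word])
```

Here the two words are indistinguishable by pooled frequency and differ only in range, which is why thresholds on texts and writers offer a partial, though still unweighted, corrective.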
Triangulation was provided in this study by employing Wmatrix semantic tagging, as this confirmed the previous categorisation and also suggested areas for further investigation such as the semantic domain worry. The additional exploration of 4-word clusters, although relatively minor in scope, also helped us in our quest to look through as many windows as possible (Taylor and Marchi, 2018b) within the bounds of researcher time and article word-count. In addition to these supporting procedures, we argue that the use of key items is made more robust when the initial researcher-driven thematic categorisation is iteratively combined with expert insider views. Combining corpus data with ethnographic insights within the WiSP project and consultation with expert insiders helps to ensure that researcher intuitions in categorising keywords and interpreting findings are grounded.

Concluding points
This paper has described the compilation of the WiSP corpus of social worker texts. The sensitive nature of the texts means it is extremely time-consuming both to gain access to and to anonymise texts. Due to the variety of texts in the corpus, we cannot claim the WiSP corpus is representative of social workers' writing. Despite this caveat, the WiSP corpus represents a significant first step in corpus creation for social work writing, extending the range of texts available to researchers. The WiSP dataset has been deposited in the UK Data Archive, safeguarding level (Lillis, Leedham & Twiner, 2019).
The employment of CaDS methodology has enabled us to combine computational extraction of key items with researcher and expert insider perspectives and to explore this candidate professional lexis, revealing both obvious and less obvious aspects of professional social work discourse. Making language visible in a systematic way will, we hope, add to discussion in the field of social work around the core focus of the profession and around what constitutes professional language. Future possible uses of the corpus include a comparison of WiSP texts written by experienced and less experienced social workers.
Beyond social work, it would be fruitful to look at different professions and explore overlaps in professional discourse.

PLO public law outline
PO purchase order