Data Manipulation – batch tracing the RiO Unknown GP Report

This note illustrates some data manipulation techniques which could be of use to colleagues in a variety of situations. The example here describes the preparation of a file for submission to the Demographics Batch Service from the RiO ‘Unknown GP’ report.
The report is produced by RiO as a web page. Whilst the data is presented in a tabular form, it does not readily lend itself to copy and paste into Excel, say.

Rio Unknown GP report

In Internet Explorer, right-click on the page and choose “view source” – this presents the source code of the web page in Notepad. We can see that the data in the page is presented in XML, preceded by a style sheet which informs the browser how to display the data. The header and style sheet information can be deleted, from the start of the file up until the first <ROW> tag.

Editing the source code of the RiO report

Scroll to the end of the file where you will find a bunch of javascript before the page trailer. Delete everything after the last </ROW> tag. This leaves you with the data in raw XML.

Add a header and trailer and save the file with the extension “.XML”. The header and trailer are illustrated in the example data which can be downloaded here (right click, save and open in notepad):  UnknownGP.xml

Now prepare a file with the extension “.XSD” in which you describe a simple database schema to which your XML file conforms. The content of the file should be self-explanatory; each element describes a field and the field names must correspond to the XML tags in the data file. The file names must also agree, e.g. “data.xsd” and “data.xml”. The schema file can be downloaded here:  UnknownGP.xsd

Now open an Access database and choose “file”, “import” from the main menu. Select “xml” as the type of file to import, and pick the file which you have prepared. Provided you have done everything correctly, the data should be imported into a table called “ROW” with field names corresponding to the XML tags. Don’t worry that the field called “surname” contains no meaningful data – the client last name is presented in a field called “C2”.

Importing the data into Access

From your Access table you can export to Excel or CSV formats. However for the purpose of batch tracing some further manipulation is required. The columns have to be in a different sequence, and date and gender in different formats. We can achieve these transformations by creating an Access query.

I used the Access query wizard to select the columns I wanted in the desired sequence. Then in the query design view, I edited the SQL query to add extra blank columns by inserting <<“”,>> as illustrated. To alter the format in which date and gender are presented, I used VBA formulas embedded within Access. The resultant query is shown below.

SELECT "10", ROW.C2, mid(ROW.C6,8,4) &
format((instr("~Jan~Feb~Mar~Apr~May~Jun~Jul~Aug~Sep~Oct~Nov~Dec",
"~" & mid(ROW.C6,4,3))+3)/4,"00") & left(ROW.C6,2), "", "", ROW.C3,
ROW.C5, "", ROW.C4, "",
format((instr("~Male~Female","~" & ROW.GENDER)-1)/6+1,"0"),
ROW.ADDRESS, "", "", "", "",
ROW.POSTCODE, "", "", "", "", "", "", "", "", "", ""
FROM ROW;

Run the query and export the results to a CSV file. You should find that Access automatically converts from Unicode to ANSI as part of this operation, halving the size of the file in the process.

Lastly, edit the CSV file in notepad and add a header and trailer in the format required by the DBS. Descriptions of the file formats used by the DBS can be found at http://www.connectingforhealth.nhs.uk/industry/docs/dbs/index.html?dbsoverview.htm .

The file produced by the Access query requires a header and trailer to be added. These can be created using Excel formulae as illustrated here:  DBS-Header-and-Trailer.xls

Having successfully submitted the data for tracing, a file should be returned with the results. One way of dealing with the returned data is to load it into Access and write a simple query and report which will present the data in a manner convenient to your data entry clerks.

Processing a file returned by the DBS

Adding GP information

The data returned by the DBS contains only practice codes, not GP codes. Unfortunately, when updating client registration details in RiO you cannot find a GP by practice code. To make the batch tracing useful to RIO data entry clerks, other details of the GP practice need to be included in a report.

Current details of GP's and other administrative code information is available in the orgcurrdev2003.mdb database which can be downloaded from Connecting For Health ODS at:

http://nww.connectingforhealth.nhs.uk/ods/downloads/access/

From this database, the GP practice table can be exported as illustrated below.

Exporting Data

I exported GP practice details into a CSV file, loaded this into Excel then saved as a Excel spreadsheet, then imported it into my Access database as table "GMP". The objective here was to join the response file from the DBS to the GP practice table, in order to include GP practice name and address information in the report produced for the data entry clerks. The clerks can then look up GP practices by postcode.

I produced software in Visual Studio 2008 to interrogate the database and produce a report for data entry clerks. The database and software is available for download here; these are zip files so right-click and save:
To install the system, save the database in C:\DBS_Response as DBS_Response.mdb

Populate the GMP table with the data exported from the ODS download.
Install the application DBS_Results. This program will interrogate the database and output the results to text file "DBS_Result.txt" in your "My Documents" folder.
 
To handle a file returned by DBS:
Note - all illustrations use Office 2003 under Windows XP.  

Updated: 18 July 2009