Data Manipulation – batch tracing the RiO Unknown GP Report
This note
illustrates some data manipulation techniques which could be of use to
colleagues in a variety of situations. The example here describes the
preparation of a file for submission to the Demographics Batch Service
from the RiO ‘Unknown GP’ report.
The report is
produced by RiO as a web page. Whilst the data is presented in a
tabular form, it does not readily lend itself to copy and paste into
Excel, say.

In
Internet Explorer, right-click on the page and choose “view
source” – this presents the source code of the web page in
Notepad. We can see that the data in the page is presented in XML,
preceded by a style sheet which informs the browser how to display the
data. The header and style sheet information can be deleted, from the
start of the file up until the first <ROW> tag.

Scroll to
the end of the file where you will find a bunch of javascript before
the page trailer. Delete everything after the last </ROW> tag.
This leaves you with the data in raw XML.
Add a
header and trailer and save the file with the extension
“.XML”. The header and trailer are illustrated in the
example data which can be downloaded here (right click, save and open in notepad): UnknownGP.xml
Now
prepare a file with the extension “.XSD” in which you
describe a simple database schema to which your XML file conforms. The
content of the file should be self-explanatory; each element describes
a field and the field names must correspond to the XML tags in the data
file. The file names must also agree, e.g. “data.xsd” and
“data.xml”. The schema file can be downloaded here: UnknownGP.xsd
Now open
an Access database and choose “file”, “import”
from the main menu. Select “xml” as the type of file to
import, and pick the file which you have prepared. Provided you have
done everything correctly, the data should be imported into a table
called “ROW” with field names corresponding to the XML
tags. Don’t worry that the field called “surname”
contains no meaningful data – the client last name is presented
in a field called “C2”.

From your
Access table you can export to Excel or CSV formats. However for the
purpose of batch tracing some further manipulation is required. The
columns have to be in a different sequence, and date and gender in
different formats. We can achieve these transformations by creating an
Access query.
I used
the Access query wizard to select the columns I wanted in the desired
sequence. Then in the query design view, I edited the SQL query to add
extra blank columns by inserting <<“”,>> as
illustrated. To alter the format in which date and gender are
presented, I used VBA formulas embedded within Access. The
resultant query is shown below.
SELECT "10", ROW.C2, mid(ROW.C6,8,4) &
format((instr("~Jan~Feb~Mar~Apr~May~Jun~Jul~Aug~Sep~Oct~Nov~Dec",
"~" & mid(ROW.C6,4,3))+3)/4,"00") & left(ROW.C6,2), "", "", ROW.C3,
ROW.C5, "", ROW.C4, "",
format((instr("~Male~Female","~" & ROW.GENDER)-1)/6+1,"0"),
ROW.ADDRESS, "", "", "", "",
ROW.POSTCODE, "", "", "", "", "", "", "", "", "", ""
FROM ROW;
Run the query
and export the results to a CSV file. You should find that Access
automatically converts from Unicode to ANSI as part of this operation,
halving the size of the file in the process.
Lastly,
edit the CSV file in notepad and add a header and trailer in the format required
by the DBS. Descriptions
of the file formats used by the DBS can be found at http://www.connectingforhealth.nhs.uk/industry/docs/dbs/index.html?dbsoverview.htm .
The file produced by the Access query requires a header and trailer to
be added. These can be created using Excel formulae as illustrated
here: DBS-Header-and-Trailer.xls
Having
successfully submitted the data for tracing, a file should be returned
with the results. One way of dealing with the returned data is to load
it into Access and write a simple query and report which will present
the data in a manner convenient to your data entry clerks.
Processing a file returned by the DBS
Adding GP information
The data returned by the DBS contains only practice codes, not GP
codes. Unfortunately, when updating client registration details in RiO
you cannot find a GP by practice code. To make the batch tracing useful
to RIO data entry clerks, other details of the GP practice need to be
included in a report.
Current details of GP's and other administrative code information is
available in the orgcurrdev2003.mdb database which can
be downloaded from Connecting For Health ODS at:
http://nww.connectingforhealth.nhs.uk/ods/downloads/access/
From this database, the GP practice table can be exported as illustrated below.

I exported GP practice details into a CSV file, loaded this into Excel
then saved as a Excel spreadsheet, then imported it into my Access
database as table "GMP". The objective here was to join the response
file from the DBS to the GP practice table, in order to include GP
practice name and address information in the report produced for the
data entry clerks. The clerks can then look up GP practices by postcode.
I produced software in Visual Studio 2008
to interrogate the database and produce a report for data entry clerks.
The database and software is available for download here; these are zip files so right-click and save:
To install the system, save the database in C:\DBS_Response as DBS_Response.mdb
Populate the GMP table with the data exported from the ODS download.
Install the application DBS_Results. This program will interrogate the
database and output the results to text file "DBS_Result.txt" in your
"My Documents" folder.
To handle a file returned by DBS:
- Double-click to open in Excel
- Delete the first and last row
- Save as an Excel workbook in folder C:\DBS_Response
- Import the Excel workbook into Access, accepting the defaults
- Save the imported table as DBS_Response, overwriting the existing table
- Run the DBS_Results program
Note - all illustrations use Office 2003 under Windows XP.
Updated: 18 July 2009