Data Cleansing for Child Health
The data cleansing exercise was based on two approaches. Firstly, a
comparison was made, using in-house developed software, between all
18-and-under clients on RIO and all 18-and-under patients on local GP
registers. This exercise, known as the "Data Matching Report", used a
download of the Hounslow population from the NSTS. Secondly, the
standard "Report of Clients with Unknown GP" was extracted from the RIO
system and submitted for tracing against the spine through the
Demographics Batch Service, or DBS (note 2).
The diagrams below illustrate how this data is being handled, and where
various software utilities or data manipulation processes are used,
for example to render information suitable for processing by the DBS.


The Data Matching Report produced information broadly divided into
three categories: "missings" who appear on RIO but not on a local GP
list, "adds" who appear on a GP list but not on RIO, and "mismatches"
who appear on both but differ in some way, for example having the same
NHS number, name and date of birth but apparently being registered with
different GPs. These "mismatch" records are manually corrected on RIO
to match the GP list.
The "adds" were added to the RIO system, and the "missings" were
submitted to the DBS for tracing (note 1). The results from the DBS
were then manually input to RIO.
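The three categories above can be illustrated with a minimal sketch. It
is not the in-house software, whose matching rules are not described
here. It assumes, purely for illustration, that records are keyed on NHS
number and that any difference in the remaining fields counts as a
mismatch. The Person type, its field names and the classify function
are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; the real RIO and GP register fields may differ.
    nhs_number: str
    name: str
    date_of_birth: str  # ISO date string, e.g. "2001-05-14"
    gp_code: str


def classify(rio_records, gp_records):
    """Split records into missings, adds and mismatches.

    Assumption: NHS number is the matching key. Records without an NHS
    number would need separate handling and are not covered here.
    """
    rio = {p.nhs_number: p for p in rio_records}
    gp = {p.nhs_number: p for p in gp_records}

    # On RIO but not on any GP list.
    missings = [rio[n] for n in sorted(rio.keys() - gp.keys())]
    # On a GP list but not on RIO.
    adds = [gp[n] for n in sorted(gp.keys() - rio.keys())]
    # On both, but at least one field (for example the GP code) differs.
    mismatches = [
        (rio[n], gp[n])
        for n in sorted(rio.keys() & gp.keys())
        if rio[n] != gp[n]
    ]
    return missings, adds, mismatches
```

Each mismatch is returned as a (RIO record, GP list record) pair, so that
a reviewer can see both versions side by side before correcting RIO.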
Note 1: Of 12,243 records classified as "missings" and traced through DBS,
7,638 produced matches, 4,555 were not matched, and 50 produced
multiple matches on the spine. Of the 7,638 records which were matched
on the spine, only 6,568 returned a GP practice code. A proportion of
these records return a Hounslow GP code; however, in some of these
cases the local GP practice may have informed the PCT that the child is
no longer on their list.
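The figures in Note 1 can be cross-checked with a short calculation.
The sketch below uses only the counts quoted above; the variable names
and the pct helper are illustrative.

```python
# Counts quoted in Note 1.
submitted = 12_243
matched = 7_638
not_matched = 4_555
multiple_matches = 50
with_gp_code = 6_568

# The three outcomes should account for every submitted record.
assert matched + not_matched + multiple_matches == submitted


def pct(part, whole):
    """Percentage of whole represented by part, to one decimal place."""
    return round(100 * part / whole, 1)


match_rate = pct(matched, submitted)                # share of missings matched
gp_rate_of_matched = pct(with_gp_code, matched)     # matched records with a GP code
gp_rate_overall = pct(with_gp_code, submitted)      # all missings with a GP code
matched_without_gp_code = matched - with_gp_code    # matched but no GP code returned
```

On these figures roughly 62% of the "missings" were matched, and 1,070
of the matched records came back without a GP practice code, so just
over half of the records submitted ended up with a usable GP code.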
Note 2: Unlike the Data Matching Report, tracing from the Unknown GP
report was not limited by age. The data was submitted in two passes. In
pass 1, 12,992 records were submitted and 8,997 records were traced
successfully with their NHS number found to be valid, but only 8,153
returned GP codes. It should be noted that, since it is not
age-limited, the data includes a proportion of persons who have died.
71 records produced no result. In pass 2, 3,932 records (those with no
NHS number) were submitted for tracing. Only 1,377 were traced, 1,154
returning GP practice codes. Where a record does not include an NHS
number, it is recommended that no address information be included with
the records submitted to DBS.
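The recommendation at the end of Note 2 could be applied when preparing
records for submission, along the lines of the sketch below. The DBS
file layout is not described here, so the dictionary keys
("nhs_number", "address", "postcode") and the function name are
hypothetical and would need to be replaced with the real field names.

```python
# Hypothetical names for the address-related fields in an extracted record.
ADDRESS_FIELDS = ("address", "postcode")


def prepare_for_tracing(record):
    """Return a copy of the record ready for submission.

    Records that carry an NHS number are copied unchanged. Records
    without one have their address fields removed, following the
    recommendation in Note 2. The input record is never modified.
    """
    if record.get("nhs_number"):
        return dict(record)
    return {k: v for k, v in record.items() if k not in ADDRESS_FIELDS}
```

Working on a copy keeps the original extract intact, so the address
details remain available on RIO even though they are left out of the
submission.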
PT 17 July 2009