Producer: Memex Technology Limited
Modules: Crime Workbench Server (includes Memex Intelligence Engine); Crime
Workbench Client
Crime Workbench integrates with: Data Entry Module, i2 Analyst’s Notebook
5.0.25 & i2 Analyst’s Notebook 6.
Crime Workbench (CWB), developed by the British company Memex Technology
Limited in co-operation with ACSYS BSC, allows the creation, management and use
of textual databases employed in the criminal analysis process. It is
particularly useful in multi-thread cases that cover a large territorial area,
involve a complex structure of criminal associations and generate a large
amount of information, where traditional or less sophisticated methods are not
sufficient to follow and associate facts in order to build and then verify or
eliminate investigation hypotheses. The main system features are as follows:
- database creation and management;
- collecting descriptive information about entities and events;
- creating structural information about persons and other entities (address,
  vehicle etc.);
- defining links that show associations between entities;
- management of complex cases;
- searching for information, based on an extensive query language;
- criminal analysis with the use of visualization techniques (full use of the
  graphical analysis functions requires installation of the CWB i2-Plugin
  extension module and the i2 Analyst’s Notebook);
- managing user access to the gathered information;
- extensive data access monitoring function;
- action management;
- automatic import of data from external information storage systems.
All system features are available from the Crime Workbench Client application
level.
Crime Workbench is an operational and strategic intelligence tool allowing
powerful data storage, searching, retrieval and analysis. It is specifically
designed for the law enforcement, intelligence and fraud investigation
communities.
Crime Workbench is a client–server system. The clients can use the resources of
one or more Crime Workbench servers. The server software can run on a Solaris
platform (Solaris 2.7 or Solaris 2.8).
There are two client programs for connecting to the server:
The client application provides a user interface to the powerful search
facilities of the MIE (Memex Intelligence Engine). The MIE is installed on the
server together with the Memex databases in which all of the intelligence data
is stored.
System architecture
A Crime Workbench installation uses a combination of physical and logical
servers. The most common type of installation is a single physical server
supporting several logical servers.
The main types of installation are the following:
- Centralized – This is a single-server installation. It consists of one
  physical server supporting one logical server with one set of data.
- Logically Distributed – This is the most common type of installation and
  consists of one physical server with multiple logical servers.
- Physically Distributed – In specific circumstances it is useful to have a
  system that is made up of more than one physical server. Each physical server
  contains one or more logical servers to which users can write, plus read-only
  copies of the logical servers on the remote physical servers. (Each physical
  server on which Crime Workbench is installed must also host an MIE
  installation. The MIE handles most of the database management tasks for Crime
  Workbench.)
Crime Workbench Components
Crime Workbench is a client–server system. The client application is a
Windows-based program that provides users with a graphical interface to the
functions provided by the server.
On the server side, a Crime Workbench installation comprises one or more
logical servers. The logical servers can be grouped into two types:
Configuration server
Every Crime Workbench installation requires one configuration server. The
configuration server is a special instance of a logical server – as well as
holding intelligence data, the configuration server stores information relating
to the configuration of the system (e.g. user settings and entity definitions).
It is also the point of entry to the system during the login process.
The configuration server contains:
The appserver (application server) performs certain server-side tasks for the
client application – for example, adding and removing users from the system,
assigning permissions to users, and changing passwords.
The management databases (mxAction, mxAuditnn, mxCase, mxDisseminate, mxEntity,
mxLinkDB, mxServer and mxUserGroup) contain a variety of system information.
With the exception of mxDisseminate, they are not displayed in the client
application’s Search Manager and so cannot be searched in the same way as
intelligence databases.
The information input by users, via the Crime Workbench client application, is
stored in intelligence databases, such as the Report, Address and Vehicle
databases.
Non-configuration servers
A Crime Workbench installation may contain multiple non-configuration servers.
Using non-configuration servers allows you to have more than one database of
the same type. For example, if you want to have several Address databases, you
must put each one on a separate server.
Each non-configuration server contains:
Typically, though not necessarily, the same selection of
databases as those on the configuration server.
Only the mxAuditnn and mxDisseminate management databases are
included (depending on a user’s permissions). All of the configuration data is
centrally located on the configuration server.
The archive server is a special type of non-configuration
server, used to store deleted records. If an archive server is installed,
whenever a user deletes a record, it is moved to the matching database on the
archive server. For example, if you delete a record in an Address database it is
moved to the Address database on the archive server. (Typically, access to the
archive server is only given to administrative users.)
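The archiving rule described above can be illustrated with a minimal Python sketch. It models servers as plain dictionaries, which is an assumption made purely for illustration; the function name `delete_record` and the sample data are hypothetical and do not reflect how Crime Workbench stores records.

```python
def delete_record(live, archive, db_name, record_id):
    """Remove a record from a live database.

    If an archive server is installed (archive is not None), the record is
    moved to the database of the same name on the archive server.
    """
    record = live[db_name].pop(record_id)
    if archive is not None:
        archive.setdefault(db_name, {})[record_id] = record
    return record


# Hypothetical data: one Address database on a live server, empty archive.
live = {"Address": {1: "12 High Street", 2: "4 Mill Lane"}}
archive = {}
```

Calling `delete_record(live, archive, "Address", 1)` removes the record from the live Address database and places it in the Address database of the archive.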
Introduction to Memex Intelligence Engine
The Memex Intelligence Engine is a unique technology that has been developed
specifically to meet the needs of organizations involved in law enforcement,
national security and combating commercial fraud. It is neither a relational
database management system nor a simple text indexing system, but a secure
hybrid design with the benefits of both of these technologies.
The Memex Intelligence Engine is the backbone for Memex’s Crime Workbench.
Behind the scenes, the MIE provides the majority of the data input, search,
retrieval and database management functions for these applications.
Structured and free-text database systems
Today’s databases are primarily relational systems, based on
the concept of highly structured data. These systems are very prescriptive about
the types of data you can store within them and the methods you can use to
access the data efficiently. As an example, records about a person will allow
only one date of birth, and will not allow a partial or approximate date of
birth. Typically, data can only be searched by some of the fields and little or
no “fuzzy” searching is available.
By contrast, indexed, free-text systems allow you to search
for any information within text documents. However, these systems also have
their limitations. For example, the data must be completely indexed – an
operation that takes some time. This prevents information from being searchable
immediately on entry. Wholly free-text systems also offer no support for
applying structure to data where you do require it, and little, if any, support
for data security.
The Memex Intelligence Engine enables you to provide as little, or as much,
structure to your data as you need. It allows you to search every part of your
data without the need to define indexes, and it provides a framework of security
and auditing for all functionality.
Basic concepts of Memex Intelligence Engine
Instead of storing each character as a numeric code between 1 and 255, as in
the original ASCII text, the MIE converts incoming text data into numeric codes
for whole words, through a process called tokenisation. Each unique word, text
unit (such as an acronym), symbol or punctuation mark in a database is treated
as a token. Each token is given a numeric code, which is assigned the first time
the token is added to the database. The tokens are stored as variable-length
byte sequences, with most tokens being assigned either a 1-byte or 2-byte code.
In the default configuration, numbers in the incoming text are stored as
numbers, rather than as tokens.
Additional information is stored about capitalization,
document breaks, field separators and numbers. This information is also
tokenised.
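The tokenisation idea can be illustrated with a minimal Python sketch. It is not the MIE's actual encoding: the regular expression, the `Vocabulary` class and the 1-byte/2-byte packing scheme are assumptions chosen for illustration. The sketch also ignores capitalization and treats numbers as ordinary tokens, unlike the default configuration described above.

```python
import re

# Hypothetical splitter: runs of word characters, or single symbols.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


class Vocabulary:
    """Assigns each unique token a numeric code the first time it is seen."""

    def __init__(self):
        self.codes = {}

    def code_for(self, token):
        if token not in self.codes:
            self.codes[token] = len(self.codes)
        return self.codes[token]

    def encode(self, text):
        return [self.code_for(t) for t in TOKEN_RE.findall(text.lower())]


def pack(codes):
    """Assumed variable-length packing: codes below 128 take one byte,
    larger codes take two bytes with the high bit of the first byte set."""
    out = bytearray()
    for c in codes:
        if c < 0x80:
            out.append(c)
        elif c < 0x8000:
            out.append(0x80 | (c >> 8))
            out.append(c & 0xFF)
        else:
            raise ValueError("code too large for this sketch")
    return bytes(out)


vocab = Vocabulary()
codes = vocab.encode("The red car hit the blue car.")
packed = pack(codes)
```

In this example the 29-character sentence becomes eight token codes, each stored in a single byte, because repeated words such as "the" and "car" reuse the code assigned on first sight.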
Once data is stored in coded form in a Memex database it can be searched very
rapidly. The Memex Intelligence Engine searches for the small byte sequences
that represent the required words. This is a far quicker process than searching
the original raw text, and is aided by the fact that using tokenisation
compresses the original text by up to 70 per cent.
To further improve performance, the database is split into separate files
called segments. Data is indexed to segment level, so that the MIE can quickly
determine which segments contain a given word, or combination of words. This
allows the MIE to limit its search to those segment files that contain the
word, or words, in the user’s search expression. The segment index (which is
also known as the navigate and search map, or, more simply, as the map file)
allows rapid elimination of a great deal of data, and means that typically only
5–10 per cent of the data needs to be scanned. Additionally, when a database
does not contain the word or phrase being searched for, the MIE can determine
this without having to search any of the coded segments.
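The role of the segment-level index can be sketched in Python as follows. This is a simplified model under stated assumptions, not the MIE's map file format: segments are lists of plain-text records, and `build_map` and `search` are hypothetical helper names.

```python
def build_map(segments):
    """Map each word to the set of segment numbers that contain it."""
    seg_map = {}
    for seg_no, records in enumerate(segments):
        for record in records:
            for word in record.lower().split():
                seg_map.setdefault(word, set()).add(seg_no)
    return seg_map


def search(segments, seg_map, words):
    """Return (segment number, record) pairs containing every search word."""
    words = [w.lower() for w in words]
    if not words:
        return []
    # Use the map to eliminate segments that cannot contain a hit.
    candidates = set.intersection(*(seg_map.get(w, set()) for w in words))
    hits = []
    for seg_no in sorted(candidates):
        # Only candidate segments are scanned record by record.
        for record in segments[seg_no]:
            if all(w in record.lower().split() for w in words):
                hits.append((seg_no, record))
    return hits


segments = [
    ["red ford escort", "blue vauxhall astra"],
    ["john smith london", "mary jones leeds"],
    ["red rover london"],
]
seg_map = build_map(segments)
```

A search for "red" and "london" scans only the third segment, while a search for a word that is absent from the map returns no hits without scanning any segment. Because the index works at segment level, a candidate segment may still contain no matching record, so the scan step remains necessary.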
How the system is organized
The Memex Intelligence Engine and the applications it supports are organized in
a client–server relationship. Search facilities and data access controls are
provided to client applications – such as Crime Workbench and CWB i2 Plug-in –
by a set of server-side programs. The MIE can also be accessed by custom-built
solutions, using one of the programming APIs.
The MIE itself consists of a central server program and a suite of programs and
utilities that carry out specific tasks. Data is stored in one or more Memex
databases, which are typically located on a single physical server.
Client applications connect to the MIE via TCP/IP, specifying the server’s host
name and the network port on which the MIE listens for connections. Typically,
the MIE uses port 590, although another port can be chosen, if required.
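As a rough illustration of the connection parameters, the following Python sketch resolves a host name and port, falling back to port 590 when none is given. The helper names and the example host are hypothetical, and the sketch only opens a plain TCP connection; the MIE's own wire protocol and client APIs are not covered here.

```python
import socket
from typing import NamedTuple

DEFAULT_MIE_PORT = 590  # typical port, according to the description above


class Endpoint(NamedTuple):
    host: str
    port: int


def parse_endpoint(spec: str) -> Endpoint:
    """Parse 'host' or 'host:port', using the default port when omitted."""
    host, sep, port = spec.partition(":")
    return Endpoint(host, int(port) if sep else DEFAULT_MIE_PORT)


def open_connection(endpoint: Endpoint, timeout: float = 5.0) -> socket.socket:
    """Open a plain TCP connection to the given endpoint (no protocol logic)."""
    return socket.create_connection((endpoint.host, endpoint.port), timeout=timeout)


# Hypothetical host name, used only to show the default-port behaviour.
endpoint = parse_endpoint("mie.example.org")
```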
The Memex server
The MIE runs as two processes, called aisvr and bcp. The tasks performed by the
Memex server fall roughly into two categories:
Controlling all database access, input and maintenance on behalf of client
applications. Where appropriate, the Memex server also provides user
authentication and handles the connection of networked applications.
Performing all search operations on Memex databases. After
processing the search commands it receives from client applications, the Memex
server runs the appropriate search, provides information on the status of the
search, and returns any hits it finds.
The Memex server is usually set up to start automatically when the computer
hosting the MIE is rebooted. On a UNIX server, this is achieved by using a
startup script.
Other parts of the system
An installed MIE system consists of a large collection of programs and files.
The following list briefly describes the main constituent parts of the system –
other than the Memex server and its configuration file – and indicates where you
can find more information.
Databases
Memex databases comprise a number of files, including:
- one or more coded segment files (often referred to as “cod” files);
- database configuration file, containing definitions of fields, level
  separators, the security classification for the database, etc.;
- vocabulary file;
- map file;
- database registry.
The MIE’s Registry file is a text file that can be used to identify and organize
the databases that are available to an application.
For each database, the registry contains such information as:
- creation date/time of the database;
- host name of the computer on which the database is located;
- path to the directory in which the database files are stored;
- unique number that applications use to identify the database, rather than
  using the path, thereby making it easier to change the physical location of
  the database.
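The benefit of identifying a database by number rather than by path can be shown with a small Python sketch. The registry is modelled here as an in-memory dictionary; the actual layout of the Registry text file is not described in this document, so the `RegistryEntry` fields, the sample values and the helper names are all hypothetical.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RegistryEntry:
    db_id: int     # unique number used by applications
    created: str   # creation date/time of the database
    host: str      # computer on which the database is located
    path: str      # directory holding the database files


# Hypothetical registry contents.
registry = {
    101: RegistryEntry(101, "2003-05-01 09:30", "cwb-server-1", "/data/memex/address"),
}


def locate(db_id):
    """Resolve a database number to its current host and path."""
    entry = registry[db_id]
    return entry.host, entry.path


def relocate(db_id, host, path):
    """Move a database; applications keep using the same number."""
    registry[db_id] = replace(registry[db_id], host=host, path=path)
```

After a call to `relocate`, an application that asks for database 101 receives the new location without any change to the identifier it uses.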
Security file(s)
Security is applied to data using a system of locks and keys.
The locks, together with the specification of which keys each user possesses,
are defined in a security class, within a security file. Each database is
assigned a security class, from which it takes its security settings.
Applications may use a number of security classes, stored in one or more
security files – for example, assigning a particular security class to a
particular type of database. Alternatively (as is the case with Crime
Workbench), an application may use a single security class in a single security
file.
The file path and name of the security file are specified as a Memex server
configuration parameter.
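The lock-and-key model can be sketched in Python as follows. The access rule used here (a user needs a key for every lock on a record) is an assumption made for illustration, as are the class name `SecurityClass`, the lock names and the user names; the document does not specify the MIE's actual rule or security file format.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityClass:
    """Hypothetical model of one security class from a security file."""
    name: str
    locks: frozenset
    user_keys: dict = field(default_factory=dict)  # user name -> set of keys held

    def can_access(self, user, record_locks):
        """Assumed rule: the user must hold a key for every lock on the record."""
        record_locks = set(record_locks)
        unknown = record_locks - self.locks
        if unknown:
            raise ValueError(f"locks not defined in {self.name!r}: {sorted(unknown)}")
        return record_locks <= self.user_keys.get(user, set())


# Hypothetical single security class, as an application might define it.
cwb_class = SecurityClass(
    name="cwb-default",
    locks=frozenset({"restricted", "confidential"}),
    user_keys={
        "analyst": {"restricted"},
        "supervisor": {"restricted", "confidential"},
    },
)
```

With this definition, `cwb_class.can_access("analyst", ["restricted", "confidential"])` is false because the analyst lacks the "confidential" key, while the same check for the supervisor succeeds.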
Security journal
The security journal is a low-level log of database events. It is a text file,
but identifies logged events using a code, to keep the size of the file to a
minimum. The file path and name of the security journal, and the events it logs,
are specified as Memex server configuration parameters.
Error log
The MIE logs all errors in a text file. The file path and name of the file are
specified as a Memex server configuration parameter. The error log also records
configuration information about the system each time the Memex server is
started. For debugging purposes, the Memex server can be configured to log all
Memex API function calls to the error log.
Query expanders file
This text file allows users to access functions directly from
the query line, rather than via an application control. For example, the
function mx_phonetic can be mapped to the word SOUNDS, allowing a search such as
(simpson)SOUNDS.
The file also maps Boolean operators to words. This allows
users to enter a query such as dog NOT spaniel, instead of dog ! spaniel. The
file is also used to specify the file path and name of the default thesaurus
file.
The file path and name of the query expanders file are
specified in the Memex server’s configuration file.
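The two kinds of mapping described above can be illustrated with a minimal Python sketch. The dictionaries stand in for the query expanders file, whose real syntax is not shown in this document; the helper names and the tuple returned for a function call are hypothetical. Only the mappings named in the text (NOT to `!`, SOUNDS to `mx_phonetic`) are used.

```python
import re

# Stand-ins for entries in the query expanders file (format assumed).
OPERATOR_WORDS = {"NOT": "!"}
FUNCTION_WORDS = {"SOUNDS": "mx_phonetic"}

# Matches the '(argument)WORD' form used in the example '(simpson)SOUNDS'.
FUNC_RE = re.compile(r"\((\w+)\)(\w+)")


def expand_operators(query):
    """Replace operator words with their symbols, token by token."""
    return " ".join(OPERATOR_WORDS.get(tok, tok) for tok in query.split())


def resolve_function(term):
    """Return (function name, argument) for a mapped '(arg)WORD' term, else None."""
    m = FUNC_RE.fullmatch(term)
    if m and m.group(2) in FUNCTION_WORDS:
        return FUNCTION_WORDS[m.group(2)], m.group(1)
    return None
```

Here `expand_operators("dog NOT spaniel")` yields `dog ! spaniel`, and `resolve_function("(simpson)SOUNDS")` identifies `mx_phonetic` as the function to apply to `simpson`.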
Thesaurus file
The MIE can search for words that have a similar meaning to a specified word. In
order to perform synonym expansion, the system must have access to at least one
thesaurus file. Each database can have a thesaurus assigned to it by specifying
the path to the thesaurus file in the database’s configuration file. If a
database does not have an individually assigned thesaurus, a default thesaurus
is used. The default thesaurus is specified in the query expanders file.
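The fallback from a per-database thesaurus to the default one can be sketched in Python. The configuration keys, file paths and thesaurus contents below are hypothetical; the real file formats are not described in this document.

```python
def thesaurus_path(db_config, expanders_config):
    """Prefer the database's own thesaurus; otherwise use the default one."""
    return db_config.get("thesaurus") or expanders_config["default_thesaurus"]


def expand_synonyms(word, thesaurus):
    """Return the word followed by its synonyms (thesaurus: word -> list)."""
    return [word] + [s for s in thesaurus.get(word, []) if s != word]


# Hypothetical configuration values.
expanders_config = {"default_thesaurus": "/opt/memex/default.ths"}
vehicle_db = {"thesaurus": "/opt/memex/vehicle.ths"}
report_db = {}
```

In this sketch the Vehicle database resolves to its own thesaurus file, while the Report database, which has none assigned, resolves to the default.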
Link repositories
A link repository is a special kind of database, designed for storing
information about links between records. Crime Workbench, for example, uses
three link repositories for storing the details of explicit, implicit and case
links. Link repositories use a proprietary format and the data they contain
cannot be modified in any way, other than via the MIE.
Query histories
A query history is a collection of information concerning searches run by a
user. Each query history entry stores the details of an individual search,
including the results returned by that search. Each user can have one or more
query histories, created and maintained by a client application. Typically,
query histories are created in a directory used for temporary files – for
example, /tmp.
Utilities
The MIE comes complete with a set of tools for manipulating or repairing
databases, loading/dumping data to/from a link repository, and adding query
expanders.