Memex Technology Limited
Server (includes Memex
Workbench integrates with:
Data Entry Module,
i2 Analyst’s Notebook
5.0.25 & i2 Analyst’s
Crime Workbench, developed by the British company Memex Technology Limited in
co-operation with ACSYS BSC, allows the creation, management and use of textual
databases employed in the criminal analysis process. It is particularly useful
in multi-thread cases that cover a large territorial area and involve a complex
structure of criminal associations and a large amount of information, where
traditional or less sophisticated methods cannot follow and associate the facts
needed to build and then verify or eliminate investigative hypotheses. The main
system features are as follows:
database creation and management;
collecting descriptive information about
entities and events;
creating structural information about persons
and other entities (address, vehicle etc.);
defining links that show associations between
management of complex cases;
searching for information, based on extensive
criminal analysis with the use of visualization
techniques (full use of the graphical analysis functions requires installation
of CWB i2-Plugin
extension module, and the i2
managing user access to the gathered
extensive data access monitoring function;
automatic import of data from external
information storage systems.
All system features are available from the
Client application level.
Crime Workbench is an operational and strategic
intelligence tool allowing powerful data storage, searching, retrieval and
analysis. It is specifically designed for the law enforcement, intelligence and
fraud investigation communities.
Crime Workbench is
a client–server system. The clients can use the resources of one or more
servers. The server software can run on a Solaris platform (Solaris 2.7 or
There are two client programs for connecting to the server:
The client application provides a user interface to the powerful search
facilities of the Memex Intelligence Engine (MIE). The MIE is installed on the
server together with the Memex databases in which all of the intelligence data
is stored.
A Crime Workbench installation uses a combination of
physical and logical servers. The most common type of installation is a single
physical server supporting several logical servers.
The main types of installation are the following:
Centralized – This is a single-server
installation. It consists of one physical server supporting one logical server
with one set of data.
Logically Distributed – This is the most common
type of installation and consists of one physical server with multiple logical
servers.
Physically Distributed – In specific
circumstances it is useful to have a system that is made up of more than one
physical server. Each physical server contains one or more logical servers to
which users can write, plus read-only copies of the logical servers on the
remote physical servers. (Each physical server on which Crime Workbench is
installed must also host an MIE.)
The MIE handles most of the database management tasks for Crime Workbench.
Crime Workbench Components
Crime Workbench is a client–server system. The client
application is a Windows-based program that provides users with a graphical
interface to the functions provided by the server.
On the server side, a Crime Workbench installation comprises one or more
logical servers. The logical servers can be
grouped into two types:
A Crime Workbench installation requires one configuration server. The
configuration server is a special instance of a logical server – as well as
holding intelligence data, the configuration server stores information relating
to the configuration of the system (e.g. user settings and entity definitions).
It is also the point of entry to the system during the login process.
The configuration server contains:
The appserver (application server) performs certain
server-side tasks for the client application – for example, adding and removing
users from the system, assigning permissions to users, and changing passwords.
These databases (mxAction, mxAuditnn, mxCase, mxDisseminate,
mxEntity, mxLinkDB, mxServer and mxUserGroup) contain a variety of system
information. With the exception of mxDisseminate they are not displayed in the
client application’s Search Manager and so cannot be searched in the same way as
The information input by users, via the
Crime Workbench client application, is stored in intelligence databases, such as the Report,
Address and Vehicle databases.
A Crime Workbench installation may contain multiple
non-configuration servers. Using non-configuration servers allows you to have
more than one database of the same type. For example, if you want to have
several Address databases, you must put each one on a separate server.
Each non-configuration server contains:
Typically, though not necessarily, the same selection of
databases as those on the configuration server.
Only the mxAuditnn and mxDisseminate management databases are
included (depending on a user’s permissions). All of the configuration data is
centrally located on the configuration server.
The archive server is a special type of non-configuration
server, used to store deleted records. If an archive server is installed,
whenever a user deletes a record, it is moved to the matching database on the
archive server. For example, if you delete a record in an Address database it is
moved to the Address database on the archive server. (Typically, access to the
archive server is only given to administrative users.)
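The delete-to-archive behaviour described above can be pictured with a small sketch. This is only an illustration under stated assumptions: the `LogicalServer` class and `delete_record` function are hypothetical names invented here, and each database is modelled as a plain dictionary rather than a real Memex database.

```python
class LogicalServer:
    """Toy logical server: database name -> {record_id: record}. Hypothetical model."""

    def __init__(self, name):
        self.name = name
        self.databases = {}

    def add(self, db, record_id, record):
        # Create the database on first use, then store the record.
        self.databases.setdefault(db, {})[record_id] = record


def delete_record(server, archive, db, record_id):
    """Remove a record from `server`.

    If an archive server is supplied, the record is moved to the database
    of the same name on the archive server instead of being lost.
    """
    record = server.databases[db].pop(record_id)
    if archive is not None:
        archive.add(db, record_id, record)
    return record
```

With a live server and an archive server, deleting record 1 from the live Address database would leave it in the archive's Address database.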
Introduction to Memex
Memex Intelligence Engine is a unique technology that has been developed
specifically to meet the needs of organizations involved in law enforcement,
national security and combating commercial fraud. It is neither a relational
database management system nor a simple text indexing system, but a secure
hybrid design with the benefits of both of these technologies.
Memex Intelligence Engine is the backbone for Memex’s
Behind the scenes, the MIE provides the majority of the data input, search,
retrieval and database management functions for these applications.
Structured and free-text
Today’s databases are primarily relational systems, based on
the concept of highly structured data. These systems are very prescriptive about
the types of data you can store within them and the methods you can use to
access the data efficiently. As an example, records about a person will allow
only one date of birth, and will not allow a partial or approximate date of
birth. Typically, data can only be searched by some of the fields and little or
no “fuzzy” searching is available.
By contrast, indexed, free-text systems allow you to search
for any information within text documents. However, these systems also have
their limitations. For example, the data must be completely indexed – an
operation that takes some time. This prevents information from being searchable
immediately on entry. Wholly free-text systems also offer no support for
applying structure to data where you do require it, and little, if any, support
for data security.
Memex Intelligence Engine enables you to provide as little, or as much,
structure to your data as you need. It allows you to search every part of your
data without the need to define indexes, and it provides a framework of security
and auditing for all functionality.
Basic concepts of Memex
Instead of each character having a numeric code between 1 and 255, as in the
original ASCII text, the MIE converts incoming text data into numbers, through
a process called tokenisation. Each unique word, text unit (such as an
acronym), symbol or punctuation mark in a database is treated as a token. Each
token is given a numeric code, which is assigned the first time the token is
added to the database. The tokens are stored as variable-length codes, with
most tokens being assigned either a 1-byte or 2-byte code. In the default
configuration, numbers in the incoming text are stored as numbers, rather than
as tokens.
Additional information is stored about capitalization,
document breaks, field separators and numbers. This information is also
Once data is stored in coded form in a Memex database it can be searched very
rapidly. The Memex Intelligence Engine searches for the small byte sequences
that represent the required words. This is a far quicker process than searching
the original raw text, and is aided by the fact that using tokenisation
compresses the original text by up to 70 per cent.
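The idea of tokenising text and then searching the coded form can be sketched as follows. This is a simplified illustration, not the MIE's actual encoding: the `TokenStore` class and `find_phrase` function are hypothetical, codes are held as Python integers rather than variable-length bytes, text is lower-cased instead of storing capitalization separately, and numbers are treated as ordinary tokens.

```python
import re


class TokenStore:
    """Toy token dictionary: each unique word or punctuation mark gets a code."""

    def __init__(self):
        self.codes = {}  # token -> numeric code, assigned on first sight

    def _split(self, text):
        # Words, or single non-space symbols such as punctuation marks.
        return re.findall(r"\w+|[^\w\s]", text.lower())

    def encode(self, text):
        """Encode text for storage, assigning new codes as needed."""
        return [self.codes.setdefault(t, len(self.codes) + 1)
                for t in self._split(text)]

    def lookup(self, text):
        """Encode a query without assigning codes; None if any token is unknown."""
        try:
            return [self.codes[t] for t in self._split(text)]
        except KeyError:
            return None


def find_phrase(coded, phrase_codes):
    """Return the start positions where the code sequence occurs."""
    n = len(phrase_codes)
    return [i for i in range(len(coded) - n + 1)
            if coded[i:i + n] == phrase_codes]
```

Note that `lookup` returning `None` mirrors the point that a query containing a word never seen by the database cannot match, so no scan is needed.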
To further improve performance, the database is split into separate files
called segments. Data is indexed to segment level, so that the MIE can quickly
determine which segments contain a given word, or combination of words. This
allows the MIE to limit its search to those segment files that contain the
word, or words, in the user’s search expression. The segment index (which is
also known as the navigate and search map, or, more simply, as the map file)
allows rapid elimination of a great deal of data, and means that typically only
5–10 per cent of the data needs to be scanned. Additionally, when a database
does not contain the word or phrase being searched for, the MIE can determine
this without having to search any of the coded segments.
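A segment-level index of this kind can be sketched in a few lines. The following is an assumption-laden illustration rather than the real map file format: `build_segment_map` and `candidate_segments` are hypothetical names, segments are plain strings, and words are split on whitespace.

```python
from collections import defaultdict


def build_segment_map(segments):
    """Map each word to the set of segment numbers that contain it."""
    seg_map = defaultdict(set)
    for seg_no, text in enumerate(segments):
        for word in text.lower().split():
            seg_map[word].add(seg_no)
    return seg_map


def candidate_segments(seg_map, words):
    """Return the segments that contain every query word.

    An empty result means the query cannot match, so no segment
    needs to be scanned at all.
    """
    sets = [seg_map.get(w.lower(), set()) for w in words]
    return set.intersection(*sets) if sets else set()
```

Only the segments returned by `candidate_segments` would then be scanned for the actual phrase, which is how the index eliminates most of the data up front.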
How the system is
Memex Intelligence Engine and the applications it supports are organized
in a client–server relationship. Search facilities and data access controls are
provided to client applications – such as Crime Workbench and CWB i2 Plug-in –
by a set of server-side programs. The MIE can also be accessed by custom-built
solutions, using one of the programming APIs.
The MIE itself consists of a central server program and a suite of programs
and utilities that carry out specific tasks. Data is stored in one or more Memex
databases, which are typically located on a single physical server.
Client applications connect to the MIE via TCP/IP, specifying the server’s host
name and the network port on which the MIE listens for connections. Typically,
the MIE uses port 590, although another port can be chosen, if required.
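As a minimal sketch of the connection parameters described above, the following checks whether a TCP connection to a given host and port can be opened. It does not speak the MIE protocol, which is not described here; the function name `can_reach_mie` is hypothetical, and only the default port number comes from the text.

```python
import socket

DEFAULT_MIE_PORT = 590  # typical port according to the text; may be changed


def can_reach_mie(host, port=DEFAULT_MIE_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened.

    This only tests network reachability; it performs no login and
    sends no data to the server.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A check like this can help separate network problems (wrong host name, blocked port) from problems in the client application itself.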
The Memex server
The MIE runs as two processes called aisvr and bcp. The tasks performed by
the Memex server fall roughly into two categories:
Controlling all database access, input and maintenance on behalf of client
applications. Where appropriate, the Memex server also provides user
authentication and handles the connection of networked applications.
Performing all search operations on Memex databases. After
processing the search commands it receives from client applications, the Memex
server runs the appropriate search, provides information on the status of the
search, and returns any hits it finds.
The Memex server is usually set up to start automatically when the computer
hosting the MIE is rebooted. On a UNIX server, this is achieved by using a
startup script.
Other parts of the system
The MIE consists of a large collection of programs and files. The following list briefly
describes the main constituent parts of the system – other than the Memex server
and its configuration file – and indicates where you can find more information.
Memex databases comprise a number of files, including:
One or more coded segment files (often referred to as “cod”
database configuration file, containing
definitions of fields, level separators, the security classification for the
The MIE’s Registry file is a text file that can be used to identify and
organize the databases that are available to an application.
For each database, the registry contains such information as:
creation date/time of the database;
host name of the computer on which the database
path to the directory in which the database
files are stored;
unique number that applications use to identify
the database, rather than using the path, thereby making it easier to change the
physical location of the database.
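The benefit of identifying a database by number rather than by path can be shown with a small sketch. The real Registry file format is not described here, so everything below is hypothetical: the `RegistryEntry` fields simply mirror the items listed above, and `Registry` is an in-memory stand-in for the text file.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class RegistryEntry:
    db_id: int          # unique number used by applications
    created: datetime   # creation date/time of the database
    host: str           # host name of the computer holding the database
    directory: str      # directory in which the database files are stored


class Registry:
    """In-memory stand-in for the registry: db_id -> entry."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        self._entries[entry.db_id] = entry

    def locate(self, db_id):
        """Resolve a database number to its current (host, directory)."""
        entry = self._entries[db_id]
        return entry.host, entry.directory

    def relocate(self, db_id, host, directory):
        """Move a database; applications keep using the same number."""
        entry = self._entries[db_id]
        entry.host, entry.directory = host, directory
```

Because applications only hold the number, a call to `relocate` changes where the database lives without any change on the application side.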
Security is applied to data using a system of locks and keys.
The locks, together with the specification of which keys each user possesses,
are defined in a security class, within a security file. Each database is
assigned a security class, from which it takes its security settings.
Applications may use a number of security classes, stored in one or more
security files – for example, assigning a particular security class to a
particular type of database. Alternatively (as is the case with Crime
Workbench), an application may use a single security class in a single security
file.
The file path and name of the security file are specified as a configuration
parameter of the Memex server.
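The locks-and-keys model can be sketched as a simple data structure. This is an illustration only: the `SecurityClass` name, its fields, and the rule that a user needs a key for every lock on a record are assumptions made for the example, not the documented format or semantics of a Memex security file.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityClass:
    """Hypothetical security class: defined locks plus the keys each user holds."""
    name: str
    locks: set = field(default_factory=set)
    user_keys: dict = field(default_factory=dict)  # user -> set of keys

    def grant(self, user, key):
        """Give a user the key to one of the locks defined in this class."""
        if key not in self.locks:
            raise ValueError(f"no lock named {key!r} in class {self.name!r}")
        self.user_keys.setdefault(user, set()).add(key)

    def can_access(self, user, record_locks):
        """Assumed rule: the user must hold a key for every lock on the record."""
        return set(record_locks) <= self.user_keys.get(user, set())
```

Under this assumed rule, a record with no locks is visible to everyone, and each additional lock narrows access to the users holding the matching key.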
The security journal is a low-level log of database events.
It is a text file, but identifies logged events using a code, to keep the size
of the file to a minimum. The file path and name of the security journal, and
the events it logs, are specified as configuration parameters of the Memex server.
The MIE logs all errors in a text file. The file path and name of the file
are specified as a configuration parameter of the Memex server. The error log also
records configuration information about the system each time the Memex server is
started. For debugging purposes, the Memex server can be configured to log all
Memex API function calls to the error log.
Query expanders file
This text file allows users to access functions directly from
the query line, rather than via an application control. For example, the
function mx_phonetic can be mapped to the word SOUNDS, allowing a search such as
The file also maps Boolean operators to words. This allows
users to enter a query such as dog NOT spaniel, instead of dog ! spaniel. The
file is also used to specify the file path and name of the default thesaurus
The file path and name of the query expanders file are
specified in the Memex server’s configuration file.
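The word-to-operator mapping can be sketched as a simple lookup. This is a minimal illustration, not the query expanders file format: the dictionaries and function names are hypothetical, and only the `NOT` to `!` and `SOUNDS` to `mx_phonetic` pairs come from the text.

```python
# Hypothetical in-memory form of the mappings described above.
OPERATOR_WORDS = {"NOT": "!"}
FUNCTION_WORDS = {"SOUNDS": "mx_phonetic"}


def expand_operators(query):
    """Replace operator words with their symbols, leaving other words alone."""
    return " ".join(OPERATOR_WORDS.get(word, word) for word in query.split())


def function_for(word):
    """Return the function mapped to a query-line word, or None."""
    return FUNCTION_WORDS.get(word)
```

For example, `expand_operators("dog NOT spaniel")` yields the operator form of the query given in the text.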
The MIE can search for words that have a similar meaning to a specified word.
In order to perform synonym expansion, the system must have access to at least
one thesaurus file. Each database can have a thesaurus assigned to it by
specifying the path to the thesaurus file in the database’s configuration file.
If a database does not have an individually assigned thesaurus, a default
thesaurus is used. The default thesaurus is specified in the query expanders
file.
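The fallback from a per-database thesaurus to the default one can be sketched as follows. The configuration keys `thesaurus` and `default_thesaurus`, the function names, and the dictionary-based thesaurus are all assumptions made for this example; the real file formats are not described here.

```python
def resolve_thesaurus(db_config, expanders_config):
    """Return the database's own thesaurus path, else the default one."""
    return db_config.get("thesaurus") or expanders_config["default_thesaurus"]


def expand_synonyms(word, thesaurus):
    """Return the word followed by its synonyms in alphabetical order."""
    return [word] + sorted(thesaurus.get(word.lower(), ()))
```

A search for one word would then be run against every term returned by `expand_synonyms`, using whichever thesaurus `resolve_thesaurus` selects for that database.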
A link repository is a special kind of database, designed for
storing information about links between records.
Crime Workbench,
for example, uses three link repositories for storing the details of explicit,
implicit and case links. Link repositories use a proprietary format and the data
they contain cannot be modified in any way, other than via the
A query history is a collection of information concerning
searches run by a user. Each query history entry stores the details of an
individual search, including the results returned by that search. Each user can
have one or more query histories, created and maintained by a client application.
Typically, query histories are created in a directory used for temporary files –
for example, /tmp.
The MIE comes complete with a set of tools for manipulating or repairing
databases, loading/dumping data to/from a link repository, and adding query