A Data Mining Application
Description of the Application
This application addresses a need common to every store. Customers purchase
various articles, and knowing which customers have similar profiles can be very useful to the store. We show how EnFuzion can significantly speed up the
calculation of nearest customer profiles. Although the numbers are
hypothetical, this demonstration can easily be extended to real-world problems.
A database of transactions contains information on all purchases.
Each transaction contains the following fields:
customer_id
article
price
quantity
We are looking for customers with similar purchasing habits. The program
browses through the database and compares the transaction records of the
target customer to the transactions of all the other customers.
For each comparison, a degree of similarity is calculated.
The nearest customers are those for whom the sum of all the
similarities is the highest.
The result of this application is a list of nearest customers for each customer.
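The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the code shipped with the demo: the text does not say how the degree of similarity is calculated, so the measure used here (two transactions match only when they are for the same article, and score the smaller of the two amounts spent) is an assumption, as are the names Transaction, similarity and nearest_customers.

```python
from collections import defaultdict
from typing import NamedTuple


class Transaction(NamedTuple):
    # The four fields listed in the description above.
    customer_id: int
    article: str
    price: float
    quantity: int


def similarity(a, b):
    """Hypothetical degree of similarity between two transactions.

    Assumption: only purchases of the same article count, and the
    score is the smaller of the two amounts spent.
    """
    if a.article != b.article:
        return 0.0
    return min(a.price * a.quantity, b.price * b.quantity)


def nearest_customers(transactions, target_id, top_n=3):
    """Rank the other customers by their summed similarity to the target."""
    by_customer = defaultdict(list)
    for t in transactions:
        by_customer[t.customer_id].append(t)

    target = by_customer.get(target_id, [])
    totals = {}
    for other_id, records in by_customer.items():
        if other_id == target_id:
            continue
        # Sum the similarity over every pair of transactions.
        totals[other_id] = sum(
            similarity(a, b) for a in target for b in records
        )

    # Highest total first; ties are broken by customer id.
    ranked = sorted(totals, key=lambda cid: (-totals[cid], cid))
    return ranked[:top_n]
```

With a made-up database in which customer 1 buys bread and milk, customer 2 buys a larger amount of bread, and customer 3 buys a little milk and some soap, customer 2 ranks ahead of customer 3 as the nearest customer to customer 1. Running nearest_customers once for every customer id produces the full list of nearest customers.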
Instructions:
Install EnFuzion and prepare the enfuzion.nodes configuration file
for your network of computers as specified in the EnFuzion manual.
Your network can contain an arbitrary mix of Linux/Unix and Windows NT
computers.
If the files in this directory are on CD-ROM, copy the entire
directory and all its subdirectories to a hard disk. This step
is necessary, since results and temporary files will be generated
in the directory.
Start the enfdispatcher with its working directory set to
the datamining directory. For example, if the datamining
directory is located at /home/bob/datamining, use the following
commands:
cd /home/bob/datamining
enfdispatcher mining.run
The data mining application will be automatically distributed and
the nearest neighbors will be calculated on all EnFuzion node
computers. If your network is heterogeneous, EnFuzion will choose
and install the correct binaries for this application on each node
platform.
At the end of the execution, the nearest customers will be placed
in file "similar_customers".
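The work can be spread over many computers because the list of nearest customers for one target customer does not depend on the list for any other. The sketch below illustrates that idea only. It does not describe how EnFuzion actually divides or schedules the work, and split_into_jobs and merge_results are hypothetical helper names.

```python
def split_into_jobs(customer_ids, n_jobs):
    """Deal customer ids round-robin into at most n_jobs non-empty chunks.

    Each chunk is an independent unit of work: the nearest customers
    for the ids in one chunk can be computed without the other chunks.
    """
    if n_jobs < 1:
        raise ValueError("n_jobs must be at least 1")
    ids = sorted(customer_ids)
    chunks = [ids[i::n_jobs] for i in range(n_jobs)]
    return [chunk for chunk in chunks if chunk]


def merge_results(partial_results):
    """Combine per-job {customer_id: [nearest ids]} dicts into one dict."""
    merged = {}
    for part in partial_results:
        merged.update(part)
    return merged
```

Because the chunks never share a target customer, merging is a plain dictionary union and no coordination between jobs is needed while they run.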
This application demonstrates the ability of EnFuzion to:
- Speed up the computation by distributing work over multiple computers
- Use different hardware platforms concurrently
- Perform data mining applications in parallel on multiple computers