A Data Mining Application

Description of the Application

This application addresses a need common to every store. Customers purchase various articles, and information on customers with similar profiles can be very useful to the store. We show how EnFuzion can significantly speed up the calculation of nearest customer profiles. Although the numbers are hypothetical, this demonstration can easily be extended to real-world problems.

A database of transactions contains information on all purchases. A transaction contains the following information:


We are looking for customers with similar purchasing habits. The program scans the database and compares the transaction records of the target customer to the transactions of all the other customers. For each comparison, a degree of similarity is calculated. The nearest customers are those for whom the sum of all the similarities is the highest.

The result of this application is a list of nearest customers for each customer.
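The comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code shipped with the demonstration: the record layout (a customer id plus a set of articles), the Jaccard similarity measure, and the names jaccard and nearest_customers are all hypothetical, since the actual similarity function is not specified here.

```python
from collections import defaultdict


def jaccard(a, b):
    """Assumed similarity of two article sets: |a & b| / |a | b| (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def nearest_customers(transactions, target, k=3):
    """Return the k customers whose summed similarity to `target` is highest.

    transactions: list of (customer_id, set_of_articles) pairs (hypothetical layout).
    """
    target_baskets = [items for cust, items in transactions if cust == target]
    totals = defaultdict(float)
    for cust, items in transactions:
        if cust == target:
            continue
        # Compare this transaction with every transaction of the target customer
        # and accumulate the similarities per customer.
        for basket in target_baskets:
            totals[cust] += jaccard(basket, items)
    # Highest total first; ties are broken by customer id for a stable order.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cust for cust, _ in ranked[:k]]


# Hypothetical sample data.
transactions = [
    ("alice", {"bread", "milk"}),
    ("alice", {"bread", "butter"}),
    ("bob", {"bread", "milk"}),
    ("carol", {"wine", "cheese"}),
    ("dave", {"butter", "bread", "jam"}),
]

print(nearest_customers(transactions, "alice", k=2))
```

Repeating the call for every customer id yields the list of nearest customers for each customer. Because each call reads the database but modifies nothing, the calls are independent of one another.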


To run the demonstration:

  • Install EnFuzion and prepare the enfuzion.nodes configuration file for your network of computers as specified in the EnFuzion manual. Your network can contain an arbitrary mix of Linux/Unix and Windows NT computers.

  • If the files in this directory are on CD-ROM, copy the entire directory and all its subdirectories to a hard disk. This step is necessary, since results and temporary files will be generated in the directory.

  • Start the enfdispatcher with its working directory set to the datamining directory. For example, if the datamining directory is at /home/bob/datamining, use the following commands:

    cd /home/bob/datamining
    enfdispatcher mining.run

The data mining application will be distributed automatically, and the nearest neighbors will be calculated on all EnFuzion node computers. If your network is heterogeneous, EnFuzion will choose and install the correct binaries for this application on each node platform.

At the end of the execution, the nearest customers will be placed in the file "similar_customers".
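The reason this workload distributes well can be shown with a small local analogy. The sketch below does not use EnFuzion or any of its interfaces: a Python thread pool stands in for the node computers, and the data, the overlap similarity measure, and the function names are hypothetical. It also assumes one job per target customer, which may differ from how the demonstration actually splits the work.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical data: customer -> set of purchased articles.
PURCHASES = {
    "alice": {"bread", "milk", "butter"},
    "bob": {"bread", "milk"},
    "carol": {"wine", "cheese"},
    "dave": {"bread", "butter", "jam"},
}


def overlap(a, b):
    """Assumed similarity: number of articles both customers bought."""
    return len(a & b)


def nearest_for(target):
    """One independent job: find the customer most similar to `target`."""
    others = sorted(c for c in PURCHASES if c != target)
    # max() returns the first maximum, so ties go to the alphabetically first id.
    best = max(others, key=lambda c: overlap(PURCHASES[target], PURCHASES[c]))
    return target, best


def run_all(workers=2):
    """Run one job per customer on a pool of workers and merge the results."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(nearest_for, sorted(PURCHASES)))


print(run_all())
```

Each job only reads the shared data, so the merged result does not depend on how many workers run the jobs or in what order they finish.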

This application demonstrates the ability of EnFuzion to:

  • Speed up the computation by distributing work over multiple computers
  • Use different hardware platforms concurrently
  • Perform data mining applications in parallel on multiple computers

