Make a documents database

Introduction

Many people classify their files in a directory structure. Suppose you have many documents and you want to be able to find them again easily and quickly. You need a search engine able to find a document based on its keywords, title, author name, kind of document, ... The purpose of this section is to explain, step by step, a solution for quickly finding any document stored somewhere in a directory structure. Enter your search criterion in a web form and you get a list of matching documents. Click on one of the documents in the list and it opens. This solution is multi-platform, free and flexible. It is just an assembly of well-known software with small scripts around it to glue the whole thing together. The basic ingredients of this recipe are :
If many users need to access your database, you only have to install the software once, on a central computer. Each user needs only a web browser to access the database from anywhere on the network. To prevent abuse, users must enter a password before they can access the documents.

The story in an example

Suppose you have a directory 'Documents' containing the following files :

For each file you want to see in your database, add a description file with the extension '.nfo' as follows :

The description files contain information about the corresponding document. Only the first four letters of the field name are significant. The field name may be :

All fields are optional, but you should at least fill in the field 'titl'. In the previous example, we could have :
Project2.nfo
hf_meas.nfo
letter.nfo
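The rule that only the first four letters of a field name count can be sketched in a few lines. This is only an illustration in Python, not the Tcl script from this page: it assumes each line of a '.nfo' file has the shape 'name: value' and that field names are case-insensitive, neither of which is confirmed above. The function name `parse_nfo` and the sample text are hypothetical.

```python
def parse_nfo(text):
    """Parse description-file text into a dict keyed by 4-letter field names.

    Assumption: one 'name: value' pair per line (the real syntax may differ).
    """
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and lines without the assumed 'name: value' shape.
        if not line or ":" not in line:
            continue
        name, value = line.split(":", 1)
        # Only the first four letters of the field name are significant.
        key = name.strip().lower()[:4]
        if key:
            fields[key] = value.strip()
    return fields


# Hypothetical description file: 'Title' and 'author' shrink to 'titl' and 'auth'.
sample = "Title: Example report\nauthor: someone\n"
parsed = parse_nfo(sample)
```

With this normalization, 'author', 'auth' and 'authors' all end up under the same key, so the rest of a scanner only has to deal with the four-letter names.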
In place of 'author', you may just write 'auth'. Don't forget to create a '.nfo' file for each directory (even an empty file); otherwise its sub-directories won't be scanned. A Tcl script scans the directories, looking for each '.nfo' file and its corresponding document, and fills the database (here an SQL database). The table below shows what the database entries look like (only a few columns are shown) :
This table also contains the name of the author, a reference and the location of the file. Actually, it is slightly more complicated. We assume that the number of types and projects is limited, so the real database contains 3 tables :
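One way to picture such a layout is a documents table that points to two small lookup tables. The sketch below uses Python's built-in sqlite3 module rather than MySQL so that it runs without a server. All table and column names (`types`, `projects`, `documents`, `type_id`, ...) and the sample rows are assumptions made for illustration; the actual schema created by the scripts may differ.

```python
import sqlite3


def create_schema(conn):
    """Create one documents table plus two lookup tables (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE types    (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL);
        CREATE TABLE documents (
            id         INTEGER PRIMARY KEY,
            title      TEXT NOT NULL,
            author     TEXT,
            reference  TEXT,
            location   TEXT NOT NULL,
            type_id    INTEGER REFERENCES types(id),
            project_id INTEGER REFERENCES projects(id)
        );
    """)


def lookup_id(conn, table, name):
    """Return the id of `name` in a lookup table, inserting it if needed."""
    if table not in ("types", "projects"):
        raise ValueError("unknown lookup table")
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    row = conn.execute(f"SELECT id FROM {table} WHERE name = ?", (name,)).fetchone()
    return row[0]


def add_document(conn, title, location, doc_type, project, author=None, reference=None):
    """Insert one document, storing its type and project as ids."""
    conn.execute(
        "INSERT INTO documents (title, author, reference, location, type_id, project_id) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (title, author, reference, location,
         lookup_id(conn, "types", doc_type), lookup_id(conn, "projects", project)),
    )


conn = sqlite3.connect(":memory:")
create_schema(conn)
# Hypothetical entries: two documents sharing one project.
add_document(conn, "Example letter", "documents/letter.doc", "letter", "Project2")
add_document(conn, "Example report", "documents/report.doc", "report", "Project2")
```

A side benefit of this design is that the list of known projects or types is a single short query on a lookup table, for example `SELECT name FROM projects ORDER BY name`.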
But those are details you don't need to care about.

Field values may be inherited from a parent directory :
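The inheritance idea can be sketched as a simple merge of two sets of fields. This is a Python illustration, not the Tcl script itself, and the precedence shown is an assumption: a value set in the document's own '.nfo' file wins, while a missing or empty value falls back to the parent directory. The function name `inherit_fields` and the field name 'proj' are hypothetical.

```python
def inherit_fields(parent_fields, own_fields):
    """Merge fields: local values win, missing or empty ones come from the parent."""
    merged = dict(parent_fields)
    for key, value in own_fields.items():
        # Assumption: an empty local value does not hide the inherited one.
        if value:
            merged[key] = value
    return merged


# Hypothetical fields read from a directory's '.nfo' and from a document's '.nfo'.
directory = {"proj": "Project2", "auth": "someone"}
document = {"titl": "Example measurement", "auth": ""}
effective = inherit_fields(directory, document)
```

Applied level by level while walking down the directory tree, the same merge lets a value set once on a top directory reach every document below it.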
We can connect to the search engine with a simple web browser. If we configure Apache to restrict access to our files, the user is first prompted for a login and password. Then we get a search form such as the following :
As you can see in the form above, the possible values for 'project' and 'type' are automatically filled from the scanned files. After you've typed your search criterion, you get a list of matching documents :
The title of the document is also a hyperlink to the document. If your browser has the appropriate plug-ins, one click on the title is enough to open the document.

Apache/PHP setup

Download Apache at http://httpd.apache.org. Install it. Make a directory where you will put your documents. For example, make 'C:/html'. Download PHP at http://www.php.net/. Install it.

In the configuration file of Apache, httpd.conf, change the setting 'DocumentRoot' to point to the directory containing your html files. For example :

You also need to specify the directories to be served and the corresponding 'alias' names. For example, if you want to be able to access the directories 'C:/documents_project1' and 'C:/personnal_docs', add the following lines to the file httpd.conf :

In the same file, check the section between '<Directory />' and '</Directory>'. It should look like this :
In the example above, we specify that every user has to identify himself before he can access the documents. The file containing the user logins and passwords is (in this example) named 'c:/pass.txt'. You also need to tell Apache where to find the PHP interpreter. For example, if the interpreter is 'C:/PHP/PHP.EXE', check that the following lines are present in the file httpd.conf :
Define a new user. In a console, go to the bin directory of Apache (for Windows users, this directory should be 'C:/Program files/Apache Group/Apache/bin'). Type the following command, replacing 'username' with the name of the user you wish to add :
To add a second user, type the following command, replacing 'username2' with the name of that user :
Start the Apache server.

Tcl + SQL library

Download Tcl/Tk 8.3 from http://dev.scriptics.com/software/tcltk/download83.html. Install it. Now you need a library that allows Tcl to access MySQL. Download fbsql at http://www.fastbase.co.nz/fbsql/index.html. Windows users: install the dll file in the bin directory of Tcl. Unix users: follow the instructions ...

Scripts installation

Download the zip file scripts.zip and install its contents in the root directory of the Apache server. In our example, install the files in the directory 'C:/html'. You will find the following files :
MySQL setup

If needed, download MySQL at http://www.mysql.org. Install it. Start the server :
Now we need to prepare MySQL for our documents database. Execute the script 'initdb.tcl'.
Restart the MySQL server.

Changing the directories to be scanned

The script 'makeindex.tcl' mentioned above has to know which directories to scan. If you only want to scan the directory 'documents' (or more precisely, 'C:/html/documents'), you don't need to change anything, because it is the default setting. Suppose you want to scan the directories named 'C:/documents_project1' and 'C:/personnal_docs' (see Apache setup). Start MySQL in the console :
When prompted for the password, enter 'db_pass'. Then,
To see the list of the scanned directories, type
You will only see the directory 'documents'. To remove this first element from the list, type
Now you can insert the new entries.
As you can see, for each directory you add, you also need to add its alias name (the same as the one specified in the setup of the Apache server). This is needed so that the search results link to the right web address.

Try it !

At this stage, the database should be fully operational. First of all, create the directory where you want to place your documents. For example, create 'C:/documents_project1'. Place a few documents in this directory and make the corresponding '.nfo' files. You may also create sub-directories, but don't forget to create a '.nfo' file for each sub-directory (even an empty '.nfo' file is OK). Start the script 'makeindex.tcl'. It will run in the background and silently update the database every two hours.

Start your favorite web browser. As the address, type 'http://127.0.0.1' if you are working without a network, or enter the address of your computer (or of the one where the server is running) if you are working on a network. You should now see the prompt for login and password. Enter the user name and password you defined as described above. Click on the link 'search document'. You should see the form 'Search document'. Click on 'Submit' and you will see a list of all the indexed documents.

Remark: the use of this database is not limited to documents. You can really use it for anything. For example, if you want to make a database of your friends, you can make an html file for each of them containing any information you want. You can even add a picture. Or you can just scan their business card. Then make the corresponding '.nfo' file.

Auto-logon and auto-startup when using a Windows computer

This section only applies to Windows users. Although it should be easy to do the same on a Unix computer, I have not tried it yet. First of all, if you have an NT computer, you can configure the Apache web server and the MySQL database as 'services' so that they are started automatically at startup.
Secondly, to be able to access the network drives, you need to log on. To be sure the same login is used every time, the simplest solution is to use auto-logon. Click on the 'Start' button, then 'Run...', and type 'regedit.exe'. Select the path 'HKEY_LOCAL_MACHINE/SOFTWARE/Microsoft/Windows NT/CurrentVersion/Winlogon'. Define the following entries as strings :
Thirdly, to start the script 'makeindex.tcl' automatically after each login, go to 'HKEY_LOCAL_MACHINE/SOFTWARE/Microsoft/Windows/CurrentVersion/Run' in the registry. Define a string entry, for example "shutdownscript", and give it as value the location of the makeindex script, for example "c:\html\makeindex.tcl". If you also want to shut down automatically at a fixed time, refer to the chapter 'auto-shutdown'.