SPINdex: The Perl-Based Site Searching Suite
"Smaller, Faster, Focused on Searching"
- Current Release: 4.3.2
- Current Stable Version: 4.3.2
- Current Dev Version: pre-5.0-3
What is SPINdex?
SPINdex is a Perl script (or a suite of Perl scripts) that can perform a host of functions. SPINdex was born of the author's anger and frustration over searching web sites quickly and efficiently, from both the client's perspective and the server's perspective.
The main function of SPINdex is to make one's website easily searchable, with minimal overhead and extra programming. It can do a host of other things too.
What is the official spelling of SPINdex?
SPINdex (capital 'SPIN', lowercase 'dex') is the official spelling, although my brain farts from time to time and I just say 'Spindex'. Either way is fine.
What is the Current Version of SPINdex?
Version 4.1.5 is the latest fully released version, with patches up to version 4.3.2.
What is required to run SPINdex?
Hmm... Good question. Memory use is negligible, and it's pretty flexible on processor... Hmm... Let's see... For it to be used as a search engine, you need:
1. A computer
2. An operating system that will run on the aforementioned computer
3. A Perl 5 port for that operating system (5.005 or better recommended)
4. The CGI.pm module for that Perl 5 port (actually not required, see below)
5. A web server that supports Perl CGIs
If you don't like the CGI.pm Perl module and would rather use your favorite module, or perhaps build SPINdex into your web server, you can easily port SPINdex to whatever you like. The actual interactive parts of the code are very small and easily modified. The guts of SPINdex are NOT specific to any operating system, platform, hardware, or web server, and don't need to be changed at all.
So what do I recommend to run SPINdex? A 4-processor Pentium II running the latest Linux kernel, with 128 MB of RAM, the latest version of Apache, and the latest builds of Perl 5 and mod_perl... Wouldn't that be great?? Seriously, any old 386SX running Linux with the Apache web server and Perl 5.004 would be fine. If you want it on a Windows machine, I recommend at least a low-end Pentium with Win9x, Apache for Windows, and Perl 5.003.
SPINdex doesn't care what OS you've got, or really even how much RAM you have. It is processor-intensive, but it will run fine on anything you give it; a faster processor simply means faster searches.
Does SPINdex support mod_perl?
Can fish swim?! Yes, SPINdex 3.5 and later are definitely mod_perl compatible, and exhibit SIGNIFICANT performance increases: between 200% and 500%, depending on resources.
NOTE: There exists a minor bug in all versions before 4.1.5 that prevents searching "by date" from being 100% accurate 100% of the time when running on mod_perl sites with long uptimes. 4.1.5 fixes this bug. Really. I swear.
Where can I get SPINdex?
SPINdex 3 and 4 are now only available via download from our SourceForge development page.
SPINdex is released under the GPL (GNU General Public License).
How does it work?
Well, very simply, it traverses the directories specified as "search directories" and recursively runs through each of their subdirectories (unless a subdirectory is flagged as excluded). It checks each file with a suitable extension (again, unless the file is flagged as excluded) for the search string, using a case-insensitive regular expression. If the expression matches, the file's name (the text between its <title> and </title> tags) and URL are recorded.
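The walk described above can be sketched in a few lines of Perl with the standard File::Find module. This is a minimal sketch under stated assumptions, not SPINdex's own code. search_tree() is a hypothetical name. Extensions are matched as case-insensitive suffixes. The query is matched as literal text (\Q...\E) rather than as a raw user-supplied regex. Only directory exclusions are handled, to keep it short.

```perl
use strict;
use warnings;
use File::Find qw(find);

# Hypothetical helper: walk $root, skip excluded directories, and return
# [title, path] pairs for files whose contents match $query.
sub search_tree {
    my ($root, $query, $extensions, $excluded_dirs) = @_;
    my @hits;
    find({
        no_chdir => 1,    # keep $File::Find::name usable as a full path
        wanted   => sub {
            my $path = $File::Find::name;
            if (-d $path) {
                # Pruning stops find() from descending into this directory.
                $File::Find::prune = 1
                    if grep { index("$path/", $_) == 0 } @$excluded_dirs;
                return;
            }
            # Only files with a listed extension are opened (assumed suffix test).
            return unless grep { $path =~ /\Q$_\E$/i } @$extensions;

            open my $fh, '<', $path or return;
            my $text = do { local $/; <$fh> };    # slurp the whole file
            close $fh;
            return unless defined $text;

            # Case-insensitive match; \Q...\E treats the query as literal text.
            return unless $text =~ /\Q$query\E/i;
            my ($title) = $text =~ m{<title>(.*?)</title>}is;
            push @hits, [ defined $title ? $title : $path, $path ];
        },
    }, $root);
    return @hits;
}
```

Each hit carries the page title (falling back to the path when there is no <title>) plus the local path, which can then be mapped to a URL.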
How do I customize SPINdex for my site?
You need to edit 'config.pm'. Then either point web links at the 'spindex.pl' application, or configure your web server to accept 'index.pl' or 'index.cgi' as a "default page" and rename 'spindex.pl' accordingly (that's what I do for http://mattwork.potsdam.edu/search/ ).
Add directory trees to be searched
Add FOLDERS with matching URLs to the %SearchDirs hash, in the format shown below, being careful that:
- All local paths and URLs end with '/'
- All local paths are in webspace!
%SearchDirs = (
#:"localfolder/"=>"http://urltomapto/",
"/usr/local/apache/htdocs/"=>"http://localhost/",
);
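One way to see why the trailing '/' rule matters: if a file's URL is built by swapping the local path prefix for its URL, a missing slash on either side glues the pieces together wrongly. The sketch below assumes that prefix-swapping approach and uses a hypothetical path_to_url() helper. It is an illustration, not necessarily how SPINdex itself builds its links.

```perl
use strict;
use warnings;

my %SearchDirs = (
    "/usr/local/apache/htdocs/" => "http://localhost/",
);

# Warn about entries that break the "must end with '/'" rule.
for my $dir (sort keys %SearchDirs) {
    warn "Local path '$dir' should end with '/'\n"       unless $dir =~ m{/$};
    warn "URL '$SearchDirs{$dir}' should end with '/'\n" unless $SearchDirs{$dir} =~ m{/$};
}

# Hypothetical helper: turn a local file path into its public URL by
# swapping the longest matching %SearchDirs prefix for its URL.
sub path_to_url {
    my ($path) = @_;
    for my $dir (sort { length($b) <=> length($a) } keys %SearchDirs) {
        if (index($path, $dir) == 0) {
            return $SearchDirs{$dir} . substr($path, length $dir);
        }
    }
    return;    # file is outside every search directory
}

print path_to_url("/usr/local/apache/htdocs/docs/index.html"), "\n";
```

With the default entry, "/usr/local/apache/htdocs/docs/index.html" maps to "http://localhost/docs/index.html".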
Allow the "checkable" options of searching multiple "sites" or "sections"
Syntax is "local_path"=>[ "Name to print",1|0 ]. The ending "1|0" is either a "1" or a "0", and sets whether (1) or not (0) the box is checked by default.
%OptionSearch=(
"/usr/local/apache/htdocs/"=>[ "My Site",1 ],
);
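To picture how %OptionSearch turns into the checkable options, here is a minimal sketch that prints one checkbox per entry. The helper name option_checkboxes() and the form field name "dirs" are assumptions made for illustration. The config values are treated as trusted, so they are not HTML-escaped.

```perl
use strict;
use warnings;

my %OptionSearch = (
    "/usr/local/apache/htdocs/" => [ "My Site", 1 ],
);

# Hypothetical helper: build one HTML checkbox per %OptionSearch entry.
# The field name "dirs" is an illustrative assumption, not SPINdex's own.
sub option_checkboxes {
    my @html;
    for my $path (sort keys %OptionSearch) {
        my ($label, $checked) = @{ $OptionSearch{$path} };
        push @html, sprintf '<label><input type="checkbox" name="dirs" value="%s"%s> %s</label>',
            $path, ($checked ? ' checked' : ''), $label;
    }
    return join "\n", @html;
}

print option_checkboxes(), "\n";
```

Because the entry's flag is 1, its box comes out pre-checked; a 0 would leave it unchecked.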
Exclude files or folders
Add local directories, filenames, or path fragments to exclude to the @exclusions array, as shown below:
NOTE: If you specify a directory, it will not check any subdirectories within that directory either.
NOTE: The ability to specify files was finally added in version 4.3.2.
@exclusions = (
"/usr/local/apache/htdocs/manual/", # all folders/files in this folder will be ignored
"/usr/local/apache/htdocs/spindex.htm", # this specific file will be ignored
"template.htm", # all files with this name will be ignored, regardless of location.
);
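The three kinds of entries in @exclusions can be modeled as three matching rules:
- Entries ending in '/' exclude a whole directory tree.
- Absolute file paths exclude exactly one file.
- Bare names exclude that filename anywhere.

The is_excluded() helper below is a hypothetical sketch of those rules. SPINdex's actual matching of "path fragments" may be looser.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my @exclusions = (
    "/usr/local/apache/htdocs/manual/",
    "/usr/local/apache/htdocs/spindex.htm",
    "template.htm",
);

# Hypothetical helper: returns true if $path should be skipped.
sub is_excluded {
    my ($path) = @_;
    for my $rule (@exclusions) {
        if ($rule =~ m{/$}) {
            # Directory rule: skip the folder itself and everything beneath it.
            return 1 if index($path, $rule) == 0 || "$path/" eq $rule;
        } elsif ($rule =~ m{^/}) {
            # Absolute file rule: skip exactly this file.
            return 1 if $path eq $rule;
        } else {
            # Bare filename rule: skip any file with this name, anywhere.
            return 1 if basename($path) eq $rule;
        }
    }
    return 0;
}
```

Under these rules "template.html" is not caught by the "template.htm" entry, since bare names are compared against the whole filename.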
Specify search item conditions
Set this to "and" or "or". "and" requires a filename to meet one criterion from BOTH the @extensions AND the @searchpatterns lists; "or" allows the filename to meet one criterion from EITHER the @extensions OR the @searchpatterns list. Special thanks to Eric Thern for his help on brainstorming this!
$searchPriority="or";
Specify extensions to be searched
Add extensions to be searched to the @extensions array as shown below; to search all extensions, simply use '.':
@extensions=(
".html",
".htm",
".php",
);
Specify extensions where only the filename/path should be searched
Added in 4.3.1
Add extensions whose files should be matched on filename/path only (not contents) to the @titleExtensions array as shown below; to cover all extensions, simply use '.':
@titleExtensions=(
".jpg",
".gif",
".wav",
".png",
);
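The split between @extensions and @titleExtensions amounts to choosing a search mode per file. The sketch below makes three assumptions:
- Extensions are matched as case-insensitive suffixes.
- @extensions wins if a file appears in both lists.
- search_mode() and name_matches() are hypothetical helpers.

```perl
use strict;
use warnings;

my @extensions      = (".html", ".htm", ".php");
my @titleExtensions = (".jpg", ".gif", ".wav", ".png");

# Hypothetical helper: true if $path ends with one of @$list
# (a lone '.' matches every file, as the config comments describe).
sub has_ext {
    my ($path, $list) = @_;
    return scalar grep { $_ eq '.' || $path =~ /\Q$_\E$/i } @$list;
}

# Decide how a file is searched: by its contents, by its name/path only,
# or not at all. Content search is checked first (an assumption).
sub search_mode {
    my ($path) = @_;
    return 'content' if has_ext($path, \@extensions);
    return 'name'    if has_ext($path, \@titleExtensions);
    return 'skip';
}

# A name-only match tests the path itself, so binary files such as
# images and sounds are never opened.
sub name_matches {
    my ($path, $query) = @_;
    return $path =~ /\Q$query\E/i;
}
```

This is why images and sounds can still turn up in results even though their contents are never scanned.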
Specify filename search patterns to be searched
Add filename patterns to be searched to the @searchpatterns array. Special thanks to Eric Thern for his help on brainstorming this!
# This will search filenames/folders starting with (^) the word README
@searchpatterns=(
"^README",
);
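With @extensions, @searchpatterns, and $searchPriority all in place, the "and"/"or" rule can be sketched as below. This is a sketch under three assumptions:
- The patterns are treated as Perl regular expressions matched against the file's base name.
- Extensions are treated as case-insensitive suffixes.
- is_candidate() is a hypothetical name.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

my $searchPriority = "or";
my @extensions     = (".html", ".htm", ".php");
my @searchpatterns = ("^README");

# True if the name ends with one of @extensions ('.' means every file).
sub matches_extension {
    my ($name) = @_;
    return scalar grep { $_ eq '.' || $name =~ /\Q$_\E$/i } @extensions;
}

# True if the name matches one of the @searchpatterns regexes.
sub matches_pattern {
    my ($name) = @_;
    return scalar grep { $name =~ /$_/ } @searchpatterns;
}

# Hypothetical helper: apply the "and"/"or" rule to one file path.
sub is_candidate {
    my ($path) = @_;
    my $name   = basename($path);
    my $ext_ok = matches_extension($name) ? 1 : 0;
    my $pat_ok = matches_pattern($name)   ? 1 : 0;
    return $searchPriority eq "and" ? ($ext_ok && $pat_ok) : ($ext_ok || $pat_ok);
}
```

With "or", both index.html and a plain README qualify; switching to "and" narrows the field to files like README.html.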
Specify whether or not you want searches to be logged
Set this to '1' if you want to log every search request (this has no effect if $indexMode is set).
$logSearches=1;
Specify the location of the log file
If $logSearches is set, this is the file that entries get logged to.
$logFile="/usr/local/apache/logs/spindex.log";
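A search log like the one $logSearches enables could be written as below. This sketch assumes a tab-separated layout (time, client address, query) and a hypothetical log_search() helper; SPINdex's real log format may differ. It takes an exclusive flock so that simultaneous CGI requests don't interleave their lines.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use POSIX qw(strftime);

my $logSearches = 1;
my $logFile     = "/usr/local/apache/logs/spindex.log";

# Hypothetical helper: append one tab-separated line (time, client, query)
# per search. This layout is an assumption, not SPINdex's real format.
sub log_search {
    my ($query, $file) = @_;
    return unless $logSearches;
    $file = $logFile unless defined $file;

    # Flatten tabs and newlines so one search always stays on one line.
    (my $clean = $query) =~ s/[\t\r\n]+/ /g;

    open my $fh, '>>', $file or do { warn "Cannot open $file: $!\n"; return };
    flock($fh, LOCK_EX);    # stop concurrent CGI processes interleaving lines
    my $when = strftime('%Y-%m-%d %H:%M:%S', localtime);
    my $who  = defined $ENV{REMOTE_ADDR} ? $ENV{REMOTE_ADDR} : '-';
    print {$fh} join("\t", $when, $who, $clean), "\n";
    close $fh;              # closing also releases the lock
    return 1;
}
```

The optional second argument lets the same helper write to a scratch file when testing.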
Enable full search at startup
If this is set to 1, users who access SPINdex via the web (not in index mode) will be shown a list of all the searchable pages as soon as they load the page (the same as entering '.' as the search term). Setting this to '0' skips that full search, so first-time access is much faster.
$allDefault=0;
Specify whether or not you want the output in HTML
If this is set to 1, each result link is labeled with the page's title; otherwise, just a list of URLs is returned.
$HTMLMode=1;
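The difference $HTMLMode makes can be shown with a tiny formatter. The markup and the format_result() and esc() names are assumptions for illustration. Titles and URLs are HTML-escaped because page titles can contain characters such as '&' or '<'.

```perl
use strict;
use warnings;

my $HTMLMode = 1;

# Minimal HTML escaping for titles and URLs placed into markup.
sub esc {
    my ($s) = @_;
    $s =~ s/&/&amp;/g;    # must run first so later entities aren't double-escaped
    $s =~ s/</&lt;/g;
    $s =~ s/>/&gt;/g;
    $s =~ s/"/&quot;/g;
    return $s;
}

# Hypothetical helper: format one result as a titled link in HTML mode,
# or as a bare URL otherwise.
sub format_result {
    my ($title, $url) = @_;
    return $HTMLMode
        ? sprintf(qq{<a href="%s">%s</a><br>\n}, esc($url), esc($title))
        : "$url\n";
}
```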
Specify whether or not to send unbuffered output
If this is set to 1, results will be sent from SPINdex to the client as they are found, instead of being buffered and sent in pieces.
$UnBufferedOutput=0;
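In Perl, "unbuffered" output usually means turning on autoflush for STDOUT. That is one plausible way a setting like $UnBufferedOutput takes effect, but it is an assumption, not confirmed from SPINdex's source. The web server may still apply buffering of its own. The sketch sets the flag to 1 purely to demonstrate the effect.

```perl
use strict;
use warnings;
use IO::Handle;    # provides the autoflush() method on filehandles

my $UnBufferedOutput = 1;    # set to 1 here only to show the effect

# With autoflush on, each print is handed to the web server immediately
# instead of waiting for Perl's output buffer to fill.
STDOUT->autoflush(1) if $UnBufferedOutput;

for my $url ("http://localhost/a.html", "http://localhost/b.html") {
    print "$url\n";    # in real use, printed as each match is found
}
```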
Specify the page cache expiration
If this is set to "-1d", results will never be cached. Other values use the units (h)our, (m)onth, and (d)ay.
$expire="-1d";
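The "-1d" value looks like the relative-time format used by CGI.pm's header(-expires => ...), where a time in the past tells browsers and proxies not to reuse the page. Assuming that format, the sketch below converts such a value into an Expires header. In CGI.pm's convention a lowercase 'm' means minutes and an uppercase 'M' means months, so it is worth checking which unit you mean. The helper names expire_offset() and expires_header() are hypothetical.

```perl
use strict;
use warnings;
use POSIX qw(strftime);

my $expire = "-1d";

# Seconds per unit, following CGI.pm's relative-time convention:
# s, m (minutes), h, d, M (months as 30 days), y (years as 365 days).
my %unit = (s => 1, m => 60, h => 3600, d => 86400,
            M => 30 * 86400, y => 365 * 86400);

# Hypothetical helper: turn "-1d", "+2h", "now", ... into seconds from now.
sub expire_offset {
    my ($spec) = @_;
    return 0 if $spec eq 'now';
    my ($n, $u) = $spec =~ /^([+-]?\d+)([smhdMy])$/
        or die "Unrecognised expiry value '$spec'\n";
    return $n * $unit{$u};
}

# Hypothetical helper: build an Expires header relative to $now (default: time).
sub expires_header {
    my ($spec, $now) = @_;
    $now = time unless defined $now;
    my $t = $now + expire_offset($spec);
    return 'Expires: ' . strftime('%a, %d %b %Y %H:%M:%S GMT', gmtime $t);
}

print expires_header($expire), "\n";
```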
Specify the template file
This is the path to the template file (ideally an absolute path, although the default below is relative).
$templateFile="./template.html";
Specify the template file for AvantGo lookups
This is the path to the template file used for AvantGo lookups (ideally an absolute path, although the default below is relative).
$AvantFile="./avanttemp.html";

