PHP Classes

File: README.markdown

Role: Documentation
Content type: text/markdown
Description: Documentation
Class: basset-ir
Retrieve, transform and process text documents
Author: Jericko Tejido
Last change: Formalized ResultSet and removed trie structure. Added MetaData class for doc infos. Refactored feedback. Removed IndexSearch to make way for IndexManager. Updated ReadMe
Date: 5 years ago
Size: 1,580 bytes
 

Basset

Basset is a full-text Information Retrieval (IR) library for PHP. It is a collection of developments from the field of IR, ported to PHP for research purposes.

Basset provides several ways to search the documents in a collection (ad-hoc retrieval) by applying advanced and experimental IR algorithms and techniques drawn from research studies and conferences, most notably:

  1. TREC
  2. SIGIR
  3. ECIR
  4. ACM

Documentation

You can read about it here

Using the Cranfield Collection and the sample.php file

The Cranfield Collection was the pioneering test collection in information retrieval for validating a system's effectiveness.

I've included the Cranfield Collection (1,400 abstracts) as an XML file that you can parse into separate files.

The test file at tests/sample.php can be run right away: it parses the collection and runs a search for a single test query. Customize it to suit your needs.

You can read Cranfield/cranfield-collection/cranqrel for Glasgow's qrels (relevance judgments).
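To compare search results against the qrels, the file first has to be loaded into a lookup structure. The following is a minimal sketch and not part of Basset: it assumes each cranqrel line holds a query id, a document id and a relevance code separated by whitespace, so check the bundled file before relying on it. `parseQrels` and `precisionAtK` are hypothetical helper names, and the sample lines are illustrative, not taken from the file.

```php
<?php
// Assumption: each qrels line is "<query id> <document id> <relevance code>".

/** Parse qrels text into [queryId => [docId => relevanceCode]]. */
function parseQrels(string $text): array
{
    $qrels = [];
    foreach (preg_split('/\R/', $text) as $line) {
        $fields = preg_split('/\s+/', trim($line));
        if (count($fields) < 3) {
            continue; // skip blank or malformed lines
        }
        [$queryId, $docId, $relevance] = array_map('intval', $fields);
        $qrels[$queryId][$docId] = $relevance;
    }
    return $qrels;
}

/** Precision at k: share of the top-k ranked documents that appear in the judgments. */
function precisionAtK(array $rankedDocIds, array $judged, int $k): float
{
    if ($k <= 0) {
        return 0.0;
    }
    $hits = 0;
    foreach (array_slice($rankedDocIds, 0, $k) as $docId) {
        if (isset($judged[$docId])) {
            $hits++;
        }
    }
    return $hits / $k;
}

// Illustrative input; for the real file use
// file_get_contents('Cranfield/cranfield-collection/cranqrel').
$qrels = parseQrels("1 184 2\n1 29 2\n2 12 3\n");
echo precisionAtK([184, 7, 29, 3], $qrels[1], 4), PHP_EOL; // prints 0.5
```

This sketch counts every document listed for a query as relevant and ignores the relevance code; a stricter evaluation could filter on that code first.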

I've also included the SMART system's stopword list for standardization (see stopwords/stopwords.txt).
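A stopword list is typically applied by dropping listed terms from documents and queries before matching. The sketch below shows the idea in plain PHP and does not use Basset's own API. It assumes the list file holds whitespace-separated words (for example, one per line), and `loadStopwords` and `removeStopwords` are hypothetical helper names.

```php
<?php
// Assumption: the stopword file is plain text with whitespace-separated words.

/** Build a lookup table [word => true] from the raw stopword text. */
function loadStopwords(string $text): array
{
    $words = preg_split('/\s+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    return array_fill_keys($words, true); // array keys give fast membership checks
}

/** Lowercase the query, split on non-alphanumerics, and drop stopwords. */
function removeStopwords(string $query, array $stopwords): array
{
    $tokens = preg_split('/[^a-z0-9]+/', strtolower($query), -1, PREG_SPLIT_NO_EMPTY);
    return array_values(array_filter(
        $tokens,
        fn (string $token): bool => !isset($stopwords[$token])
    ));
}

// Illustrative list; for the real one use file_get_contents('stopwords/stopwords.txt').
$stopwords = loadStopwords("a\nof\nthe\nwhat\nare\n");
print_r(removeStopwords('What are the structural problems of aircraft', $stopwords));
// keeps: structural, problems, aircraft
```

Using the same list for both indexing and querying keeps the two sides consistent, which is the point of standardizing on a shared list.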