contenido\classes\class.search.php
Description: API to index a Contenido article, to search the index structure, and to display the search results.
Requirements:
- Author
- Willi Man
- Con_php_req
- 5.0
- Copyright
- four for business AG
- License
- http://www.contenido.org/license/LIZENZ.txt
- Link
- http://www.4fb.de
- Link
- http://www.contenido.org
- Package
- Contenido Backend classes
- Since
- file available since contenido release <= 4.6
- {@internal created 2004-01-15; modified 2008-06-30, Frederic Schneider, add security fix; modified 2008-07-11, Dominik Ziegler, marked class search_helper as deprecated; modified 2008-11-12, Andreas Lindner, add special treatment for iso-8859-2; $Id: class.search.php 873 2008-11-12 09:18:50Z andreas.lindner $}
- Version
- 1.0.1
\Index
Properties


array $cms_type = array()
- htmlhead: HTML headline
- html: HTML text
- head: Headline (no HTML)
- text: Text (no HTML)
- img: Upload id of the element
- imgdescr: Image description
- link: Link (URL)
- linktarget: Link target (_self, _blank, _top ...)
- linkdescr: Link description
- swf: Upload id of the element
- etc.
array()
- Type
- array


array $cms_type_suffix = array()
array()
- Type
- array


array $keycode = array()
array()
- Type
- array


array $keywords_old = array()
array()
- Type
- array


string $place
The value is a string like "&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)", which means a keyword occurs 2 times in article with $idart 12 and can be found in CMS_HTMLHEAD[1] and CMS_HTML[1]. The field 'self' can be used in the article properties to index the article manually.
- Type
- string
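The index string format described above can be taken apart with a regular expression. This is only a sketch: parseIndexString() is a hypothetical helper, not a method of this class, and it assumes every entry has exactly the form &idart=count(TYPE-NR,...).

```php
<?php
/**
 * Parse an index string such as "&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)".
 * Returns array(idart => array('count' => int, 'places' => array of strings)).
 * Hypothetical helper, not part of class.search.php.
 */
function parseIndexString($indexString)
{
    $entries = array();
    // One entry per article: &<idart>=<count>(<type>-<nr>,<type>-<nr>,...)
    preg_match_all('/&(\d+)=(\d+)\(([^)]*)\)/', $indexString, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $places = ($m[3] === '') ? array() : explode(',', $m[3]);
        $entries[(int) $m[1]] = array('count' => (int) $m[2], 'places' => $places);
    }
    return $entries;
}

// Keyword occurs 2 times in article 12, in CMS_HTMLHEAD[1] and CMS_HTML[1].
$parsed = parseIndexString('&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)');
```

Here $parsed[12]['count'] holds the number of occurrences and $parsed[12]['places'] the content elements in which the keyword was found.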
Methods


Index( $oDB = false) : void
Constructor; sets the object properties.
Name | Type | Description |
---|---|---|
$oDB | | Contenido database object |


addSpecialUmlauts( $key) : \$key
Name | Type | Description |
---|---|---|
$key | | Keyword |
Type | Description |
---|---|
\$key |
- Modified
- 2008-04-17, Timo Trautmann: reverse function of removeSpecialChars (important for highlighting the search term in the search results); adds umlauts back to the search term


checkCmsType( $idtype) : \boolean
Checks whether the current CMS type is in the cms_options array.
Name | Type | Description |
---|---|---|
$idtype | | |
Type | Description |
---|---|
\boolean | |


createKeywords() : void
Creates the index structure for each CMS type. The structure looks like:
Array
(
[die] => CMS_HTML-1
[inhalte] => CMS_HTML-1
[auf] => CMS_HTML-1 CMS_HTMLHEAD-2
[dieser] => CMS_HTML-1
[website] => CMS_HTML-1 CMS_HTML-1 CMS_HTMLHEAD-2
)
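A structure of this shape could be produced roughly as follows. buildKeywordIndex() is a hypothetical stand-in for createKeywords(); the tokenisation (strip tags, lowercase, split on non-alphanumerics) is an assumption and may differ from what the class actually does.

```php
<?php
/**
 * Build a keyword => "TYPE-NR TYPE-NR ..." map from article content.
 * Hypothetical sketch; the tokenisation rules are assumptions.
 */
function buildKeywordIndex(array $aContent, array $aStopwords = array())
{
    $index = array();
    foreach ($aContent as $cmsType => $elements) {
        foreach ($elements as $nr => $text) {
            // Strip markup and lowercase (ASCII only; a real indexer needs
            // charset-aware handling), then split on non-letters/non-digits.
            $plain = strtolower(strip_tags($text));
            $words = preg_split('/[^\p{L}\p{N}]+/u', $plain, -1, PREG_SPLIT_NO_EMPTY);
            foreach ($words as $word) {
                if (in_array($word, $aStopwords, true)) {
                    continue; // stopwords are not indexed
                }
                $place = $cmsType . '-' . $nr;
                $index[$word] = isset($index[$word]) ? $index[$word] . ' ' . $place : $place;
            }
        }
    }
    return $index;
}

$aContent = array(
    'CMS_HTML'     => array(1 => '<p>Die Inhalte auf dieser Website</p>'),
    'CMS_HTMLHEAD' => array(2 => 'Auf der Website'),
);
$index = buildKeywordIndex($aContent, array('der'));
```

With this input the word auf maps to both CMS_HTML-1 and CMS_HTMLHEAD-2, while the stopword der never enters the index.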


deleteKeywords() : void
If keywords no longer occur in the article, updates the index_string and deletes the keyword if necessary.


removeSpecialChars( $key) : \$key
Removes special characters from the index term.
Name | Type | Description |
---|---|---|
$key | | Keyword |
Type | Description |
---|---|
\$key |
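The keyword wwwcontenidoorg in the Search result example suggests that punctuation is dropped from index terms. A minimal sketch of that idea, assuming only letters and digits are kept; stripSpecialChars() is a hypothetical helper, and the real removeSpecialChars() may treat umlauts and other characters differently.

```php
<?php
/**
 * Reduce a term to letters and digits only.
 * Hypothetical helper; not the implementation of removeSpecialChars().
 */
function stripSpecialChars($key)
{
    // \p{L} = any letter, \p{N} = any digit; everything else is removed.
    return preg_replace('/[^\p{L}\p{N}]+/u', '', $key);
}

$term = stripSpecialChars('www.contenido.org');
```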


saveKeywords() : void
Generates the index_string from the index structure and saves the keywords. The index_string looks like "&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)".
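One way to assemble such an entry from the space-separated places of a keyword is sketched here. buildIndexEntry() is hypothetical; counting every place as one occurrence and listing each place only once are assumptions, not documented behaviour.

```php
<?php
/**
 * Build one index_string entry for an article from a space-separated
 * list of places, e.g. "CMS_HTMLHEAD-1 CMS_HTML-1".
 * Hypothetical helper; counting and de-duplication rules are assumptions.
 */
function buildIndexEntry($idart, $places)
{
    $list   = preg_split('/\s+/', trim($places), -1, PREG_SPLIT_NO_EMPTY);
    $count  = count($list);                       // every place counts as one occurrence
    $unique = array_values(array_unique($list));  // list each place once
    return '&' . (int) $idart . '=' . $count . '(' . implode(',', $unique) . ')';
}

$entry = buildIndexEntry(12, 'CMS_HTMLHEAD-1 CMS_HTML-1');
```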


setCmsOptions(array $cms_options) : void
Sets the cms_options array of CMS types that should be treated specially.
Name | Type | Description |
---|---|---|
$cms_options | array | |


setStopwords(array $aStopwords) : void
Sets the array of stopwords that should not be indexed.
Name | Type | Description |
---|---|---|
$aStopwords | array |


start(int $idart, array $aContent, string $place = 'auto', array $cms_options = array(), array $aStopwords = array()) : void
Starts indexing the article.
Name | Type | Description |
---|---|---|
$idart | int | Article ID |
$aContent | array | The complete content of an article, keyed by content type. It looks like Array ( [CMS_HTMLHEAD] => Array ( [1] => Herzlich Willkommen... [2] => ...auf Ihrer Website! ) [CMS_HTML] => Array ( [1] => Die Inhalte auf dieser Website ... |
$place | string | The field in which the index information is stored in the database. |
$cms_options | array | CMS types that should explicitly not be indexed. |
$aStopwords | array | Words that should not be indexed. |
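The effect of $cms_options on $aContent can be pictured with a small filter. filterContent() is a hypothetical helper, and mapping an option such as 'htmlhead' to the key 'CMS_HTMLHEAD' by upper-casing and prefixing is an assumption.

```php
<?php
/**
 * Drop content types that should not be indexed.
 * Hypothetical helper; the option-to-key mapping is an assumption.
 */
function filterContent(array $aContent, array $cms_options)
{
    $excluded = array();
    foreach ($cms_options as $type) {
        $excluded[] = 'CMS_' . strtoupper($type); // 'htmlhead' -> 'CMS_HTMLHEAD'
    }
    $kept = array();
    foreach ($aContent as $cmsType => $elements) {
        if (!in_array($cmsType, $excluded, true)) {
            $kept[$cmsType] = $elements;
        }
    }
    return $kept;
}

$aContent = array(
    'CMS_HTMLHEAD' => array(1 => 'Herzlich Willkommen...', 2 => '...auf Ihrer Website!'),
    'CMS_HTML'     => array(1 => 'Die Inhalte auf dieser Website ...'),
);
$kept = filterContent($aContent, array('htmlhead'));
```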
\Search
This object starts an indexed full-text search.
TODO: The way the search options are set could be improved considerably. The set of searchable articles should not be computed in this class; it would be better to compute the array of searchable articles outside and pass it in as a parameter. Avoid foreach loops.
Use object with
$options = array('db' => 'regexp', // use db function regexp
                 'combine' => 'or'); // combine searchwords with or
The range of searchable articles is by default the complete content which is online and not protected.
With the option 'searchable_articles' you can define your own set of searchable articles. If 'searchable_articles' is set, the options 'cat_tree', 'categories', 'articles', 'exclude', 'artspecs', 'protected' and 'dontshowofflinearticles' have no effect.
$options = array('db' => 'regexp', // use db function regexp
                 'combine' => 'or', // combine searchwords with or
                 'searchable_articles' => array(5, 6, 9, 13));
One can define the range of searchable articles by setting the parameter 'exclude' to false which means the range of categories defined by parameter 'cat_tree' or 'categories' and the range of articles defined by parameter 'articles' is included.
$options = array('db' => 'regexp', // use db function regexp
                 'combine' => 'or', // combine searchwords with or
                 'exclude' => false, // => searchrange specified in 'cat_tree', 'categories' and 'articles' is included
                 'cat_tree' => array(12), // tree with root 12 included
                 'categories' => array(100,111), // categories 100, 111 included
                 'articles' => array(33), // article 33 included
                 'artspecs' => array(2, 3), // array of article specifications => search only articles with these artspecs
                 'res_per_page' => 2, // results per page
                 'protected' => true, // => do not search articles or articles in categories which are offline or protected
                 'dontshowofflinearticles' => false); // => search offline articles or articles in categories which are offline
You can build the complement of the range of searchable articles by setting the parameter 'exclude' to true which means the range of categories defined by parameter 'cat_tree' or 'categories' and the range of articles defined by parameter 'articles' is excluded from search.
$options = array('db' => 'regexp', // use db function regexp
                 'combine' => 'or', // combine searchwords with or
                 'exclude' => true, // => searchrange specified in 'cat_tree', 'categories' and 'articles' is excluded
                 'cat_tree' => array(12), // tree with root 12 excluded
                 'categories' => array(100,111), // categories 100, 111 excluded
                 'articles' => array(33), // article 33 excluded
                 'artspecs' => array(2, 3), // array of article specifications => search only articles with these artspecs
                 'res_per_page' => 2, // results per page
                 'protected' => true, // => do not search articles or articles in categories which are offline or protected
                 'dontshowofflinearticles' => false); // => search offline articles or articles in categories which are offline
$search = new Search($options);
$cms_options = array("htmlhead", "html", "head", "text", "imgdescr", "link", "linkdescr"); // search only in these cms-types
$search->setCmsOptions($cms_options);
$search_result = $search->searchIndex($searchword, $searchwordex); // start search
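How the 'exclude' option turns a specified range into the set of searchable articles can be sketched with plain set operations. resolveSearchableArticles() is hypothetical: $allArticles stands for every online, unprotected article, and $range for the articles already selected through 'cat_tree', 'categories' and 'articles'.

```php
<?php
/**
 * Resolve the set of searchable article IDs.
 * Hypothetical helper; both input arrays are assumed to be precomputed.
 */
function resolveSearchableArticles(array $allArticles, array $range, $exclude)
{
    if ($exclude) {
        // 'exclude' => true: everything except the specified range
        return array_values(array_diff($allArticles, $range));
    }
    // 'exclude' => false: only the specified range
    return array_values(array_intersect($allArticles, $range));
}

$all      = array(5, 6, 9, 13, 33);
$range    = array(9, 33);
$included = resolveSearchableArticles($all, $range, false);
$excluded = resolveSearchableArticles($all, $range, true);
```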
The search result structure has the following form:
Array
(
[20] => Array
(
[CMS_HTML] => Array
(
[0] => 1
[1] => 1
[2] => 1
)
[keyword] => Array
(
[0] => content
[1] => contenido
[2] => wwwcontenidoorg
)
[search] => Array
(
[0] => con
[1] => con
[2] => con
)
[occurence] => Array
(
[0] => 1
[1] => 5
[2] => 1
)
[similarity] => 60
)
)
The keys of the array are the article IDs found by the search.
Searching 'con' matches keywords 'content', 'contenido' and 'wwwcontenidoorg' in article with ID 20 in content type CMS_HTML[1]. The search term occurs 7 times. The maximum similarity between searchterm and matching keyword is 60%.
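The two figures in this example can be recomputed from the structure itself. The sketch below assumes that similarity is the percentage reported by PHP's similar_text(); that assumption reproduces the 60% shown here but is not confirmed by this page.

```php
<?php
// The example result for article 20, written as a PHP array.
$search_result = array(
    20 => array(
        'CMS_HTML'   => array(1, 1, 1),
        'keyword'    => array('content', 'contenido', 'wwwcontenidoorg'),
        'search'     => array('con', 'con', 'con'),
        'occurence'  => array(1, 5, 1),
        'similarity' => 60,
    ),
);

// Total number of hits in article 20: 1 + 5 + 1.
$total = array_sum($search_result[20]['occurence']);

// Assumption: similarity is similar_text()'s percentage, maximised over all matches.
$best = 0;
foreach ($search_result[20]['keyword'] as $i => $keyword) {
    similar_text($search_result[20]['search'][$i], $keyword, $percent);
    $best = max($best, (int) round($percent));
}
```

Under that assumption $total is 7 and $best is 60, the values quoted in the text.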
With $oSearchResults = new SearchResult($search_result, 10); one can rank and display the results.
- Author
- Willi Man
- Copyright
- four for business AG
- Version
- 1.0.1
Properties


boolean $dontshowofflinearticles
- Type
- boolean


boolean $exclude
- Type
- boolean


boolean $protected
- Type
- boolean


array $search_result = array()
array()
- Type
- array


array $search_words_exclude = array()
array()
- Type
- array
Methods


Search(array $options, $oDB = false) : void
Constructor
Name | Type | Description |
---|---|---|
$options | array | $options['db']: 'regexp' => DB search with REGEXP; 'like' => DB search with LIKE; 'exact' => exact match. $options['combine']: 'and' or 'or', combines the search words with AND or OR. $options['exclude']: true => the search range specified in 'cat_tree', 'categories' and 'articles' is excluded; false => that search range is included. $options['cat_tree']: e.g. array(8) => the complete tree with root 8 is included/excluded. $options['categories']: e.g. array(10, 12) => categories 10 and 12 are included/excluded. $options['articles']: e.g. array(33) => article 33 is included/excluded. $options['artspecs']: e.g. array(2, 3) => search only articles with these article specifications. $options['protected']: true => do not search articles which are offline (locked) or articles in categories which are offline (protected). $options['dontshowofflinearticles']: false => search offline articles or articles in categories which are offline. $options['searchable_articles']: array of article IDs which should be searchable. |
$oDB | | |
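The interplay of the 'db' and 'combine' options can be illustrated in plain PHP. This is a sketch only: keywordMatches() and articleMatches() are hypothetical, and they run in PHP, whereas the class is documented as delegating REGEXP and LIKE to the database.

```php
<?php
/**
 * Test one search word against one indexed keyword.
 * Hypothetical PHP emulation of the 'regexp', 'like' and 'exact' modes.
 */
function keywordMatches($mode, $searchword, $keyword)
{
    switch ($mode) {
        case 'regexp': // search word is treated as a regular expression
            return preg_match('~' . str_replace('~', '\~', $searchword) . '~i', $keyword) === 1;
        case 'like':   // substring match, comparable to LIKE '%word%'
            return stripos($keyword, $searchword) !== false;
        case 'exact':
            return $keyword === $searchword;
    }
    return false;
}

/** True if the keywords satisfy the search words under 'and' or 'or'. */
function articleMatches(array $keywords, array $searchwords, $mode, $combine)
{
    $hits = 0;
    foreach ($searchwords as $searchword) {
        foreach ($keywords as $keyword) {
            if (keywordMatches($mode, $searchword, $keyword)) {
                $hits++;
                break; // one matching keyword is enough for this search word
            }
        }
    }
    return ($combine === 'and') ? $hits === count($searchwords) : $hits > 0;
}

$keywords = array('content', 'contenido', 'wwwcontenidoorg');
$orMatch  = articleMatches($keywords, array('con', 'xyz'), 'like', 'or');
$andMatch = articleMatches($keywords, array('con', 'xyz'), 'like', 'and');
```

With 'or' a single matching search word is enough, so $orMatch is true; with 'and' every search word must match, so $andMatch is false.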


addArticleSpecificationsByName( $sArtSpecName) : void
Add all article specifications matching name of article specification (client dependent but language independent)
Name | Type | Description |
---|---|---|
$sArtSpecName | | |


getArticleSpecifications() : Array
Fetch all article specifications which are online
Type | Description |
---|---|
Array | Array of article specification IDs |


getSearchableArticles( $search_range) : \Articles
Name | Type | Description |
---|---|---|
$search_range | | |
Type | Description |
---|---|
\Articles | Articles in the specified search range |


getSubTree( $cat_start) : \Category
Name | Type | Description |
---|---|---|
$cat_start | | Root of a category tree |
Type | Description |
---|---|
\Category | Category tree |
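Collecting a category tree from its root might look like the following. collectSubTree() is hypothetical; the $children map (category ID to direct child IDs) stands in for whatever the class reads from the database and is assumed to be free of cycles.

```php
<?php
/**
 * Collect a category and all of its descendants.
 * Hypothetical helper; $children maps a category ID to its direct child IDs.
 */
function collectSubTree($cat_start, array $children)
{
    $tree  = array();
    $stack = array($cat_start);
    while ($stack) {
        $cat    = array_pop($stack);
        $tree[] = $cat;
        if (isset($children[$cat])) {
            foreach ($children[$cat] as $child) {
                $stack[] = $child; // visit descendants later
            }
        }
    }
    sort($tree); // stable, predictable order for the caller
    return $tree;
}

$children = array(12 => array(100, 111), 100 => array(101));
$subTree  = collectSubTree(12, $children);
```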


searchIndex(string $searchwords, string $searchwords_exclude = '') : void
Runs the indexed full-text search.
Name | Type | Description |
---|---|---|
$searchwords | string | The search words |
$searchwords_exclude | string | The words, which should be excluded from search |


setArticleSpecification( $iArtspecID) : void
Set article specification
Name | Type | Description |
---|---|---|
$iArtspecID | | |


setCmsOptions( $cms_options) : void
Name | Type | Description |
---|---|---|
$cms_options | | The CMS types (htmlhead, html, ...) which should explicitly be searched |
\SearchResult
This object ranks and displays the result of the indexed full-text search. If you are not comfortable with this API, feel free to use your own methods to display the search results. The search result is basically an array keyed by article ID.
If $search_result = $search->searchIndex($searchword, $searchwordex);
use object with
$oSearchResults = new SearchResult($search_result, 10);
$oSearchResults->setReplacement('', ''); // html-tags to emphasize the located searchwords
$num_res = $oSearchResults->getNumberOfResults();
$num_pages = $oSearchResults->getNumberOfPages();
$res_page = $oSearchResults->getSearchResultPage(1); // first result page
foreach ($res_page as $key => $val) {
    $headline = $oSearchResults->getSearchContent($key, 'HTMLHEAD');
    $first_headline = $headline[0];
    $text = $oSearchResults->getSearchContent($key, 'HTML');
    $first_text = $text[0];
    $similarity = $oSearchResults->getSimilarity($key);
    $iOccurrence = $oSearchResults->getOccurrence($key);
}
- Author
- Willi Man
- Copyright
- four for business AG
- Version
- 1.0.0
Properties


array $ordered_search_result = array()
array()
- Type
- array


array $rank_structure = array()
array()
- Type
- array


array $replacement = array()
array()
- Type
- array
Methods


SearchResult( $search_result, $result_per_page, $oDB = false, $bDebug = false) : void
Computes a ranking factor for each search result and orders the search results by that factor.
NOTE: The ranking factor is the sum of occurrences of matching search terms, weighted by the similarity (in %) between the search word and the matching word in the article.
TODO: More sophisticated ranking strategies are conceivable. For example, the content type information could be used: a matching word in the headline (CMS_HEADLINE[1]) could be weighted more heavily than a matching word in the text (CMS_HTML[1]).
Name | Type | Description |
---|---|---|
$search_result | | |
$result_per_page | | |
$oDB | | |
$bDebug | | |
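The ranking described in the note can be sketched as occurrences multiplied by similarity. rankResults() is hypothetical, and the exact weighting (sum of occurrences times similarity divided by 100) is one reading of the note, not confirmed behaviour.

```php
<?php
/**
 * Rank articles by (sum of occurrences) * (similarity in %) / 100.
 * Hypothetical helper; the exact weighting is an assumption.
 */
function rankResults(array $search_result)
{
    $rank = array();
    foreach ($search_result as $idart => $hit) {
        $rank[$idart] = array_sum($hit['occurence']) * $hit['similarity'] / 100;
    }
    arsort($rank); // highest ranking factor first, article IDs kept as keys
    return $rank;
}

$ranked = rankResults(array(
    20 => array('occurence' => array(1, 5, 1), 'similarity' => 60), // 7 * 0.6
    31 => array('occurence' => array(9),       'similarity' => 50), // 9 * 0.5
));
```

Article 31 ends up ahead of article 20 here because its nine occurrences outweigh its lower similarity.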


getContent( $art_id, $cms_type, $id = 0) : \Content
Name | Type | Description |
---|---|---|
$art_id | | ID of an article |
$cms_type | | |
$id | | |
Type | Description |
---|---|
\Content | Content of an article, specified by its content type |


getOccurrence( $art_id) : \Number
Name | Type | Description |
---|---|---|
$art_id | | ID of an article |
Type | Description |
---|---|
\Number | Number of matching search words found in the article |


getSearchContent( $art_id, $cms_type, $cms_nr = NULL) : \Content
Name | Type | Description |
---|---|---|
$art_id | | ID of an article |
$cms_type | | Content type |
$cms_nr | | |
Type | Description |
---|---|
\Content | Content of an article in the search result, specified by its type |


getSearchResultPage( $page_id) : \Articles
Name | Type | Description |
---|---|---|
$page_id | | |
Type | Description |
---|---|
\Articles | Articles in page $page_id |
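Paging over an ordered result can be approximated with array_chunk(). paginateResults() is a hypothetical helper; it assumes the ordered result is keyed by article ID and that page numbers start at 1, as the usage example above suggests.

```php
<?php
/**
 * Split an ordered result (idart => ranking factor) into pages.
 * Hypothetical helper; page numbers start at 1.
 */
function paginateResults(array $ordered, $result_per_page)
{
    $pages = array();
    // preserve_keys = true keeps the article IDs as array keys
    foreach (array_chunk($ordered, $result_per_page, true) as $i => $chunk) {
        $pages[$i + 1] = $chunk;
    }
    return $pages;
}

$ordered   = array(31 => 4.5, 20 => 4.2, 7 => 3.0);
$pages     = paginateResults($ordered, 2);
$num_pages = count($pages);
$res_page  = $pages[1]; // first result page
```

Iterating $res_page with foreach ($res_page as $key => $val) then yields the article ID as $key.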


getSimilarity( $art_id) : \Similarity
Name | Type | Description |
---|---|---|
$art_id | | ID of an article |
Type | Description |
---|---|
\Similarity | Similarity between the search word and the matching word in the article |


setOrderedSearchResult( $ranked_search, $result_per_page) : void
Name | Type | Description |
---|---|---|
$ranked_search | | |
$result_per_page | | |