contenido\classes\class.search.php

Project: Contenido Content Management System

Description: API to index a Contenido article, API to search in the index structure, and API to display the search results.

Requirements:

Author
Willi Man  
Con_php_req
5.0  
Copyright
four for business AG  
License
http://www.contenido.org/license/LIZENZ.txt  
Link
http://www.4fb.de  
Link
http://www.contenido.org  
Package
Contenido Backend classes  
Since
file available since contenido release <= 4.6
Internal notes:
    created 2004-01-15
    modified 2008-06-30, Frederic Schneider, add security fix
    modified 2008-07-11, Dominik Ziegler, marked class search_helper as deprecated
    modified 2008-11-12, Andreas Lindner, add special treatment for iso-8859-2
$Id: class.search.php 873 2008-11-12 09:18:50Z andreas.lindner $
Version
1.0.1  

\Index

Package: Default

Properties

Property: public boolean $bDebug
Type: boolean

Property: public array $cfg
configuration data
Type: array

Property: public array $cms_options = array()
array of cms types
Default value: array()
Type: array

Property: public array $cms_type = array()
array of all available cms types

    htmlhead - HTML headline
    html - HTML text
    head - Headline (no HTML)
    text - Text (no HTML)
    img - Upload ID of the element
    imgdescr - Image description
    link - Link (URL)
    linktarget - Link target (_self, _blank, _top ...)
    linkdescr - Link description
    swf - Upload ID of the element
    etc.

Default value: array()
Type: array

Property: public array $cms_type_suffix = array()
the suffix of all available cms types
Default value: array()
Type: array

Property: public object $db
Contenido database object
Type: object

Property: public int $idart
article ID
Type: int

Property: public array $keycode = array()
the content of the cms-types of an article
Default value: array()
Type: array

Property: public array $keywords = array()
the list of keywords of an article
Default value: array()
Type: array

Property: public array $keywords_del = array()
the keywords to be deleted
Default value: array()
Type: array

Property: public array $keywords_old = array()
the keywords of an article stored in the DB
Default value: array()
Type: array

Property: public int $lang
language of a client
Type: int

Property: public string $place
'auto' or 'self'. The field 'auto' in table con_keywords is used for automatic indexing.

The value is a string like "&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)", which means that a keyword occurs 2 times in the article with $idart 12 and can be found in CMS_HTMLHEAD[1] and CMS_HTML[1]. The field 'self' can be used in the article properties to index the article manually.

Type: string

Property: public array $stopwords = array()
the words which should not be indexed
Default value: array()
Type: array
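The index string format described for $place, "&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)", can be decoded with a small regular expression. The following is a minimal sketch, assuming PHP 7 or later and assuming that entries for several articles are simply concatenated, each starting with "&". The function name parseIndexString is hypothetical and not part of class.search.php.

```php
<?php
// Hypothetical helper, not part of class.search.php.
// Decodes "&<idart>=<count>(<TYPE-nr>,<TYPE-nr>,...)" entries.
function parseIndexString(string $indexString): array
{
    $result = array();
    preg_match_all('/&(\d+)=(\d+)\(([^)]*)\)/', $indexString, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        $result[(int) $m[1]] = array(
            'count'     => (int) $m[2],                               // number of occurrences
            'locations' => $m[3] === '' ? array() : explode(',', $m[3]), // e.g. CMS_HTML-1
        );
    }
    return $result;
}

// Assumption: two article entries concatenated in one field value.
$parsed = parseIndexString('&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)&20=1(CMS_HTML-1)');
print_r($parsed);
```

The result is keyed by article ID, so a lookup such as $parsed[12]['locations'] lists the content elements in which the keyword was found.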

Methods

method public Index($oDB = false) : void

Constructor; sets the object properties.

Parameters
    $oDB: Contenido database object

method public addSpecialUmlauts($key) : $key

Parameters
    $key: Keyword

Returns
    $key

Details
Modified
2008-04-17, Timo Trautmann - reverse function to removeSpecialChars (important for syntax highlighting of the search term in the search results); adds umlauts to the search term

method public checkCmsType($idtype) : boolean

check if the current cms type is in the cms_options array

Parameters
    $idtype

Returns
    boolean

method public createKeywords() : void

for each cms-type create index structure.

It looks like

    Array
    (
        [die] => CMS_HTML-1
        [inhalte] => CMS_HTML-1
        [auf] => CMS_HTML-1 CMS_HTMLHEAD-2
        [dieser] => CMS_HTML-1
        [website] => CMS_HTML-1 CMS_HTML-1 CMS_HTMLHEAD-2
    )
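To make the index structure above more concrete, here is a minimal sketch that builds a comparable keyword map from a content array. It assumes PHP 7 or later; the function name buildKeywordMap and the tokenization rule (lowercase, split on anything that is not a letter or digit) are assumptions for illustration and may differ from what createKeywords() actually does.

```php
<?php
// Hypothetical helper, not part of class.search.php.
// Maps each word to a space-separated list of "<CMS_TYPE>-<nr>" locations.
function buildKeywordMap(array $aContent, array $aStopwords = array()): array
{
    $keywords = array();
    foreach ($aContent as $cmsType => $elements) {
        foreach ($elements as $nr => $text) {
            // Strip markup, lowercase, split on non-letter/non-digit characters.
            $plain = strtolower(strip_tags($text));
            $words = preg_split('/[^\p{L}\p{N}]+/u', $plain, -1, PREG_SPLIT_NO_EMPTY);
            foreach ($words as $word) {
                if (in_array($word, $aStopwords, true)) {
                    continue; // stopwords are not indexed
                }
                $location = $cmsType . '-' . $nr;
                $keywords[$word] = isset($keywords[$word])
                    ? $keywords[$word] . ' ' . $location
                    : $location;
            }
        }
    }
    return $keywords;
}

$aContent = array(
    'CMS_HTMLHEAD' => array(1 => 'Herzlich Willkommen', 2 => 'auf Ihrer Website'),
    'CMS_HTML'     => array(1 => 'Die Inhalte auf dieser Website'),
);
$map = buildKeywordMap($aContent, array('die'));
print_r($map);
```

A word that occurs in several content elements accumulates one location per occurrence, which is the shape shown in the example structure.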

method public deleteKeywords() : void

if keywords no longer occur in the article, update the index_string and delete the keyword if necessary

method public getKeywords() : void

get the keywords of an article

method public removeSpecialChars($key) : $key

remove special characters from index term

Parameters
    $key: Keyword

Returns
    $key

method public saveKeywords() : void

generate the index_string from the index structure and save the keywords. The index_string looks like "&12=2(CMS_HTMLHEAD-1,CMS_HTML-1)"
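As an illustration of how an index_string could be derived from one entry of the index structure, consider the sketch below. It assumes PHP 7 or later, counts every listed location as one occurrence, and lists each location only once inside the parentheses. Both choices are assumptions, and buildIndexString is a hypothetical name rather than a method of the Index class.

```php
<?php
// Hypothetical helper, not part of class.search.php.
// $locations is a space-separated list such as "CMS_HTMLHEAD-1 CMS_HTML-1".
function buildIndexString(int $idart, string $locations): string
{
    $parts  = explode(' ', $locations);
    $count  = count($parts);                      // assumed: one occurrence per listed location
    $unique = array_values(array_unique($parts)); // assumed: each location is listed once
    return '&' . $idart . '=' . $count . '(' . implode(',', $unique) . ')';
}

$indexString = buildIndexString(12, 'CMS_HTMLHEAD-1 CMS_HTML-1');
echo $indexString, "\n";
```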

method public setCmsOptions($cms_options) : void

set the cms_options array of cms types which should be treated specially

Parameters
    $cms_options

method public setContentTypes() : void

set the cms types

method public setStopwords(array $aStopwords) : void

set the array of stopwords which should not be indexed

Parameters
    $aStopwords (array)

method public start(int $idart, array $aContent, string $place = 'auto', array $cms_options = array(), array $aStopwords = array()) : void

start indexing the article

Parameters
    $idart (int): Article ID

    $aContent (array): The complete content of an article specified by its content types. It looks like

        Array
        (
            [CMS_HTMLHEAD] => Array
                (
                    [1] => Herzlich Willkommen...
                    [2] => ...auf Ihrer Website!
                )
            [CMS_HTML] => Array
                (
                    [1] => Die Inhalte auf dieser Website ...

    $place (string): The field in which to store the index information in the DB.

    $cms_options (array): One can explicitly specify cms types which should not be indexed.

    $aStopwords (array): Array of words which should not be indexed.

\Search

Package: Default
Contenido API - Search Object

This object starts an indexed full-text search.

TODO: The way the search options are set could be done much better. The computation of the set of searchable articles should not be handled in this class. It is better to compute the array of searchable articles outside and to pass it in as a parameter. Avoid foreach loops.

Use object with

    $options = array('db' => 'regexp',    // use db function regexp
                     'combine' => 'or');  // combine searchwords with or

The range of searchable articles is by default the complete content which is online and not protected.

With option 'searchable_articles' you can define your own set of searchable articles. If parameter 'searchable_articles' is set, the options 'cat_tree', 'categories', 'articles', 'exclude', 'artspecs', 'protected' and 'dontshowofflinearticles' don't have any effect.

    $options = array('db' => 'regexp',    // use db function regexp
                     'combine' => 'or',   // combine searchwords with or
                     'searchable_articles' => array(5, 6, 9, 13));

One can define the range of searchable articles by setting the parameter 'exclude' to false, which means that the range of categories defined by parameter 'cat_tree' or 'categories' and the range of articles defined by parameter 'articles' is included.

    $options = array('db' => 'regexp',                   // use db function regexp
                     'combine' => 'or',                  // combine searchwords with or
                     'exclude' => false,                 // => searchrange specified in 'cat_tree', 'categories' and 'articles' is included
                     'cat_tree' => array(12),            // tree with root 12 included
                     'categories' => array(100,111),     // categories 100, 111 included
                     'articles' => array(33),            // article 33 included
                     'artspecs' => array(2, 3),          // array of article specifications => search only articles with these artspecs
                     'res_per_page' => 2,                // results per page
                     'protected' => true,                // => do not search articles or articles in categories which are offline or protected
                     'dontshowofflinearticles' => false); // => search offline articles or articles in categories which are offline

You can build the complement of the range of searchable articles by setting the parameter 'exclude' to true, which means that the range of categories defined by parameter 'cat_tree' or 'categories' and the range of articles defined by parameter 'articles' is excluded from search.

    $options = array('db' => 'regexp',                   // use db function regexp
                     'combine' => 'or',                  // combine searchwords with or
                     'exclude' => true,                  // => searchrange specified in 'cat_tree', 'categories' and 'articles' is excluded
                     'cat_tree' => array(12),            // tree with root 12 excluded
                     'categories' => array(100,111),     // categories 100, 111 excluded
                     'articles' => array(33),            // article 33 excluded
                     'artspecs' => array(2, 3),          // array of article specifications => search only articles with these artspecs
                     'res_per_page' => 2,                // results per page
                     'protected' => true,                // => do not search articles or articles in categories which are offline or protected
                     'dontshowofflinearticles' => false); // => search offline articles or articles in categories which are offline

    $search = new Search($options);

    // search only in these cms-types
    $cms_options = array("htmlhead", "html", "head", "text", "imgdescr", "link", "linkdescr");
    $search->setCmsOptions($cms_options);

    // start search
    $search_result = $search->searchIndex($searchword, $searchwordex);
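The effect of the 'exclude' option on the search range can be pictured as a set operation. The sketch below is only a model of that idea, assuming PHP 7 or later: it works on plain arrays of article IDs and does not resolve categories or category trees to articles, which the Search class does internally in a way not shown here. The function name resolveSearchableArticles is hypothetical.

```php
<?php
// Hypothetical helper, not part of class.search.php.
// $allArticles:   IDs of all articles that are searchable by default
// $rangeArticles: IDs of the articles covered by 'cat_tree', 'categories', 'articles'
function resolveSearchableArticles(array $allArticles, array $rangeArticles, bool $exclude): array
{
    return $exclude
        ? array_values(array_diff($allArticles, $rangeArticles))       // complement of the range
        : array_values(array_intersect($allArticles, $rangeArticles)); // only the range
}

$all   = array(5, 6, 9, 13, 33);
$range = array(9, 33);

$included = resolveSearchableArticles($all, $range, false);
$excluded = resolveSearchableArticles($all, $range, true);
print_r($included);
print_r($excluded);
```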

The search result structure has the following form:

    Array
    (
        [20] => Array
            (
                [CMS_HTML] => Array
                    (
                        [0] => 1
                        [1] => 1
                        [2] => 1
                    )

                [keyword] => Array
                    (
                        [0] => content
                        [1] => contenido
                        [2] => wwwcontenidoorg
                    )

                [search] => Array
                    (
                        [0] => con
                        [1] => con
                        [2] => con
                    )

                [occurence] => Array
                    (
                        [0] => 1
                        [1] => 5
                        [2] => 1
                    )

                [similarity] => 60
            )

    )

The keys of the array are the article IDs found by the search.

Searching 'con' matches keywords 'content', 'contenido' and 'wwwcontenidoorg' in article with ID 20 in content type CMS_HTML[1]. The search term occurs 7 times. The maximum similarity between searchterm and matching keyword is 60%.
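The figures in this example can be reproduced from the result entry itself. The sketch below sums the 'occurence' values and computes the similarity with PHP's built-in similar_text(). Whether the Search class uses similar_text() is an assumption. The source does not say so, but the function yields 60% for 'con' against 'content', which is consistent with the example.

```php
<?php
// Result entry for article 20, copied from the example structure.
$entry = array(
    'CMS_HTML'   => array(1, 1, 1),
    'keyword'    => array('content', 'contenido', 'wwwcontenidoorg'),
    'search'     => array('con', 'con', 'con'),
    'occurence'  => array(1, 5, 1),
    'similarity' => 60,
);

// Total number of occurrences of the search term: 1 + 5 + 1.
$totalOccurrence = array_sum($entry['occurence']);

// Assumption: similarity is the similar_text() percentage between
// the search word and each matching keyword; the maximum is kept.
$maxSimilarity = 0.0;
foreach ($entry['keyword'] as $i => $keyword) {
    similar_text($entry['search'][$i], $keyword, $percent);
    $maxSimilarity = max($maxSimilarity, $percent);
}

echo $totalOccurrence, ' occurrences, max similarity ', round($maxSimilarity), "%\n";
```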

With $oSearchResults = new SearchResult($search_result, 10); one can rank and display the results.

Author
Willi Man  
Copyright
four for business AG  
Version
1.0.1  

Properties

Property: public array $article_specs = array()
article specifications
Default value: array()
Type: array

Property: public boolean $bDebug
Debug option
Type: boolean

Property: public array $cfg
configuration data
Type: array

Property: public int $client
a contenido client
Type: int

Property: public array $cms_type = array()
array of available cms types
Default value: array()
Type: array

Property: public array $cms_type_suffix = array()
suffix of available cms types
Default value: array()
Type: array

Property: public object $db
Contenido database object
Type: object

Property: public boolean $dontshowofflinearticles
If $dontshowofflinearticles = false => search offline articles or articles in categories which are offline
Type: boolean

Property: public boolean $exclude
If $exclude = true => the specified search range is excluded from search, otherwise included
Type: boolean

Property: public object $index
Instance of class Index
Type: object

Property: public int $lang
language of a client
Type: int

Property: public boolean $protected
If $protected = true => do not search articles which are offline or articles in categories which are offline (protected)
Type: boolean

Property: public string $search_combination
logical combination of searchwords (and, or)
Type: string

Property: public string $search_option
type of db search: like => 'sql like', regexp => 'sql regexp'
Type: string

Property: public array $search_result = array()
Array of article IDs with information about cms-types, occurrence of keyword/searchword, similarity ...
Default value: array()
Type: array

Property: public array $search_words = array()
the search words
Default value: array()
Type: array

Property: public array $search_words_exclude = array()
the words which should be excluded from search
Default value: array()
Type: array

Property: public array $searchable_arts = array()
array of searchable articles
Default value: array()
Type: array

Methods

method public Search(array $options, $oDB = false) : void

Constructor

Parameters
    $options (array):
        $options['db']: 'regexp' => DB search with REGEXP; 'like' => DB search with LIKE; 'exact' => exact match
        $options['combine']: 'and', 'or' => combination of search words with AND, OR
        $options['exclude']: 'true' => searchrange specified in 'cat_tree', 'categories' and 'articles' is excluded; 'false' => searchrange specified in 'cat_tree', 'categories' and 'articles' is included
        $options['cat_tree']: e.g. array(8) => the complete tree with root 8 is in/excluded from search
        $options['categories']: e.g. array(10, 12) => categories 10, 12 in/excluded
        $options['articles']: e.g. array(33) => article 33 in/excluded
        $options['artspecs']: e.g. array(2, 3) => search only articles with certain article specifications
        $options['protected']: 'true' => do not search articles which are offline (locked) or articles in categories which are offline (protected)
        $options['dontshowofflinearticles']: 'false' => search offline articles or articles in categories which are offline
        $options['searchable_articles']: array of article IDs which should be searchable

    $oDB

method public addArticleSpecificationsByName($sArtSpecName) : void

Add all article specifications matching name of article specification (client dependent but language independent)

Parameters
    $sArtSpecName

method public getArticleSpecifications() : Array

Fetch all article specifications which are online

Returns
    Array of article specification IDs

method public getSearchableArticles($search_range)

Parameters
    $search_range

Returns
    Articles in specified search range

method public getSubTree($cat_start)

Parameters
    $cat_start: Root of a category tree

Returns
    Category tree

method public searchIndex(string $searchwords, string $searchwords_exclude = '') : void

indexed full-text search

Parameters
    $searchwords (string): The search words

    $searchwords_exclude (string): The words which should be excluded from search

method public setArticleSpecification($iArtspecID) : void

Set article specification

Parameters
    $iArtspecID

method public setCmsOptions($cms_options) : void

Parameters
    $cms_options: The cms-types (htmlhead, html, ...) which should explicitly be searched

method public stripWords($searchwords) : Array

Parameters
    $searchwords: The search-words

Returns
    Array of stripped search-words

\SearchResult

Package: Default
Contenido API - SearchResult Object

This object ranks and displays the result of the indexed full-text search. If you are not comfortable with this API, feel free to use your own methods to display the search results. The search result is basically an array with article IDs.

If

    $search_result = $search->searchIndex($searchword, $searchwordex);

use object with

    $oSearchResults = new SearchResult($search_result, 10);

    $oSearchResults->setReplacement('', ''); // html-tags to emphasize the located searchwords

    $num_res = $oSearchResults->getNumberOfResults();
    $num_pages = $oSearchResults->getNumberOfPages();
    $res_page = $oSearchResults->getSearchResultPage(1); // first result page
    foreach ($res_page as $key => $val) {
        $headline = $oSearchResults->getSearchContent($key, 'HTMLHEAD');
        $first_headline = $headline[0];
        $text = $oSearchResults->getSearchContent($key, 'HTML');
        $first_text = $text[0];
        $similarity = $oSearchResults->getSimilarity($key);
        $iOccurrence = $oSearchResults->getOccurrence($key);
    }

Author
Willi Man  
Copyright
four for business AG  
Version
1.0.0  

Properties

Property: public boolean $bDebug
Debug option
Type: boolean

Property: public array $cfg
configuration settings
Type: array

Property: public int $client
a contenido client
Type: int

Property: public object $db
Contenido database object
Type: object

Property: public object $index
Instance of class Index
Type: object

Property: public int $lang
language of a client
Type: int

Property: public array $ordered_search_result = array()
Array of result pages with arrays of article IDs
Default value: array()
Type: array

Property: public int $pages
Number of result pages
Type: int

Property: public array $rank_structure = array()
Array of article IDs with ranking information
Default value: array()
Type: array

Property: public array $replacement = array()
Array of html-tags to emphasize the searchwords
Default value: array()
Type: array

Property: public int $result_page
Current result page
Type: int

Property: public int $result_per_page
Results per page to display
Type: int

Property: public int $results
Number of results
Type: int

Property: public array $search_result = array()
Array of article IDs with information about cms-types, occurrence of keyword/searchword, similarity ...
Default value: array()
Type: array

Methods

method public SearchResult($search_result, $result_per_page, $oDB = false, $bDebug = false) : void

Compute the ranking factor for each search result and order the search results by ranking factor.

NOTE: The ranking factor is the sum of occurrences of matching search terms, weighted by the similarity (in %) between the search word and the matching word in the article.

TODO: One can think of more sophisticated ranking strategies. For example, one could use the content type information, because a matching word in the headline (CMS_HEADLINE[1]) could be weighted higher than a matching word in the text (CMS_HTML[1]).

Parameters
    $search_result
    $result_per_page
    $oDB
    $bDebug
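The ranking and paging described for the constructor can be sketched as follows. This is one possible reading of the NOTE, assuming PHP 7 or later: the ranking factor is taken to be the sum of the 'occurence' values multiplied by the similarity as a fraction. The exact formula used by SearchResult is not given in this documentation, and the function name rankAndPaginate is hypothetical. Pages are numbered from 1, matching getSearchResultPage(1) in the usage example, and $resultsPerPage is expected to be at least 1.

```php
<?php
// Hypothetical helper, not part of class.search.php.
// Returns an array of result pages; each page maps article ID => ranking factor.
function rankAndPaginate(array $searchResult, int $resultsPerPage): array
{
    $rank = array();
    foreach ($searchResult as $idart => $entry) {
        // Assumed ranking factor: occurrences weighted by similarity in percent.
        $rank[$idart] = array_sum($entry['occurence']) * ($entry['similarity'] / 100);
    }
    arsort($rank); // highest ranking factor first, article IDs kept as keys

    $pages = array_chunk($rank, $resultsPerPage, true); // preserve article IDs
    if ($pages === array()) {
        return array();
    }
    return array_combine(range(1, count($pages)), $pages); // pages start at 1
}

$searchResult = array(
    20 => array('occurence' => array(1, 5, 1), 'similarity' => 60),  // 7 * 0.6
    33 => array('occurence' => array(2),       'similarity' => 100), // 2 * 1.0
    41 => array('occurence' => array(10),      'similarity' => 50),  // 10 * 0.5
);
$pages = rankAndPaginate($searchResult, 2);
print_r($pages);
```

Keeping the article IDs as array keys mirrors the search result structure, where the keys of the outer array are the article IDs.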
method public getArtCat($artid)

Parameters
    $artid

Returns
    Category ID

method public getContent($art_id, $cms_type, $id = 0)

Parameters
    $art_id: ID of an article
    $cms_type
    $id

Returns
    Content of an article, specified by its content type

method public getNumberOfPages()

Returns
    Number of result pages

method public getNumberOfResults()

Returns
    Number of articles in search result

method public getOccurrence($art_id)

Parameters
    $art_id: ID of an article

Returns
    Number of matching searchwords found in article

method public getSearchContent($art_id, $cms_type, $cms_nr = NULL)

Parameters
    $art_id: ID of an article
    $cms_type: Content type
    $cms_nr

Returns
    Content of an article in search result, specified by its type

method public getSearchResultPage($page_id)

Parameters
    $page_id

Returns
    Articles in page $page_id

method public getSimilarity($art_id)

Parameters
    $art_id: ID of an article

Returns
    Similarity between searchword and matching word in article

method public setOrderedSearchResult($ranked_search, $result_per_page) : void

Parameters
    $ranked_search
    $result_per_page

method public setReplacement(string $rep1, string $rep2) : void

Parameters
    $rep1 (string): The opening html-tag to emphasize the searchword e.g. ''

    $rep2 (string): The closing html-tag e.g. ''
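To show what an opening and closing replacement tag are used for, here is a minimal sketch of search word emphasis, assuming PHP 7 or later. It is not the code used by SearchResult. The function name emphasizeSearchWords and the <strong> tags are chosen for illustration only.

```php
<?php
// Hypothetical helper, not part of class.search.php.
// Wraps every case-insensitive match of a search word in $rep1 ... $rep2.
function emphasizeSearchWords(string $text, array $searchWords, string $rep1, string $rep2): string
{
    $quoted = array();
    foreach ($searchWords as $word) {
        if ($word !== '') {
            $quoted[] = preg_quote($word, '/'); // treat the word literally
        }
    }
    if ($quoted === array()) {
        return $text;
    }
    // A single pass over the text, so inserted tags are never matched again.
    $pattern = '/(' . implode('|', $quoted) . ')/i';
    return preg_replace_callback($pattern, function ($m) use ($rep1, $rep2) {
        return $rep1 . $m[0] . $rep2;
    }, $text);
}

$highlighted = emphasizeSearchWords('Contenido content', array('con'), '<strong>', '</strong>');
echo $highlighted, "\n";
```

Using a callback instead of a replacement string avoids problems when the tags themselves contain characters such as "$" or "\", which preg_replace() would interpret.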

\Search_helper

Package: Default
Deprecated
 
Since
2008-07-11  

Properties

Property: public $oDb = NULL
Default value: NULL
Type: n/a

Methods

method public search_helper($oDb, $lang, $client) : void

Parameters
    $oDb
    $lang
    $client
Documentation was generated by phpDocumentor 2.0.0a12.