Defining a new search engine

LitSearch can run its searches through several different search engines, which we refer to as bots. Examples of bots are PubMed and OMIM. To add a new search engine, or modify an existing one, you need to edit the Agilent Literature Search settings file. The settings are stored in XML format in a file named litsearch-settings.xml, located in the folder <user.home>/.litsearch/data/. Each search engine is configured via a bot tag in that file. The LitSearch plugin comes installed with the PubMed bot configured and enabled, and the OMIM and USPTO bots configured but disabled.

To define a new bot, add a <bot> tag to litsearch-settings.xml that specifies the search attributes your search engine expects. These attributes vary from search engine to search engine, so you will need to identify and supply the values that your chosen search engine requires.

The following is an example of the bot definition for the US Patent and Trademark Office (USPTO), which Agilent Literature Search provides as one of its default search engines.

<bot displayname="USPTO"
        prefix="http://patft.uspto.gov"
        query_delim="+"
        query_hits="&amp;l="
        query_url="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;u=%2Fnetahtml%2Fsearch-adv.htm&amp;r=0&amp;p=1&amp;f=S&amp;d=pall&amp;Query="
        url_regexp="&lt;A HREF=(.*?)>(.*?)&lt;/A>.*?/netaicon/PTO/ftext.gif.*?&lt;A HREF=.*?>(.*?)&lt;/A>&lt;/TD>"
        url_show="no" />
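To make the role of the query attributes more concrete, the following Python sketch parses the USPTO bot definition and assembles a search URL from it. It is only an illustration: the helper build_query is hypothetical, and the assembly order (query_url, then the search terms joined by query_delim, then query_hits followed by the maximum number of hits) is an assumption inferred from the attribute names, not a description of how LitSearch itself builds its requests. The prefix value is quoted here so that the tag parses as XML.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

# The USPTO bot definition, with every attribute value quoted.
BOT_XML = r'''<bot displayname="USPTO"
        prefix="http://patft.uspto.gov"
        query_delim="+"
        query_hits="&amp;l="
        query_url="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&amp;Sect2=HITOFF&amp;u=%2Fnetahtml%2Fsearch-adv.htm&amp;r=0&amp;p=1&amp;f=S&amp;d=pall&amp;Query="
        url_regexp="&lt;A HREF=(.*?)>(.*?)&lt;/A>.*?/netaicon/PTO/ftext.gif.*?&lt;A HREF=.*?>(.*?)&lt;/A>&lt;/TD>"
        url_show="no" />'''

# Parsing decodes the XML entities: &amp; becomes & and &lt; becomes <.
bot = ET.fromstring(BOT_XML)


def build_query(bot, terms, max_hits):
    """Hypothetical helper that assembles a search URL from a bot element.

    ASSUMPTION: the URL is query_url + terms joined by query_delim
    + query_hits + max_hits. LitSearch may combine them differently.
    """
    joined = bot.get("query_delim").join(quote(term) for term in terms)
    return f'{bot.get("query_url")}{joined}{bot.get("query_hits")}{max_hits}'


url = build_query(bot, ["melanoma", "resorcinol"], 50)
print(url)
print(bot.get("url_regexp"))
```

Reading the attributes through an XML parser, rather than copying them by hand, shows the values as the application would see them after entity decoding, which is useful when checking a new bot definition for escaping mistakes.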

 

The attributes of the bot tag are defined as follows

 
<TD valign=top><A  HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/search-adv.htm&r=1&p=1&f=G&l=50&d=ptxt&S1=melanoma&OS=melanoma&RS=melanoma> 6,861,564</A></TD>
 
<TD valign=baseline><IMG border=0 src="/netaicon/PTO/ftext.gif" alt="Full-Text"></TD>
<TD valign=top><A  HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/search-adv.htm&r=1&p=1&f=G&l=50&d=ptxt&S1=melanoma&OS=melanoma&RS=melanoma> Process for preparing resorcinol derivatives
</A></TD>
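As a rough illustration of how the url_regexp value from the USPTO bot definition relates to HTML like the fragment above, the following Python sketch applies the decoded expression to that fragment. Several points are assumptions rather than documented behavior: whitespace is collapsed before matching (the fragment as shown has two spaces in "<A  HREF=", while the expression has one), the three captured groups are interpreted as link, patent number, and title, and the prefix attribute is assumed to be prepended to the relative link.

```python
import re

# url_regexp from the USPTO bot definition, with XML entities decoded.
URL_REGEXP = (
    r"<A HREF=(.*?)>(.*?)</A>.*?/netaicon/PTO/ftext.gif.*?"
    r"<A HREF=.*?>(.*?)</A></TD>"
)
PREFIX = "http://patft.uspto.gov"  # prefix attribute of the bot

# The HTML fragment from the USPTO results page shown above.
SAMPLE_HTML = """<TD valign=top><A  HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/search-adv.htm&r=1&p=1&f=G&l=50&d=ptxt&S1=melanoma&OS=melanoma&RS=melanoma> 6,861,564</A></TD>

<TD valign=baseline><IMG border=0 src="/netaicon/PTO/ftext.gif" alt="Full-Text"></TD>
<TD valign=top><A  HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/search-adv.htm&r=1&p=1&f=G&l=50&d=ptxt&S1=melanoma&OS=melanoma&RS=melanoma> Process for preparing resorcinol derivatives
</A></TD>"""

# ASSUMPTION: collapse runs of whitespace (including newlines) to a single
# space so the single-space "<A HREF=" in the expression can match.
normalized = re.sub(r"\s+", " ", SAMPLE_HTML)

match = re.search(URL_REGEXP, normalized)
if match is None:
    raise ValueError("url_regexp did not match the sample HTML")

# ASSUMPTION: group 1 is the link, group 2 the patent number, group 3 the title.
link, patent_number, title = (group.strip() for group in match.groups())

# ASSUMPTION: the bot's prefix is prepended to the relative link.
full_url = PREFIX + link

print(patent_number)
print(title)
print(full_url)
```

Testing an expression against a saved copy of a results page in this way is a quick check that a new bot's url_regexp captures the intended fields before it is added to litsearch-settings.xml.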

The figure below shows the index page returned by USPTO, which includes the HTML matched by the regular expression discussed above.

 
 
The figure below shows this information displayed in the Query Matches panel of Agilent Literature Search.