Building an association network via literature search

 

To execute a search, you enter a set of search terms and, optionally, context words.  For each search term entered, a query line is constructed in the Query Editor panel, incorporating aliases and context, if desired.   Some definitions are in order:

 

Term

A term is a concept that you want to search for.  This is typically a biological entity, such as a protein, gene, or metabolite, but it can also be a biological process, a disease, or any other construct you are interested in.  Terms are entered in the Terms panel of Agilent Literature Search.

 

Context

Context is used to refine and narrow the scope of the search, so that Agilent Literature Search can automatically sift through the potentially large corpus of search results and quickly identify the information of interest to you.  Context can include any words of interest to you, typically a biological process, molecular function, or disease.  Context words are entered in the Context panel of Agilent Literature Search.

 

Query String

Query strings are the actual queries that are sent to the search engines.  Query strings are built up in the Query Editor panel of Agilent Literature Search as terms and context words are added in the Terms and Context panels.  For each line of terms in the Terms panel, one query string is built up in the Query Editor.

If multiple terms appear on the same line (a phrase), Agilent Literature Search passes the terms to the search engines without interpretation.  The interpretation of the phrase is left to the semantics of the specific search engines.   For example, the terms

beta catenin

when entered in the Terms panel with aliasing turned off, will be sent as

beta catenin

to the search engines.  PubMed and OMIM will handle such phrases.  However, USPTO requires that multi-word phrases be enclosed in quotes or be separated by Boolean operators, such as AND or OR.  We recommend using quotes around multi-word phrases, e.g.

"beta catenin"

Agilent Literature Search provides, as a utility, a set of organism-specific aliases, contained in the concept lexicon files.  When terms are entered with aliasing turned on, the aliases are inserted into the query string in a disjunctive (OR) manner.  For example, the term nfkb, when entered in the Terms panel with aliasing turned on, will generate the query string

("nf-kappa b" OR nfkb-1 OR nfkappab OR nfkb OR nf-kappab OR "nfkappa b")

Note that multi-word aliases are automatically enclosed in quotes.
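
The alias expansion described above can be sketched in a few lines of Python.  This is only an illustration, not the code used by Agilent Literature Search: the function names and the in-memory LEXICON dictionary are hypothetical, and the real concept lexicon files may be organized differently.

```python
def quote_if_phrase(alias: str) -> str:
    """Enclose multi-word aliases in double quotes; leave single words as is."""
    alias = alias.strip()
    return f'"{alias}"' if " " in alias else alias


def expand_term(term: str, lexicon: dict) -> str:
    """Return a disjunctive (OR) query string for a term and its aliases."""
    aliases = lexicon.get(term.lower())
    if not aliases:
        # No known aliases: pass the term through unchanged.
        return term
    return "(" + " OR ".join(quote_if_phrase(a) for a in aliases) + ")"


# Hypothetical lexicon entry, copied from the nfkb example above.
LEXICON = {
    "nfkb": ["nf-kappa b", "nfkb-1", "nfkappab", "nfkb", "nf-kappab", "nfkappa b"],
}

print(expand_term("nfkb", LEXICON))
```

With the entry shown, the call reproduces the nfkb query string given above.  A term that has no entry in the lexicon is returned unchanged.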

Each line of terms in the Terms panel results in one query string.  If context is used, then each line of the Context panel is combined with each query string.  Multiple context lines are combined in a disjunctive (OR) fashion.  For example, the query string "beta catenin", when combined with the following contents of the Context panel

melanoma

cancer

results in the query string

("beta catenin") AND (melanoma OR cancer)

In this case, a match is found when either "beta catenin" and "melanoma", or "beta catenin" and "cancer", occur together.
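
As a rough sketch of how a query string and the Context panel lines fit together, the following Python function mirrors the examples in this section.  The function name is hypothetical, and the parenthesization rules (wrap the term query, wrap multiple context lines joined by OR, append a single context line as is) are inferred from the example query strings shown here rather than taken from the product's source.

```python
def combine_with_context(query: str, context_lines: list) -> str:
    """Combine one query string with the lines of the Context panel."""
    lines = [line.strip() for line in context_lines if line.strip()]
    if not lines:
        # No context: the query string is used unchanged.
        return query
    # Wrap the term part in parentheses unless it is already wrapped.
    term_part = query if query.startswith("(") else f"({query})"
    if len(lines) == 1:
        # A single context line is appended as is.
        context_part = lines[0]
    else:
        # Multiple context lines are combined disjunctively.
        context_part = "(" + " OR ".join(lines) + ")"
    return f"{term_part} AND {context_part}"


print(combine_with_context('"beta catenin"', ["melanoma", "cancer"]))
# prints: ("beta catenin") AND (melanoma OR cancer)
```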

If multiple context words appear on the same line in the Context panel, they will be sent as is to the search engines.  For example, the query string "beta catenin", when combined with the following contents of the Context panel

melanoma cancer

results in the query string

("beta catenin") AND melanoma cancer

Note that the above query will not work in USPTO because the multi-word context line is neither quoted nor joined by a Boolean operator.  We recommend that all multi-word context lines be either enclosed in quotes or separated by Boolean operators, such as AND or OR.  For example, the Context panel could contain

melanoma AND cancer

resulting in the query string

("beta catenin") AND melanoma AND cancer
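
If you prepare term or context lines outside the application, a small check such as the one below can flag and repair lines that USPTO would reject.  It is a minimal sketch under stated assumptions: the function name is hypothetical, only the uppercase operators AND and OR named above are recognized, and a line that is already quoted or already contains one of those operators is left untouched.

```python
import re

# Only the operators named in this section; extend the pattern as needed.
BOOLEAN_OPERATORS = re.compile(r"\b(AND|OR)\b")


def make_phrase_safe(line: str) -> str:
    """Quote a multi-word line unless it is quoted or uses AND/OR."""
    text = line.strip()
    if " " not in text:
        return text  # single word, nothing to do
    if text.startswith('"') and text.endswith('"'):
        return text  # already a quoted phrase
    if BOOLEAN_OPERATORS.search(text):
        return text  # words are already separated by an operator
    return f'"{text}"'


for line in ["melanoma cancer", "melanoma AND cancer", "melanoma"]:
    print(make_phrase_safe(line))
```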

You can also manually edit the query lines in the Query Editor to specify more advanced search options, such as an "author" field.  

For PubMed, you can set the author or journal title fields in the query as Context.  For example,

nature [ta]

specifies that the journal title should contain the text "nature".  For further examples, please consult

http://www.ncbi.nlm.nih.gov/entrez/query/static/help/pmhelp.html
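
Field-restricted context lines of this kind can also be generated programmatically before being pasted into the Context panel.  The sketch below is illustrative only: the helper name and the PUBMED_TAGS table are hypothetical, [ta] is the journal title tag shown above, and [au] is PubMed's author tag, which is not otherwise shown in this section.

```python
# Hypothetical lookup table of PubMed search field tags.
PUBMED_TAGS = {
    "journal": "ta",  # journal title, as in the example above
    "author": "au",   # author name (PubMed tag, not shown in the text above)
}


def pubmed_field(value: str, field: str) -> str:
    """Build a PubMed context line that restricts a value to one field."""
    return f"{value.strip()} [{PUBMED_TAGS[field]}]"


print(pubmed_field("nature", "journal"))
# prints: nature [ta]
```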

 

 

 

Search

The Search consists of the combination of all query strings in the Query Editor.  When you press the Fetch button, the set of queries in the Query Editor is submitted to the search engines you have selected.  The retrieved results (documents) are fetched from their respective sources, and each document is then parsed into sentences and analyzed for concept associations (e.g. protein-protein associations).  Agilent Literature Search uses a set of lexicons for defining concept names (and aliases) and association terms (verbs) of interest.  The concept lexicon supplied with this version of Agilent Literature Search contains gene/protein names and aliases.  An association is extracted for every sentence containing at least two concept names and one verb.  Associations are then converted into interactions, which are further grouped into a network.  The sentences and source hyperlinks for each association are stored as attributes of the corresponding interactions.
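
The extraction rule (at least two concept names and one verb in the same sentence) can be illustrated with a simplified Python sketch.  It is not the algorithm implemented in Agilent Literature Search: the tiny CONCEPTS and VERBS lexicons are hypothetical, sentences are split with a naive regular expression that will mis-split abbreviations, and matching is plain case-insensitive whole-word lookup without alias handling.

```python
import re
from itertools import combinations

# Hypothetical mini-lexicons; the real lexicon files are far larger.
CONCEPTS = {"beta catenin", "tcf4", "apc"}
VERBS = {"binds", "activates", "inhibits"}


def split_sentences(text: str) -> list:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_entries(sentence: str, lexicon: set) -> list:
    """Return the lexicon entries that occur as whole words in the sentence."""
    lowered = sentence.lower()
    return sorted(
        entry for entry in lexicon
        if re.search(r"\b" + re.escape(entry) + r"\b", lowered)
    )


def extract_associations(text: str) -> list:
    """Emit (concept_a, concept_b, verbs, sentence) for qualifying sentences."""
    associations = []
    for sentence in split_sentences(text):
        concepts = find_entries(sentence, CONCEPTS)
        verbs = find_entries(sentence, VERBS)
        # The rule from the text: two or more concepts and at least one verb.
        if len(concepts) >= 2 and verbs:
            for a, b in combinations(concepts, 2):
                associations.append((a, b, tuple(verbs), sentence))
    return associations


text = "Beta catenin binds TCF4 in the nucleus. APC was studied in mice."
for association in extract_associations(text):
    print(association)
```

Only the first sentence of the sample text qualifies, because the second contains a single concept name and no verb from the lexicon.  Keeping the sentence in each tuple corresponds to storing the supporting sentence as an attribute of the resulting interaction.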