## A Theory of Indexing by Gerard Salton

Presents a theory of indexing capable of ranking index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.

This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.

Similar probability books

Stochastic optimal control: the discrete time case

This research monograph is the authoritative and comprehensive treatment of the mathematical foundations of stochastic optimal control of discrete-time systems, including the treatment of the intricate measure-theoretic issues.

Additional information for A Theory of Indexing

Example text

Additions or subtractions, square roots. The final operational complexity for $t$ computations of $Q_k - Q$ is then $(2Kn + 4n + 2)t + 2Kn + 2n$ multiplications or divisions, $(2Kn + n + 3)t + 2Kn + n$ additions or subtractions, and $(n + 1)t$ square roots. A summarization of the complexity of the significance computations is given in Table 6. Since the discrimination value measure is dependent on the collection size, the calculations become automatically much more demanding than those required for the other measures.

Table 6. Computational complexity of significance computations

| Significance measure | Computational requirements | Overall order (multiplications) |
|---|---|---|
| F or B | $K't$ additions | — |
| EK | $(2K' + 1)t$ additions; $(K' + 2)t$ multiplications | $o(K't)$ |
| S/N | $(2K' + 1)t$ additions; $3K't$ multiplications; $2K't$ logarithms | $o(3K't)$ |
| DV | $(2Kn + 4n + 2)t + 2Kn + 2n$ multiplications; $(2Kn + n + 3)t + 2Kn + n$ additions; $(n + 1)t$ square roots | $o(2Knt)$ |
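The operation counts quoted above for the discrimination value computation can be evaluated directly to see how quickly the cost grows. The following is a minimal sketch, not the author's procedure: the function name `dv_operation_counts` is hypothetical, the excerpt does not define `K`, `n`, and `t`, so they are treated here simply as positive integers, and the example values are invented for illustration.

```python
def dv_operation_counts(K, n, t):
    """Evaluate the operation counts quoted in the text for t computations.

    K, n and t are taken as plain positive integers; the excerpt does not
    define them, so no interpretation is assumed here.
    """
    return {
        "multiplications": (2 * K * n + 4 * n + 2) * t + 2 * K * n + 2 * n,
        "additions": (2 * K * n + n + 3) * t + 2 * K * n + n,
        "square_roots": (n + 1) * t,
    }


def dominant_term(K, n, t):
    """Leading term 2Knt that the text reports as the overall order."""
    return 2 * K * n * t


# Hypothetical sizes, chosen only to illustrate the growth.
counts = dv_operation_counts(K=10, n=100, t=50)
ratio = counts["multiplications"] / dominant_term(K=10, n=100, t=50)
print(counts, round(ratio, 3))
```

With these assumed sizes the exact multiplication count stays within a small factor of the leading term $2Knt$, which is consistent with quoting the overall order in terms of that product.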

[Table 10 residue: run labels "Term freq. weights" and "B. Term freq. with IDF"]

Table 10 contains t-test and Wilcoxon signed rank test values, giving in each case the probability that the output results for the two test runs could have been generated from the same distribution of values. [...] 05 indicate that the answer to this question is negative and that the test results are significantly different. It may be seen in Table 10 that only for the Time collection is there a significant difference between binary and term frequency weighting, with the latter being substantially better than the former (B > A).
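To make the two tests named above concrete, here is a minimal standard-library sketch of a paired t statistic and an exact two-sided Wilcoxon signed rank test for two runs A and B evaluated on the same queries. The function names and the per-query scores are hypothetical and are not taken from Table 10; the exact p-value is obtained by enumerating all sign assignments, which is only practical for small samples.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """t statistic for the paired differences b[i] - a[i]."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def wilcoxon_signed_rank(a, b):
    """Return (W_plus, exact two-sided p-value) for paired samples.

    Zero differences are dropped and tied absolute differences receive
    average ranks. The p-value enumerates all 2**m sign assignments.
    """
    diffs = [y - x for x, y in zip(a, b) if y != x]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    expected = sum(ranks) / 2  # mean of W_plus under the null hypothesis
    observed = abs(w_plus - expected)

    extreme = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - expected) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Hypothetical per-query scores (in percent) for runs A and B.
run_a = [20, 35, 50, 10, 40, 25, 30, 45]
run_b = [25, 45, 52, 18, 55, 26, 50, 49]
print(paired_t_statistic(run_a, run_b))
print(wilcoxon_signed_rank(run_a, run_b))
```

A small p-value from the signed rank test suggests that the two runs are unlikely to come from the same distribution of values. For larger samples, library routines such as `scipy.stats.ttest_rel` and `scipy.stats.wilcoxon` would be the more usual choice.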

[Table residue: A. Standard run vs. B. SPT phrases from discriminators; A. Standard run vs. B. Combined PT + SPT phrases; A. ft · IDF weights vs. B. [...] 01 (A > B)]

[...] a thesaurus class, the class will exhibit a much higher document frequency, and most likely a better discrimination value, than any of the original terms. There exist well-known procedures for constructing thesauruses either manually or automatically. In the latter case, automatic term classification methods may be used to generate the appropriate term groups.
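The effect on document frequency can be illustrated with a toy collection. This is a sketch under stated assumptions: documents are modeled as sets of terms, a thesaurus class is assumed to match a document when any of its member terms occurs in it, and the collection, the class, and the function name `document_frequency` are all invented for illustration.

```python
def document_frequency(docs, terms):
    """Number of documents containing at least one of the given terms."""
    wanted = set(terms)
    return sum(1 for doc in docs if wanted & set(doc))


# Hypothetical collection: each document is a set of index terms.
docs = [
    {"car", "engine"},
    {"automobile", "dealer"},
    {"vehicle", "road"},
    {"tree", "forest"},
]

# Hypothetical thesaurus class grouping three low-frequency terms.
vehicle_class = {"car", "automobile", "vehicle"}

per_term = {term: document_frequency(docs, [term]) for term in vehicle_class}
class_df = document_frequency(docs, vehicle_class)
print(per_term, class_df)
```

Under this matching rule the class frequency can never fall below the frequency of any member term, and it rises whenever the members occur in different documents.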