By Gerard Salton
Presents a theory of indexing capable of ranking index terms, or subject identifiers, in decreasing order of importance. This leads to the choice of good document representations, and also accounts for the role of phrases and of thesaurus classes in the indexing process.
This study is typical of theoretical work in automatic information organization and retrieval, in that concepts are used from mathematics, computer science, and linguistics. A complete theory of information retrieval may emerge from an appropriate combination of these three disciplines.
Similar probability books
This research monograph is the authoritative and comprehensive treatment of the mathematical foundations of stochastic optimal control of discrete-time systems, including the treatment of the intricate measure-theoretic issues.
- Probability and Analysis: Lectures given at the 1st 1985 Session of the Centro Internazionale Matematico Estivo (C.I.M.E.) held at Varenna (Como), Italy May 31 – June 8, 1985
- Statistical Tolerance Regions: Theory, Applications, and Computation (Wiley Series in Probability and Statistics)
- Probability: An Introduction (Oxford Science Publications)
- Séminaire de Probabilités XXX
Extra info for A Theory of Indexing
... additions or subtractions, square roots. The final operational complexity for t computations of Qk - Q is then (2Kn + 4n + 2)t + 2Kn + 2n multiplications or divisions, (2Kn + n + 3)t + 2Kn + n additions or subtractions, and (n + 1)t square roots. A summarization of the complexity of the significance computations is given in Table 6. Since the discrimination value measure is dependent on the collection size, the calculations become automatically much more demanding than those required for the other measures.

Table 6. Computational complexity of significance computations (overall order given in multiplications)
- F or B: K't additions; overall order: none
- EK: (2K' + 1)t additions, (K' + 2)t multiplications; overall order o(K't)
- S/N: (2K' + 1)t additions, 3K't multiplications, 2K't logarithms; overall order o(3K't)
- DV: (2Kn + 4n + 2)t + 2Kn + 2n multiplications, (2Kn + n + 3)t + 2Kn + n additions, (n + 1)t square roots; overall order o(2Knt)
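The operation counts quoted for the discrimination value (DV) computation can be evaluated directly to see how they grow. The following is a minimal sketch: the excerpt does not define K, n, and t, so they are treated here simply as positive integers, and the function name and sample sizes are made up for illustration.

```python
def dv_operation_counts(K, n, t):
    """Evaluate the operation counts quoted for t computations of Qk - Q.

    K, n and t are plain positive integers here; their meaning is not
    defined in the excerpt, so no interpretation is assumed.
    """
    multiplications = (2 * K * n + 4 * n + 2) * t + 2 * K * n + 2 * n
    additions = (2 * K * n + n + 3) * t + 2 * K * n + n
    square_roots = (n + 1) * t
    return {
        "multiplications": multiplications,
        "additions": additions,
        "square_roots": square_roots,
    }


# Made-up sizes, only to show how the counts scale.
K, n, t = 10, 100, 5
counts = dv_operation_counts(K, n, t)
leading_term = 2 * K * n * t  # the 2Knt term quoted as the overall order
print(counts)
print("2Knt leading term:", leading_term)
```

Because every term involving n is multiplied by K or t, doubling n roughly doubles all three counts, which is consistent with the remark that the DV calculations grow with collection size.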
[Table 10 fragment: the surviving row labels are "B. Term freq. weights" and "B. Term freq. with IDF"; the test values are not recoverable.]

Table 10 contains t-test and Wilcoxon signed rank test values, giving in each case the probability that the output results for the two test runs could have been generated from the same distribution of values. ... 0.05 indicate that the answer to this question is negative and that the test results are significantly different. It may be seen in Table 10 that only for the Time collection is there a significant difference between binary and term frequency weighting, with the latter being substantially better than the former (B > A).
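The two paired tests named above can be sketched with the Python standard library alone. This is an illustration under stated assumptions, not the procedure used to produce Table 10: the function names and the per-query scores are hypothetical, the t-test part returns only the statistic (turning it into a probability needs the t distribution, for example from a statistics library), and the Wilcoxon p-value is computed exactly by enumerating sign assignments, which is practical only for a small number of pairs.

```python
from itertools import product
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Paired t statistic for per-query scores of run A and run B."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def wilcoxon_signed_rank(a, b):
    """Return (W_plus, exact two-sided p-value) for paired scores.

    Zero differences are dropped; tied absolute differences share the
    average of the ranks they span.
    """
    diffs = [y - x for x, y in zip(a, b) if y != x]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        while (j + 1 < len(order)
               and abs(diffs[order[j + 1]]) == abs(diffs[order[i]])):
            j += 1
        average_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    centre = sum(ranks) / 2
    observed = abs(w_plus - centre)

    # Under the null hypothesis every sign pattern is equally likely.
    hits = 0
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - centre) >= observed - 1e-12:
            hits += 1
    return w_plus, hits / 2 ** len(ranks)


# Hypothetical per-query scores (in percent) for two runs.
run_a = [31, 42, 18, 55, 27, 60, 36, 49]
run_b = [35, 41, 25, 61, 30, 70, 38, 57]
print("t statistic:", paired_t_statistic(run_a, run_b))
print("W+, p-value:", wilcoxon_signed_rank(run_a, run_b))
```

A p-value below the chosen threshold (0.05 in the passage above) would be read as evidence that the two runs differ; the sign of the mean difference then indicates which run is better.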
[Table fragment: the surviving row labels compare "A. Standard run vs. B. SPT phrases from discriminators", "A. Standard run vs. B. Combined PT + SPT phrases", and "A. ... IDF weights vs. B. ..."; the test values are not recoverable.]

... a thesaurus class, the class will exhibit a much higher document frequency, and most likely a better discrimination value, than any of the original terms. There exist well-known procedures for constructing thesauruses either manually or automatically. In the latter case, automatic term classification methods may be used to generate the appropriate term groups.
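The effect on document frequency can be illustrated with a small sketch. It assumes that a thesaurus class counts as present in a document whenever any of its member terms occurs there, so the class's document set is the union of the members' document sets; the postings, the terms, and the function name are made up for illustration.

```python
def document_frequency(postings, terms):
    """Number of documents containing at least one of the given terms."""
    docs = set()
    for term in terms:
        docs |= postings.get(term, set())
    return len(docs)


# Hypothetical postings: term -> ids of the documents containing it.
postings = {
    "car": {1, 4},
    "automobile": {2, 4, 7},
    "vehicle": {3},
}
thesaurus_class = ["car", "automobile", "vehicle"]

for term in thesaurus_class:
    print(term, document_frequency(postings, [term]))
print("class", document_frequency(postings, thesaurus_class))
```

Under this assumption the class frequency can never fall below that of its most frequent member, and it rises further the less the members co-occur in the same documents.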