KenLM for Developers


Download the source.

Integrating

  1. Copy kenlm into your source tree. Distributing it with your decoder is encouraged; see the LICENSE file.
  2. Omit the lm/filter, lm/builder, and util/stream directories if you only want query support. Omit python if you don't use Python.
  3. If using your own build system (recommended), delete windows and reimplement compile_query_only.sh (for queries) or the CMakeLists.txt files (for everything).
  4. Choose whether to enable Boost, ICU, zlib, bzip2, and lzma support. See README.md in the source.
  5. Code against the interface in the next section.
  6. If your system does not generate hypotheses left-to-right, see lm/left.hh for a higher-level interface with left state minimization.
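For decoders that build hypotheses bottom-up (e.g. chart parsing), the lm/left.hh interface scores rules whose left context is not yet known. The following is a minimal sketch of how ChartState and RuleScore fit together; the vocabulary words are placeholders, and the model file name is assumed.

```cpp
// Sketch of the lm/left.hh chart-decoding interface.
#include "lm/left.hh"
#include "lm/model.hh"

#include <iostream>

int main() {
  using namespace lm::ngram;
  Model model("file.arpa");
  const Vocabulary &vocab = model.GetVocabulary();

  // First score an antecedent, recording its state for later reuse.
  ChartState antecedent;
  {
    RuleScore<Model> scorer(model, antecedent);
    scorer.Terminal(vocab.Index("black"));
    scorer.Finish();
  }

  // Now score a rule "the <X> cat" that consumes the antecedent.
  ChartState result;
  RuleScore<Model> scorer(model, result);
  scorer.Terminal(vocab.Index("the"));
  scorer.NonTerminal(antecedent, 0.0 /* antecedent's score, if not tracked elsewhere */);
  scorer.Terminal(vocab.Index("cat"));
  std::cout << scorer.Finish() << '\n';
}
```

Finish() returns the score adjustment for the rule and fills in the out-state (here, result) for use as a non-terminal in larger rules.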

Interface Example

The interface is designed for efficient use inside a decoder:
#include "lm/model.hh"

#include <iostream>
#include <string>

int main() {
  using namespace lm::ngram;
  Model model("file.arpa");
  // Start from the begin-of-sentence context.
  State state(model.BeginSentenceState()), out_state;
  const Vocabulary &vocab = model.GetVocabulary();
  std::string word;
  while (std::cin >> word) {
    // Score returns log10 probability of word given the context in state.
    std::cout << model.Score(state, vocab.Index(word), out_state) << '\n';
    // Carry the state forward so the next query conditions on this word.
    state = out_state;
  }
}
Keeping state is recommended for speed, but not required.
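If no conditioning history is available or wanted, a query can start from the null context instead of a carried state. A minimal sketch, assuming the same model file as above:

```cpp
// Sketch: querying without carrying state between calls.
#include "lm/model.hh"

#include <iostream>

int main() {
  using namespace lm::ngram;
  Model model("file.arpa");
  State ignored;
  // NullContextState conditions on nothing, so this is effectively a
  // unigram query; longer n-grams cannot match without history.
  std::cout << model.Score(model.NullContextState(),
                           model.GetVocabulary().Index("language"),
                           ignored) << '\n';
}
```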

More Documentation

Public APIs appear in lm/virtual_interface.hh and lm/model.hh. A paragraph documents each call.
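When the concrete model type (probing, trie, etc.) is only known at runtime, the virtual interface in lm/virtual_interface.hh can be used instead of a templated Model. A hedged sketch, assuming the file name; states are opaque byte buffers of StateSize() bytes:

```cpp
// Sketch of the runtime-polymorphic interface in lm/virtual_interface.hh.
#include "lm/model.hh"
#include "lm/virtual_interface.hh"

#include <iostream>
#include <vector>

int main() {
  // LoadVirtual inspects the file and returns the appropriate concrete model.
  lm::base::Model *model = lm::ngram::LoadVirtual("file.arpa");
  // States are opaque here: allocate StateSize() bytes per state.
  std::vector<char> state(model->StateSize()), out_state(model->StateSize());
  model->BeginSentenceWrite(&state[0]);
  lm::WordIndex word = model->BaseVocabulary().Index("language");
  std::cout << model->BaseScore(&state[0], word, &out_state[0]) << '\n';
  delete model;
}
```

The virtual calls cost an indirect branch per query, so the templated interface in lm/model.hh is preferable when the model type is fixed at compile time.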