Installing gensim on Windows for topic modelling

If you have ever walked through the machine learning landscape in Python, you have definitely encountered many different libraries. One of the most useful is Radim Řehůřek's open source library named gensim: topic modelling for humans. The subtitle of this library is a little bit inaccurate. It should be rewritten as “topic modelling for a certain kind of humans, those with the strange deep look”.

This topic modelling library builds on other libraries that provide efficient data structures. Its dependencies are:

  • Python (3+) – scripting programming language
  • NumPy – package for scientific computing, especially multidimensional arrays
  • SciPy – set of computational libraries
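To see which of these dependencies are already present, a small sketch like the following can help. The helper name check_dependencies is made up for this illustration, and it assumes each package exposes a __version__ attribute, as NumPy, SciPy and gensim do.

```python
import importlib


def check_dependencies(names):
    """Return a dict mapping each package name to its version string,
    or None when the package cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is missing (or one of its own imports is).
            versions[name] = None
            continue
        # Fall back to "unknown" for packages without a __version__ attribute.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    found = check_dependencies(["numpy", "scipy", "gensim"])
    for package, version in found.items():
        print(package, version if version is not None else "not installed")
```

A package reported as "not installed" here is one that still has to be installed before gensim will work.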

A detailed tutorial on installing and getting started with gensim is available on the gensim web pages.

Dealing with “no lapack/blas resources found” error

We encountered the “no lapack/blas resources found” error during installation on the Windows operating system. We are using Windows Server 2016.

File "scipy\linalg\setup.py", line 20, in configuration
raise NotFoundError('no lapack/blas resources found')
numpy.distutils.system_info.NotFoundError: no lapack/blas resources found

This error can be solved by downloading compiled binaries from http://www.lfd.uci.edu/~gohlke/pythonlibs and installing them in a specific order.

  1. Install NumPy+MKL – download the NumPy+MKL wheel from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy and install it.
    e.g. pip install "numpy-1.12.1+mkl-cp35-cp35m-win_amd64.whl"
  2. Install SciPy – download the SciPy wheel from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy and install it.
    e.g. pip install "scipy-0.19.0-cp35-cp35m-win_amd64.whl"
  3. Install gensim – now you can finally install it from the default package index.
    e.g. pip install gensim
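The wheel you download has to match your interpreter: in the file names above, cp35 stands for CPython 3.5 and win_amd64 for 64-bit Windows. The sketch below splits a wheel file name into its parts so you can compare them with the running Python. The function names are hypothetical, and current_python_tag assumes a CPython interpreter.

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel file name into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file: " + filename)
    stem = filename[:-len(".whl")]
    # The last three dash-separated fields are the Python, ABI and platform tags.
    head, python_tag, abi_tag, platform_tag = stem.rsplit("-", 3)
    # What remains is name-version, optionally followed by a build tag.
    parts = head.split("-")
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def current_python_tag():
    """Tag of the running interpreter, e.g. 'cp35' for CPython 3.5."""
    return "cp{}{}".format(sys.version_info[0], sys.version_info[1])


if __name__ == "__main__":
    tags = parse_wheel_filename("numpy-1.12.1+mkl-cp35-cp35m-win_amd64.whl")
    print(tags)
    print("matches this interpreter:", tags["python"] == current_python_tag())
```

If the Python tag does not match, pip will refuse the wheel as unsupported on this platform, so pick the file built for your Python version.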

If you get a “not sufficient rights” error, run the installation through the Python interpreter instead of calling pip directly:

python -m pip install "filename"
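The same call can be made from a Python script, which is handy when several wheels have to be installed in a fixed order. This is only a sketch: pip_install_command and pip_install are made-up helper names, and sys.executable is used so that pip runs under the interpreter that executes the script.

```python
import subprocess
import sys


def pip_install_command(target):
    """Build the 'python -m pip install' command for the running interpreter."""
    return [sys.executable, "-m", "pip", "install", target]


def pip_install(target):
    """Run the installation; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_install_command(target))


if __name__ == "__main__":
    # Only print the command here; call pip_install(...) to actually run it.
    print(" ".join(pip_install_command("gensim")))
```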

You must have Python already installed (in this case version 3.5). We recommend installing the 64-bit version, because memory address space and computing power are important. There are also other topic modelling libraries, such as scikit-learn. In fact, you can use any package implementing the latent Dirichlet allocation (LDA) algorithm, such as lda. This approach to topic modelling is widely popular and still quite effective. Gensim also implements other approaches to this task.
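If you are not sure whether your Python is a 32-bit or a 64-bit build, the pointer size reported by the standard struct module tells you. A minimal sketch, with interpreter_bits being a name chosen for this example:

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer size of the running interpreter in bits (32 or 64)."""
    # "P" is the struct format code for a C pointer (void *).
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python {}.{} ({}-bit)".format(
        sys.version_info[0], sys.version_info[1], interpreter_bits()))
```

A 64-bit result means the win_amd64 wheels are the ones to download.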
