If you have ever walked through the machine learning landscape in Python, you have certainly encountered many different libraries. One of the most useful is Radim Řehůřek’s open source library named gensim: topic modelling for humans. The subtitle of this library is a little bit inaccurate. It should be rewritten as “topic modelling for a certain kind of human, the one with the strange deep look”.
This topic modelling library builds on several other libraries that provide efficient data structures. Its dependencies are:
- Python (3+) – scripting programming language
- NumPy – package for scientific computing, especially multidimensional arrays
- SciPy – set of computational libraries
A detailed tutorial on installing and getting started with gensim is available on the gensim website.
Dealing with “no lapack/blas resources found” error
We encountered the “no lapack/blas resources found” error during installation on the Windows operating system. We are using Windows Server 2016.
File "scipy\linalg\setup.py", line 20, in configuration
raise NotFoundError('no lapack/blas resources found')
NotFoundError: no lapack/blas resources found
This error can be solved by downloading compiled binaries from http://www.lfd.uci.edu/~gohlke/pythonlibs and installing them in a specific order.
- Install NumPy+MKL – download the NumPy+MKL wheel from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy and install it.
pip install "numpy-1.12.1+mkl-cp35-cp35m-win_amd64.whl"
- Install SciPy – download the SciPy wheel from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy and install it.
pip install "scipy-0.19.0-cp35-cp35m-win_amd64.whl"
- Install gensim – now you can finally install it from the default package index.
pip install gensim
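The wheel file names above encode the Python version (cp35 stands for CPython 3.5) and the platform (win_amd64 stands for 64-bit Windows), so a wheel only installs on a matching interpreter. The following is a minimal sketch for checking this before downloading. The helper names interpreter_tags and wheel_matches are hypothetical, and the check is a rough string comparison that assumes CPython on Windows, not the full tag-matching logic pip uses.

```python
import struct
import sys


def interpreter_tags():
    """Return (python_tag, bits) for the running interpreter."""
    # "cp35" in a wheel name means CPython 3.5; build the same kind of tag here.
    # Assumption: the interpreter is CPython.
    python_tag = "cp{}{}".format(sys.version_info.major, sys.version_info.minor)
    # Pointer size in bytes times 8 gives 32 or 64.
    bits = struct.calcsize("P") * 8
    return python_tag, bits


def wheel_matches(filename):
    """Rough check: does the wheel file name fit this interpreter on Windows?"""
    python_tag, bits = interpreter_tags()
    platform_tag = "win_amd64" if bits == 64 else "win32"
    return python_tag in filename and filename.endswith(platform_tag + ".whl")


if __name__ == "__main__":
    print(interpreter_tags())
    print(wheel_matches("numpy-1.12.1+mkl-cp35-cp35m-win_amd64.whl"))
```

If wheel_matches returns False for the file you downloaded, pick the wheel whose tags correspond to the values printed by interpreter_tags.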
If you get a “not sufficient rights” error, run the installation through the Python process instead of calling pip directly:
python -m pip install "filename"
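Once the three installation steps have finished, it is worth confirming that all packages can actually be imported. The sketch below is one possible way to do that. The function name installed_versions is hypothetical, and it relies on the packages exposing a __version__ attribute, which NumPy, SciPy and gensim do.

```python
import importlib


def installed_versions(names):
    """Map each package name to its version string, or None if it cannot be imported."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # The package is missing or one of its own imports failed.
            versions[name] = None
            continue
        # Fall back to "unknown" for modules without a __version__ attribute.
        versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(["numpy", "scipy", "gensim"]).items():
        print(name, version if version is not None else "NOT INSTALLED")
```

A package reported as NOT INSTALLED points to the step that needs to be repeated.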
You must have Python already installed (in this case version 3.5). We recommend installing the 64-bit versions, because memory address space and computing power are important for this kind of work. There are also other topic modelling libraries, such as scikit-learn. In fact, you can use any package implementing the latent Dirichlet allocation (LDA) algorithm, such as lda. This approach to topic modelling is widely popular and still quite effective. Gensim also implements other approaches to this task.