A collection of Python modules for data science and analytics.
1. Pandas
Pandas is a library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Pandas is free software released under the three-clause BSD license.
Website: http://pandas.pydata.org/
Installation:
Installing pandas and the rest of the NumPy and SciPy stack can be a little difficult for inexperienced users.
The easiest way to install pandas is to install it as part of the Anaconda distribution.
pandas can be installed via pip from PyPI.
pip install pandas
This will likely pull in a number of dependencies, including NumPy, may require a compiler to build some parts of the code, and can take a few minutes to complete.
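To give a feel for the "numerical tables and time series" mentioned above, here is a minimal sketch. The store names and sales figures are invented for illustration, and the code assumes a reasonably recent pandas release.

```python
import pandas as pd

# Hypothetical daily sales figures, made up for this example.
dates = pd.date_range("2024-01-01", periods=6, freq="D")
sales = pd.DataFrame(
    {
        "store": ["A", "B", "A", "B", "A", "B"],
        "units": [10, 7, 12, 9, 8, 11],
    },
    index=dates,
)

# Table-style operation: total units sold per store.
units_per_store = sales.groupby("store")["units"].sum()

# Time-series operation: totals over two-day windows, using the date index.
two_day_totals = sales["units"].resample("2D").sum()

print(units_per_store)
print(two_day_totals)
```

The same DataFrame serves both purposes: `groupby` treats it as a table, while `resample` relies on the datetime index.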
2. Statsmodels
Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.
Website: http://statsmodels.sourceforge.net/
Installation
You can obtain source distributions and Windows binaries from PyPI. Alternatively, you can use setuptools to install statsmodels:
easy_install statsmodels
or upgrade with:
easy_install -U statsmodels
Statsmodels can also be installed from source in the usual way with the command:
python setup.py install
3. scikit-learn
scikit-learn is an open source machine learning library for Python. It features various classification, regression and clustering algorithms including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
Website: http://scikit-learn.org/stable/
Installation
At this time scikit-learn does not provide official binary packages for Linux so you have to build from source.
Installing from source requires you to have installed the scikit-learn runtime dependencies, Python development headers and a working C/C++ compiler. Under Debian-based operating systems, which include Ubuntu, if you have Python 2 you can install all these requirements by issuing:
sudo apt-get install build-essential python-dev python-setuptools \
    python-numpy python-scipy \
    libatlas-dev libatlas3gf-base
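Once scikit-learn is installed, a typical classification workflow might look like the sketch below. It assumes a release that includes the `sklearn.model_selection` module (0.18 or later) and uses the bundled iris dataset. The split ratio and `max_iter` value are arbitrary choices for the example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Load a small bundled dataset as NumPy arrays.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Fit a logistic regression classifier, one of the algorithms listed above.
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)

# Mean accuracy on the held-out samples.
accuracy = clf.score(X_test, y_test)
print(accuracy)
```

Most scikit-learn estimators share this `fit` / `predict` / `score` interface, so swapping in another algorithm usually means changing only the line that constructs `clf`.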
4. Mlpy
Mlpy is a Python machine learning library built on top of NumPy/SciPy and the GNU Scientific Library. It provides a wide range of machine learning methods for supervised and unsupervised problems. mlpy is multi-platform and works with Python 2 and 3.
Website: http://mlpy.sourceforge.net/
Installation
Download the latest version for your OS from http://sourceforge.net/projects/mlpy/files/
You need GCC, Python, NumPy, SciPy and GSL preinstalled.
Then, from the terminal, run:
python setup.py install
5. NumPy
NumPy is an open source extension module for Python. The module NumPy provides fast precompiled functions for numerical routines.
It adds support to Python for large, multi-dimensional arrays and matrices. It also supplies a large library of high-level mathematical functions to operate on these arrays.
Website: http://www.numpy.org/
Installation
Most of the major Linux distributions provide packages for NumPy, but these can lag behind the most recent NumPy release. Pre-built binary packages for Ubuntu are available on the SciPy PPA. Red Hat binaries are available in Enthought Canopy.
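The following short sketch illustrates the multi-dimensional arrays and vectorized functions described above. The array values are arbitrary, and the `@` operator assumes Python 3.5 or later.

```python
import numpy as np

# A 2x3 array: [[0, 1, 2], [3, 4, 5]]
a = np.arange(6).reshape(2, 3)

# Column means, computed in compiled code rather than a Python loop.
col_means = a.mean(axis=0)

# Broadcasting: the 1-D means are subtracted from every row.
centered = a - col_means

# Matrix product of the array with its transpose (2x2 result).
product = a @ a.T

print(col_means)
print(centered)
print(product)
```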
6. SciPy
SciPy is widely used in scientific and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.
Website: http://www.scipy.org/
Installation
Users on Linux can quickly install the necessary packages from repositories.
For example, Ubuntu users can install the dependencies by running:
sudo apt-get install python-numpy python-scipy python-matplotlib ipython ipython-notebook python-pandas python-sympy python-nose
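Two of the module areas listed above, integration and optimization, can be tried out with a sketch like the one below. The integrand and the objective function are toy examples picked because their answers are known in advance.

```python
import numpy as np
from scipy import integrate, optimize

# Numerically integrate sin(x) over [0, pi]; the exact value is 2.
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# Find the minimum of a one-dimensional function; (x - 3)^2 is smallest at x = 3.
result = optimize.minimize_scalar(lambda x: (x - 3.0) ** 2)

print(area, abs_err)
print(result.x)
```

`quad` returns both the estimate and an error bound, and `minimize_scalar` returns a result object whose `x` attribute holds the located minimum.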
7. matplotlib
matplotlib is a plotting library for Python and its numerical extension NumPy.
Website: http://matplotlib.org/
Installation
sudo apt-get install python-matplotlib
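A minimal plotting sketch follows. It selects the non-interactive `Agg` backend so that it can run without a display, and the output filename `trig.png` is a hypothetical choice for this example.

```python
import matplotlib

matplotlib.use("Agg")  # render to files only; no GUI window required

import matplotlib.pyplot as plt
import numpy as np

# Sample two functions on a NumPy grid.
x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.legend()

# Write the figure to disk (hypothetical filename).
fig.savefig("trig.png")
```

In an interactive session you would typically drop the `matplotlib.use("Agg")` line and call `plt.show()` instead of saving.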
8. NLTK
The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for statistical natural language processing (NLP) in Python. NLTK includes graphical demonstrations and sample data. It has been used successfully as a platform for prototyping and building research systems.
Website: http://www.nltk.org/
Installation
On Ubuntu:
sudo pip install -U numpy
sudo pip install -U nltk
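After installation, a first experiment could look like the sketch below. It deliberately sticks to parts of NLTK that need no extra corpus downloads (a regex-based tokenizer, a frequency distribution and the Porter stemmer). The sample sentence is made up.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

text = "The cats are chasing the other cats."

# Split into word and punctuation tokens, then keep lowercase words only.
tokens = [t.lower() for t in wordpunct_tokenize(text) if t.isalpha()]

# Count how often each word occurs.
freq = FreqDist(tokens)

# Reduce each word to its stem, e.g. "cats" becomes "cat".
stemmer = PorterStemmer()
stems = [stemmer.stem(t) for t in tokens]

print(tokens)
print(freq.most_common(3))
print(stems)
```

Many other NLTK components, such as the sentence tokenizer and the bundled corpora, require a separate data download through `nltk.download()`.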
9. Theano
Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
Website: http://deeplearning.net/software/theano/
10. nolearn
This package contains a number of utility modules that are helpful with machine learning tasks. Most of the modules work together with scikit-learn, others are more generally useful.
Website: https://pythonhosted.org/nolearn/
11. PyBrain
PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for machine learning tasks and a variety of predefined environments to test and compare your algorithms.
12. Orange
Orange is a component-based data mining and machine learning software suite, featuring a visual programming front-end for explorative data analysis and visualization, and Python bindings and libraries for scripting. It includes a set of components for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration techniques. It is implemented in C++ and Python. Its graphical user interface builds upon the cross-platform Qt framework.
Unlike its competitors scikit-learn and mlpy, Orange does not tie into NumPy and its ecosystem of tools; it focuses on traditional, symbolic algorithms, more than numeric ones.
13. Keras
Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python, that uses Theano under the hood for fast tensor manipulation on GPU and CPU. It was developed with a focus on enabling fast experimentation.
14. Hebel
Hebel is a library for deep learning with neural networks in Python using GPU acceleration with CUDA through PyCUDA. It implements the most important types of neural network models and offers a variety of different activation functions and training methods such as momentum, Nesterov momentum, dropout, and early stopping.