Python Modules for Data Science

A collection of Python modules for data science and analytics.

1. Pandas

Pandas is a library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series. Pandas is free software released under the three-clause BSD license.
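As a rough illustration of the table and time-series operations mentioned above, here is a minimal sketch. The sales figures, the column names (store, units) and the dates are invented for the example and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical daily sales figures, invented purely for illustration.
dates = pd.date_range("2021-01-01", periods=6, freq="D")
sales = pd.DataFrame(
    {
        "store": ["A", "B", "A", "B", "A", "B"],
        "units": [10, 7, 12, 9, 11, 8],
    },
    index=dates,
)

# Table manipulation: total units sold per store.
units_per_store = sales.groupby("store")["units"].sum()

# Time-series manipulation: aggregate the daily rows into 2-day totals.
two_day_totals = sales["units"].resample("2D").sum()

print(units_per_store)
print(two_day_totals)
```

The same DataFrame serves both purposes here because its index is a DatetimeIndex, which is what makes resample available.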

Website: http://pandas.pydata.org/

Installation

Installing pandas and the rest of the NumPy and SciPy stack can be a little difficult for inexperienced users.

The easiest way to install pandas is to install it as part of the Anaconda distribution.

pandas can also be installed from PyPI with pip by running pip install pandas.

This will likely require the installation of a number of dependencies, including NumPy, will require a compiler to compile required bits of code, and can take a few minutes to complete.

2. Statsmodels

Statsmodels is a Python module that allows users to explore data, estimate statistical models, and perform statistical tests. An extensive list of descriptive statistics, statistical tests, plotting functions, and result statistics is available for different types of data and each estimator.

Website: http://statsmodels.sourceforge.net/

Installation

You can obtain source distributions and Windows binaries from PyPI. Alternatively, you can use setuptools to install or upgrade statsmodels.

Statsmodels can also be installed from source the usual way.

3. scikit-learn

scikit-learn is an open source machine learning library for Python. It features various classification, regression and clustering algorithms including support vector machines, logistic regression, naive Bayes, random forests, gradient boosting, k-means and DBSCAN, and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
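A short sketch of two of the algorithm families listed above, logistic regression for classification and k-means for clustering, might look like this. It uses the iris dataset that ships with scikit-learn; the split ratio, random seeds and iteration limit are arbitrary choices made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# The iris dataset is bundled with scikit-learn and returned as NumPy arrays.
X, y = load_iris(return_X_y=True)

# Classification: hold out a quarter of the samples for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)
clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
print("test accuracy:", accuracy)

# Clustering: group the same samples into three clusters, ignoring the labels.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
cluster_labels = kmeans.fit_predict(X)
print("cluster sizes:", [int((cluster_labels == k).sum()) for k in range(3)])
```

Both estimators follow the same fit / predict pattern, which is what makes it easy to swap one algorithm for another.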

Website: http://scikit-learn.org/stable/

Installation

At this time scikit-learn does not provide official binary packages for Linux so you have to build from source.

Installing from source requires you to have installed the scikit-learn runtime dependencies, Python development headers and a working C/C++ compiler. Under Debian-based operating systems, which include Ubuntu, if you have Python 2, you can install all these requirements with apt-get.

4. Mlpy

Mlpy is a Python machine learning library built on top of NumPy/SciPy and the GNU Scientific Library. It provides a wide range of machine learning methods for supervised and unsupervised problems. mlpy is multi-platform and works with Python 2 and 3.

Website: http://mlpy.sourceforge.net/

Installation

Download the latest version for your OS from http://sourceforge.net/projects/mlpy/files/

You need GCC, Python, NumPy, SciPy and GSL preinstalled.

Then run the installation from a terminal.

5. NumPy

NumPy is an open source extension module for Python. The module NumPy provides fast precompiled functions for numerical routines.

It adds support to Python for large, multi-dimensional arrays and matrices. Besides that, it supplies a large library of high-level mathematical functions to operate on these arrays.
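A minimal sketch of what working with multi-dimensional arrays looks like in practice is shown below. The array contents are arbitrary numbers chosen to keep the arithmetic easy to follow.

```python
import numpy as np

# A 3x4 array holding the numbers 0..11 as floats.
a = np.arange(12, dtype=float).reshape(3, 4)

# Reduce along the rows to get one mean per column.
col_means = a.mean(axis=0)

# Broadcasting: the 1-D vector of means is subtracted from every row.
centered = a - col_means

# Matrix product of the centered data with itself (a 4x4 result).
gram = centered.T @ centered

# Element-wise mathematical functions operate on whole arrays at once.
roots = np.sqrt(a)

print(col_means)
print(gram.shape)
```

None of these operations needs an explicit Python loop; the looping happens inside NumPy's precompiled routines.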

Website: http://www.numpy.org/

Installation

Most of the major Linux distributions provide packages for NumPy, but these can lag behind the most recent NumPy release. Pre-built binary packages for Ubuntu are available from the SciPy PPA. Red Hat binaries are available in Enthought Canopy.

6. SciPy

SciPy is widely used in scientific and technical computing. SciPy contains modules for optimization, linear algebra, integration, interpolation, special functions, FFT, signal and image processing, ODE solvers and other tasks common in science and engineering.
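Three of the areas listed above (optimization, integration and linear algebra) can be sketched in a few lines. The functions and the small linear system are toy problems invented for the example, picked because their exact answers are known.

```python
import numpy as np
from scipy import integrate, linalg, optimize

# Optimization: find the minimum of (x - 2)^2, which lies at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)
print("minimum at:", result.x)

# Integration: the integral of sin(x) from 0 to pi is exactly 2.
area, abs_error = integrate.quad(np.sin, 0.0, np.pi)
print("integral:", area)

# Linear algebra: solve the system A @ solution = b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
solution = linalg.solve(A, b)
print("solution:", solution)
```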

Website: http://www.scipy.org/

Installation

Users on Linux can quickly install the necessary packages from repositories.

For example, Ubuntu users can install the dependencies with apt-get.

7. matplotlib

matplotlib is a plotting library for Python and its numerical extension NumPy.
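A small sketch of plotting NumPy arrays follows. The output file name (sine_cosine.png) and its location in the system temp directory are hypothetical choices for the example, and the Agg backend is selected only so the script also runs on a machine without a display.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened

import matplotlib.pyplot as plt
import numpy as np

# Sample two functions on the same grid of x values.
x = np.linspace(0, 2 * np.pi, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.set_xlabel("x")
ax.set_ylabel("value")
ax.set_title("Sine and cosine")
ax.legend()

# Hypothetical output path, chosen only for this example.
output_path = os.path.join(tempfile.gettempdir(), "sine_cosine.png")
fig.savefig(output_path)
print("saved to", output_path)
```

In an interactive session you would typically drop the backend line and call plt.show() instead of saving to a file.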

Website: http://matplotlib.org/

Installation

8. NLTK

The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for statistical natural language processing (NLP) in Python. NLTK includes graphical demonstrations and sample data. It has been used successfully as a platform for prototyping and building research systems.
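For a feel of the toolkit, here is a minimal sketch that tokenizes a sentence, counts word frequencies, stems the tokens and builds bigrams. The sentence is invented for the example, and the components were chosen because they work without downloading any of NLTK's optional data packages.

```python
from nltk import FreqDist
from nltk.stem import PorterStemmer
from nltk.tokenize import RegexpTokenizer
from nltk.util import ngrams

# Example sentence, invented for illustration.
text = "The cats are chasing the cat that chased the other cats."

# Split on runs of word characters and lowercase the result.
tokenizer = RegexpTokenizer(r"\w+")
tokens = [token.lower() for token in tokenizer.tokenize(text)]

# Count how often each token occurs.
freq = FreqDist(tokens)
print(freq.most_common(3))

# Reduce each token to its stem with the Porter algorithm.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]
print(stems)

# Pairs of adjacent tokens, a common building block for statistical models.
bigrams = list(ngrams(tokens, 2))
print(bigrams[:3])
```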

Website: http://www.nltk.org/

Installation

On Ubuntu, NLTK can be installed with pip.

9. Theano

Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

Website: http://deeplearning.net/software/theano/

10. nolearn

This package contains a number of utility modules that are helpful with machine learning tasks. Most of the modules work together with scikit-learn, others are more generally useful.

Website: https://pythonhosted.org/nolearn/

11. PyBrain

PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for Machine Learning Tasks and a variety of predefined environments to test and compare your algorithms.

Website: http://pybrain.org/

12. Orange

Orange is a component-based data mining and machine learning software suite, featuring a visual programming front-end for explorative data analysis and visualization, and Python bindings and libraries for scripting. It includes a set of components for data preprocessing, feature scoring and filtering, modeling, model evaluation, and exploration techniques. It is implemented in C++ and Python. Its graphical user interface builds upon the cross-platform Qt framework.

Unlike its competitors scikit-learn and mlpy, Orange does not tie into NumPy and its ecosystem of tools; it focuses on traditional, symbolic algorithms, more than numeric ones.

Website: http://orange.biolab.si/

13. Keras

Keras is a minimalist, highly modular neural network library in the spirit of Torch, written in Python, that uses Theano under the hood for fast tensor manipulation on GPU and CPU. It was developed with a focus on enabling fast experimentation.

Website: http://keras.io/

14. Hebel

Hebel is a library for deep learning with neural networks in Python using GPU acceleration with CUDA through PyCUDA. It implements the most important types of neural network models and offers a variety of different activation functions and training methods such as momentum, Nesterov momentum, dropout, and early stopping.

Website: https://github.com/hannes-brt/hebel

Copyright © 2016-2021 Data Scientist. All rights reserved.