How to evaluate models?

In today's digital age, insights derived from data science are essential for delivering the best customer experience.

Data scientists use various techniques such as regression, support vector machines (SVM), neural networks, nearest neighbors, naive Bayes, decision trees, and ensemble models.

These algorithms help to identify previously unrecognized patterns and trends hidden within vast amounts of structured and unstructured information. These patterns are used to create predictive models that try to forecast future behavior.

These models have many practical business applications: they help identify patients at risk, they help banks decide which customers to approve for loans, and marketers use them to determine which leads to target with campaigns.

But how do you determine whether the predictive models you create are accurate, meaningful representations that will prove valuable to your organization?

Data scientists use various methods to measure how well a model performs:

  • Lift Charts & Gain Charts: These are widely used in campaign targeting problems to determine which deciles of customers to target in a specific campaign. They also show how much response you can expect from the new target base.
  • ROC Curve: The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies.
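  • Gini coefficient: This is the ratio of the area between the ROC curve and the diagonal line to the area of the triangle above the diagonal. Since that triangle has area 0.5, Gini = 2 × AUC − 1, where AUC is the area under the ROC curve.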
  • Cross Validation: In its simplest (holdout) form, the data is split into two parts: one part is used to train the model, and the other is used to make predictions. This tests the model on data it has "not seen" before and shows how it is likely to behave on new, external data. k-fold cross-validation repeats the split k times so that every observation is used for testing exactly once.
  • Confusion Matrix: A table showing the number of predictions for each class compared with the number of instances that actually belong to each class. It gives a useful overview of the types of mistakes the algorithm makes. From its counts (true positives, false positives, true negatives, false negatives) you can derive the model's accuracy, sensitivity, and specificity.
  • Root Mean Squared Error (RMSE): The square root of the average squared error on the test set, expressed in the units of the output variable. It gives an idea of how far a given prediction may be off on average; because errors are squared before averaging, large errors weigh more heavily. It is most popular for regression models.
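
To make the lift and gain charts concrete, here is a minimal sketch in plain Python (standard library only). It assumes binary outcomes (1 = the customer responded) and a model score per customer; the function name `gain_and_lift` and the campaign data are hypothetical. Customers are ranked by score, cut into equal-sized bins (deciles by default), and for each cumulative cut it reports the gain (share of all responders reached) and the lift (response rate in that cut divided by the overall response rate).

```python
from typing import List, Sequence, Tuple


def gain_and_lift(
    y_true: Sequence[int], scores: Sequence[float], n_bins: int = 10
) -> List[Tuple[int, float, float]]:
    """Return (bin, cumulative gain, cumulative lift) for each bin (deciles by default).

    Customers are ranked by model score, highest first; y_true is 1 for responders.
    """
    if len(y_true) != len(scores) or len(y_true) < n_bins:
        raise ValueError("need equal-length inputs with at least n_bins customers")
    ranked = [y for _, y in sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)]
    total_responders = sum(ranked)
    if total_responders == 0:
        raise ValueError("no responders: gain and lift are undefined")
    n = len(ranked)
    overall_rate = total_responders / n  # response rate if we targeted at random
    rows = []
    for k in range(1, n_bins + 1):
        top = ranked[: n * k // n_bins]  # customers in the top k bins
        captured = sum(top)
        gain = captured / total_responders  # share of all responders reached so far
        lift = (captured / len(top)) / overall_rate  # how much better than random
        rows.append((k, gain, lift))
    return rows


# Hypothetical campaign: 20 customers, listed here in descending score order
scores = [20 - i for i in range(20)]
responded = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
for decile, gain, lift in gain_and_lift(responded, scores)[:3]:
    print(f"top {decile * 10}%: gain {gain:.0%}, lift {lift:.2f}")
```

For this toy data the top three deciles print gains of 33%, 50% and 67% with lifts of 3.33, 2.50 and 2.22: targeting the top 10% finds responders at about three times the random rate. By construction, the last decile always has gain 100% and lift 1.0.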
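
The ROC curve and Gini coefficient described above can be computed directly from model scores. This is a minimal sketch in plain Python, assuming binary labels with 1 as the positive class; the function names are hypothetical. It sweeps the threshold from the highest score down, records one (FPR, TPR) point per distinct score, integrates the curve with the trapezoidal rule, and converts AUC to Gini using the bullet's definition, (AUC − 0.5) / 0.5.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs as the threshold sweeps from high to low."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")
    pairs = sorted(zip(scores, y_true), key=lambda p: p[0], reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing is predicted positive
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with tied scores cross the threshold together, giving one point
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal rule over points sorted by x (FPR never decreases along the sweep)."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def gini_from_auc(auc: float) -> float:
    # Area between the curve and the diagonal is AUC - 0.5; the triangle above it is 0.5
    return (auc - 0.5) / 0.5


y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = area_under_curve(roc_points(y_true, scores))
print(round(auc, 4), round(gini_from_auc(auc), 4))  # 0.8889 0.7778
```

Here 8 of the 9 positive/negative pairs are ranked correctly, so AUC = 8/9 and Gini = 7/9. A random model follows the diagonal (AUC 0.5, Gini 0), and a perfect ranking gives AUC 1 and Gini 1.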
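
The k-fold variant of cross-validation mentioned above can be sketched as follows. This is an illustration under stated assumptions, not a prescribed method: the "model" is a one-feature least-squares line chosen only to keep the example self-contained, held-out error is measured with RMSE, and the names `k_fold_indices`, `fit_line` and `cross_validate`, along with the data, are hypothetical. Indices are shuffled once with a fixed seed and dealt into k folds, and each fold serves as the test set exactly once while the model is refit on the remaining folds.

```python
import math
import random
from typing import List, Sequence, Tuple


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle 0..n-1 once and deal into k folds; return (train, test) index lists per fold."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = a + b * x with a single feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def cross_validate(xs: Sequence[float], ys: Sequence[float], k: int = 5, seed: int = 0) -> List[float]:
    """Held-out RMSE for each of the k folds; the model never sees its test fold."""
    fold_scores = []
    for train, test in k_fold_indices(len(xs), k, seed):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        squared = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
        fold_scores.append(math.sqrt(sum(squared) / len(squared)))
    return fold_scores


# Hypothetical data: a straight line with small deterministic noise
xs = [float(x) for x in range(20)]
ys = [2.0 * x + 1.0 + ((7 * i) % 5 - 2) * 0.5 for i, x in enumerate(xs)]
fold_rmse = cross_validate(xs, ys, k=5)
print([round(s, 3) for s in fold_rmse], round(sum(fold_rmse) / len(fold_rmse), 3))
```

Reporting the per-fold scores alongside their mean shows both the expected error on unseen data and how much it varies with the particular split.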
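
The confusion matrix and the metrics derived from it can be computed with a few lines of plain Python. This is a minimal sketch; the helper names and the sample labels are hypothetical. It builds the matrix with actual classes as rows and predicted classes as columns, then, for the binary case with labels [0, 1] (1 = positive), reads off TN, FP, FN and TP to compute accuracy, sensitivity and specificity.

```python
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], labels: Sequence[int]) -> List[List[int]]:
    """Rows are actual classes and columns are predicted classes, both in `labels` order."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for actual, predicted in zip(y_true, y_pred):
        matrix[index[actual]][index[predicted]] += 1
    return matrix


def binary_metrics(matrix: List[List[int]]) -> Dict[str, float]:
    """Metrics from a 2x2 matrix built with labels=[0, 1], where 1 is the positive class."""
    (tn, fp), (fn, tp) = matrix
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),  # true positive rate
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),  # true negative rate
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 0]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)                  # [[3, 1], [2, 2]]
print(binary_metrics(cm))  # {'accuracy': 0.625, 'sensitivity': 0.5, 'specificity': 0.75}
```

The matrix shows what the single accuracy figure hides: this classifier misses half of the actual positives (two false negatives) but mislabels only one of four negatives.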
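
As a worked example of RMSE, the sketch below implements the formula RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²) in plain Python. The function name `rmse` and the house-price figures are hypothetical.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error, in the same units as the target variable."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical house prices in thousands; the errors are 10, -10, 0 and 40
actual = [200.0, 250.0, 300.0, 400.0]
predicted = [190.0, 260.0, 300.0, 360.0]
print(round(rmse(actual, predicted), 2))  # 21.21
```

The squared errors are 100, 100, 0 and 1600, so RMSE = sqrt(1800 / 4) = sqrt(450) ≈ 21.21 thousand. The plain average of the absolute errors would be only 15, which shows how the single large miss dominates the squared metric.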

In general, the evaluation metric should closely match the business objectives. Choosing the right metric can have more influence on your model's performance than the algorithm you use.

Source

Copyright © 2020 Data Scientist. All rights reserved.