
Scikit-learn Cookbook : over 50 recipes to incorporate scikit-learn into every step of the data science pipeline, from feature extraction to model building and model evaluation / Trent Hauck.

By: Hauck, Trent
Material type: Text
Publisher: Birmingham, U.K. : Packt Publishing, 2014
Description: 1 online resource (1 volume) : illustrations
Content type:
  • text
Media type:
  • computer
Carrier type:
  • online resource
ISBN:
  • 9781783989492
  • 1783989491
Subject(s):
Genre/Form:
Additional physical formats:
  • Print version: Scikit-learn cookbook : over 50 recipes to incorporate scikit-learn into every step of the data science pipeline, from feature extraction to model building and model evaluation.
DDC classification:
  • 641.5 23
LOC classification:
  • Q325.5
Online resources:
Contents:
Cover; Copyright; Credits; About the Author; About the Reviewers; www.PacktPub.com; Table of Contents; Preface
Chapter 1: Premodel Workflow -- Introduction; Getting sample data from external sources; Creating sample data for toy analysis; Scaling data to the standard normal; Creating binary features through thresholding; Working with categorical variables; Binarizing label features; Imputing missing values through various strategies; Using Pipelines for multiple preprocessing steps; Reducing dimensionality with PCA; Using factor analysis for decomposition; Kernel PCA for nonlinear dimensionality reduction; Using truncated SVD to reduce dimensionality; Decomposition to classify with DictionaryLearning; Putting it all together with Pipelines; Using Gaussian processes for regression; Defining the Gaussian process object directly; Using stochastic gradient descent for regression
Chapter 2: Working with Linear Models -- Introduction; Fitting a line through data; Evaluating the linear regression model; Using ridge regression to overcome linear regression's shortfalls; Optimizing the ridge regression parameter; Using sparsity to regularize models; Taking a more fundamental approach to regularization with LARS; Using linear methods for classification -- logistic regression; Directly applying Bayesian ridge regression; Using boosting to learn from errors
Chapter 3: Building Models with Distance Metrics -- Introduction; Using KMeans to cluster data; Optimizing the number of centroids; Assessing cluster correctness; Using MiniBatch KMeans to handle more data; Quantizing an image with KMeans clustering; Finding the closest objects in the feature space; Probabilistic clustering with Gaussian Mixture Models; Using KMeans for outlier detection; Using k-NN for regression
Chapter 4: Classifying Data with scikit-learn -- Introduction; Doing basic classifications with Decision Trees; Tuning a Decision Tree model; Using many Decision Trees -- random forests; Tuning a random forest model; Classifying data with Support Vector Machines; Generalizing with multiclass classification; Using LDA for classification; Working with QDA -- a nonlinear LDA; Using Stochastic Gradient Descent for classification; Classifying documents with Naïve Bayes; Label propagation with semi-supervised learning
Chapter 5: Post-model Workflow -- Introduction; K-fold cross validation; Automatic cross validation; Cross validation with ShuffleSplit; Stratified k-fold; Poor man's grid search; Brute force grid search; Using dummy estimators to compare results; Regression model evaluation; Feature selection; Feature selection on L1 norms; Persisting models with joblib; Index
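Several of the recipes listed above compose naturally; as a purely illustrative sketch (not code from the book), scaling data to the standard normal, reducing dimensionality with PCA, chaining steps with a Pipeline, and scoring with k-fold cross validation might be combined as follows. Note this uses the modern `sklearn.model_selection` module layout; the 2014 edition predates it and imported cross-validation helpers from the older `sklearn.cross_validation` module.

```python
# Illustrative combination of Chapter 1 and Chapter 5 recipe themes:
# scaling, PCA, Pipelines, and k-fold cross validation.
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Chaining preprocessing with the classifier means each cross-validation
# fold refits the scaler and PCA on its own training split only,
# avoiding leakage from the held-out fold.
pipe = Pipeline([
    ("scale", StandardScaler()),   # scale data to the standard normal
    ("pca", PCA(n_components=2)),  # reduce dimensionality with PCA
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=5)  # 5-fold cross validation
print(f"mean accuracy: {scores.mean():.2f}")
```

The same pipeline object can be passed to a grid search or persisted with joblib, mirroring the Chapter 5 recipes.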
Summary: If you're a data scientist already familiar with Python but not Scikit-Learn, or are familiar with other programming languages like R and want to take the plunge with the gold standard of Python machine learning libraries, then this is the book for you.
Holdings
  • Item type: Electronic-Books
  • Home library: Electronic-Books OPJGU Sonepat-Campus
  • Collection: E-Books EBSCO
  • Status: Available

"Quick answers to common problems."

Online resource; title from cover (Safari, viewed November 17, 2014).

Includes index.


eBooks on EBSCOhost EBSCO eBook Subscription Academic Collection - Worldwide


O.P. Jindal Global University, Sonepat-Narela Road, Sonepat, Haryana (India) - 131001

