Viviana Márquez
http://vivianamarquez.com
• Learn what Logistic Regression is.
• Learn when to use Logistic Regression.
• Build a machine learning model for a real-world application in Python.
• Linear regression is used to predict/forecast continuous values.
• Logistic regression is used for classification tasks (discrete values: yes/no, dead/alive, pass/fail, ham/spam).
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$

where $y$ is the dependent variable and $x_1, x_2, \ldots, x_n$ are the explanatory variables.
• In Linear Regression, the predicted value can be anywhere between $-\infty$ to $\infty$.
• For Logistic Regression, we need the values to be between 0 and 1.
• Applying the sigmoid function to the linear regression output, we obtain logistic regression:
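The sigmoid (logistic) function squashes any real number into $(0, 1)$; composing it with the linear model gives the logistic regression model:

```latex
% Sigmoid function
\sigma(z) = \frac{1}{1 + e^{-z}}

% Logistic regression: probability of the positive class
P(y = 1 \mid x) = \sigma(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)
               = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_n x_n)}}
```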
import pandas as pd
# Load data
data = pd.read_csv("Pokemon.csv")
# Clean data
filter_pokemon = ["Water", "Fire"]
data = data[data['Type 1'].isin(filter_pokemon)]
data = data.reset_index()
data = data.drop(['Type 2', 'Total', 'Generation', 'Legendary', '#', 'index'], axis=1)
# Preview data
data.head()
 | Name | Type 1 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed |
---|---|---|---|---|---|---|---|---|
0 | Charmander | Fire | 39 | 52 | 43 | 60 | 50 | 65 |
1 | Charmeleon | Fire | 58 | 64 | 58 | 80 | 65 | 80 |
2 | Charizard | Fire | 78 | 84 | 78 | 109 | 85 | 100 |
3 | CharizardMega Charizard X | Fire | 78 | 130 | 111 | 130 | 85 | 100 |
4 | CharizardMega Charizard Y | Fire | 78 | 104 | 78 | 159 | 115 | 100 |
X = data[data.columns[2:]]
y = data['Type 1']
X.head()
 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed |
---|---|---|---|---|---|---|
0 | 39 | 52 | 43 | 60 | 50 | 65 |
1 | 58 | 64 | 58 | 80 | 65 | 80 |
2 | 78 | 84 | 78 | 109 | 85 | 100 |
3 | 78 | 130 | 111 | 130 | 85 | 100 |
4 | 78 | 104 | 78 | 159 | 115 | 100 |
y.head()
0    Fire
1    Fire
2    Fire
3    Fire
4    Fire
Name: Type 1, dtype: object
# Split data
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
🛂 What is the shape of X_train, X_test, y_train, and y_test?
print(X.shape)
print(X_train.shape)
print(X_test.shape)
(164, 6)
(131, 6)
(33, 6)
print(y.shape)
print(y_train.shape)
print(y_test.shape)
(164,)
(131,)
(33,)
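`train_test_split` shuffles the rows randomly, so repeated runs select different Pokémon for the test set. Passing `random_state` makes the split reproducible, and `stratify` keeps the Water/Fire ratio the same in both splits. A minimal sketch on synthetic stand-in data (the array contents are illustrative, only the shapes match the dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the Pokemon features/labels: 164 rows, 6 columns
X_demo = np.random.rand(164, 6)
y_demo = np.array(["Water"] * 112 + ["Fire"] * 52)

# random_state fixes the shuffle; stratify preserves the class proportions
# in both the train and test sets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=42, stratify=y_demo)

print(X_tr.shape, X_te.shape)  # (131, 6) (33, 6)
```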
# Model
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()
logreg.fit(X_train,y_train)
/anaconda3/envs/ml/lib/python3.6/site-packages/sklearn/linear_model/logistic.py:432:
FutureWarning: Default solver will be changed to 'lbfgs' in 0.22.
Specify a solver to silence this warning.

LogisticRegression(C=1.0, class_weight=None, dual=False, fit_intercept=True,
                   intercept_scaling=1, max_iter=100, multi_class='warn',
                   n_jobs=None, penalty='l2', random_state=None, solver='warn',
                   tol=0.0001, verbose=0, warm_start=False)
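The `FutureWarning` above is scikit-learn announcing that the default solver changes to `'lbfgs'` in version 0.22; naming a solver explicitly silences it. A sketch on synthetic data (the data and parameter values are illustrative, not the Pokémon model):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic two-class data standing in for the Pokemon features
X_demo, y_demo = make_classification(n_samples=164, n_features=6,
                                     random_state=0)

# Specifying the solver (and a generous max_iter) avoids the warning
logreg = LogisticRegression(solver='lbfgs', max_iter=1000)
logreg.fit(X_demo, y_demo)
print(logreg.score(X_demo, y_demo))
```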
y_pred = logreg.predict(X_test)
from sklearn import metrics
metrics.accuracy_score(y_test, y_pred)
0.7878787878787878
Charmeleon
data[data['Name']=="Charmeleon"][data.columns[2:]]
 | HP | Attack | Defense | Sp. Atk | Sp. Def | Speed |
---|---|---|---|---|---|---|
1 | 58 | 64 | 58 | 80 | 65 | 80 |
# Predict Charmeleon
logreg.predict(data[data['Name']=="Charmeleon"][data.columns[2:]])
array(['Water'], dtype=object)

Note: Charmeleon is actually a Fire-type, so the model gets this one wrong.
Wartortle
# Predict Wartortle
logreg.predict(data[data['Name']=="Wartortle"][data.columns[2:]])
array(['Water'], dtype=object)
from sklearn import metrics
cnf_matrix = metrics.confusion_matrix(y_test, y_pred)
cnf_matrix
array([[ 3,  5],
       [ 2, 23]])
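In scikit-learn's confusion matrix the rows are the true classes and the columns the predicted classes, ordered alphabetically (Fire, Water). The diagonal holds the correct predictions, so the accuracy reported earlier can be recovered from it (the exact counts vary with the random split; these are the values printed above):

```python
import numpy as np

# Confusion matrix from above: rows = true (Fire, Water), cols = predicted
cm = np.array([[3, 5],
               [2, 23]])

# Accuracy = correct predictions / all predictions
accuracy = np.trace(cm) / cm.sum()
print(accuracy)  # 0.7878...

# Per-class recall: fraction of each true class predicted correctly
recall_fire = cm[0, 0] / cm[0].sum()    # 3 / 8  -> most Fire types missed
recall_water = cm[1, 1] / cm[1].sum()   # 23 / 25
print(recall_fire, recall_water)
```

The split shows the model is much better at recognizing Water types than Fire types.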
data['Type 1'].value_counts()
Water    112
Fire      52
Name: Type 1, dtype: int64
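The classes are imbalanced (112 Water vs. 52 Fire), so a trivial model that always predicts "Water" already scores about 68%; the ~79% test accuracy should be read against that baseline. A quick check using the counts above:

```python
# Majority-class baseline: always predict the most common class ("Water")
water, fire = 112, 52
baseline = water / (water + fire)
print(round(baseline, 3))  # 0.683
```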