Project 7 - Car Prices Prediction using Machine Learning with Python

My Machine Learning Beginner Projects, Entry 7

I am Salim Olanrewaju Oyinlola. I identify as a Machine Learning and Artificial Intelligence enthusiast who is quite fond of making use of data and patterns to develop insights and analysis.

In my opinion, machine learning is where the computational and algorithmic skills of data science meet the statistical thinking of data science. The result is a collection of approaches that requires effective theory as much as effective computation. There is a plethora of machine learning models, each of which works best for different problems. As such, I believe understanding the problem setting in machine learning is essential to using these tools effectively. Now, the best way to UNDERSTAND different problem settings is by PLAYING AROUND with different problem settings. That is the genesis of this writing series - My Machine Learning Projects. Over the course of this writing series, I will solve a machine learning problem daily. These problems will span a range of fields whilst requiring and covering a range of models. A link to my previous articles can be found here.

Project Description: This project involves the prediction of the price of buying a car based on certain attributes.

URL to Dataset: Download here

Line-by-line explanation of Code

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Lasso
from sklearn import metrics

The block of code above imports the third-party libraries used in the model.

import pandas as pd imports the pandas library which is used to analyze data.

import matplotlib.pyplot as plt imports the pyplot module from the matplotlib library, which is used to visualize data and trends in the data.

import seaborn as sns imports the seaborn library which is used for making statistical graphics. It builds on top of matplotlib and integrates closely with pandas data structures. Seaborn helps explore and understand your data.

from sklearn.model_selection import train_test_split imports the train_test_split function from sklearn's model_selection module. It will be used to split arrays or matrices into random train and test subsets.

from sklearn.linear_model import Lasso imports the Lasso linear regression machine learning model from sklearn's linear_model module.

from sklearn import metrics imports the metrics module from the sklearn library. This module is used to ascertain the performance of our model.

salim_car_dataset = pd.read_csv(r'C:\Users\OYINLOLA SALIM O\Downloads\car data.csv')

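This line of code loads the data from the CSV file into a pandas dataframe named salim_car_dataset. The path shown is specific to the author's machine, so adjust it to wherever the downloaded file is saved.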

salim_car_dataset.head()

This line of code displays the first 5 rows of the dataframe.

salim_car_dataset.shape

This line of code returns the number of rows and columns as a tuple. The observed output is (301, 9), i.e. 301 rows and 9 columns.

salim_car_dataset.isnull().sum()

This line of code counts the missing values in each column. The output shows that there are no missing values in the dataset.
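To make the output of this check easier to read, here is a minimal sketch on a toy dataframe. The values are made up for illustration and are not taken from the car dataset; only the column names Fuel_Type and Selling_Price come from it.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical values) with one missing Selling_Price entry
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "Petrol"],
    "Selling_Price": [3.35, np.nan, 7.25],
})

# isnull() marks each missing cell as True; sum() then counts the True values per column
missing_per_column = toy.isnull().sum()
print(missing_per_column)

# Summing once more gives the total number of missing cells in the whole dataframe
total_missing = int(missing_per_column.sum())
print("Total missing:", total_missing)
```

A column with a count of 0 has no missing values, which is what the car dataset reports for every column.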

print(salim_car_dataset.Fuel_Type.value_counts())
print(salim_car_dataset.Seller_Type.value_counts())
print(salim_car_dataset.Transmission.value_counts())

This block of code checks the distribution of categorical data in the Fuel_Type, Seller_Type and Transmission columns of the dataset.

In the Fuel_Type column, we see: Petrol 239, Diesel 60, CNG 2.

In the Seller_Type column, we see: Dealer 195, Individual 106.

In the Transmission column, we see: Manual 261, Automatic 40.
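The counts above already hint at how unbalanced the categories are. As a small sketch, the snippet below rebuilds a stand-in Fuel_Type column from the reported counts (it does not read the real CSV) and uses value_counts(normalize=True) to express each category as a share of the rows.

```python
import pandas as pd

# Stand-in column rebuilt from the counts reported above: Petrol 239, Diesel 60, CNG 2
fuel_type = pd.Series(["Petrol"] * 239 + ["Diesel"] * 60 + ["CNG"] * 2, name="Fuel_Type")

counts = fuel_type.value_counts()                # absolute number of rows per category
shares = fuel_type.value_counts(normalize=True)  # fraction of rows per category

print(counts)
print(shares.round(3))
```

With only 2 CNG cars out of 301, whatever the model learns about that category rests on very little data.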

salim_car_dataset.replace({'Fuel_Type':{'Petrol':0,'Diesel':1,'CNG':2}},inplace=True)

This line of code encodes the Fuel_Type column by replacing each text category with an integer code (Petrol becomes 0, Diesel 1 and CNG 2).

salim_car_dataset.replace({'Seller_Type':{'Dealer':0,'Individual':1}},inplace=True)

This line of code encodes the Seller_Type column in the same way (Dealer becomes 0 and Individual 1).

salim_car_dataset.replace({'Transmission':{'Manual':0,'Automatic':1}},inplace=True)

This line of code encodes the Transmission column in the same way (Manual becomes 0 and Automatic 1).

salim_car_dataset.head()

This line of code displays the first five rows of the dataset again, to confirm that the three categorical columns now hold integer codes.
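The encoding step can be tried out on a few toy rows. The sketch below applies the same integer codes with map() rather than an in-place replace(), which is only a stylistic alternative. It also shows one-hot encoding with pd.get_dummies, which the article does not use; it is included as an assumption-laden aside, because integer codes for Fuel_Type imply an ordering (Petrol < Diesel < CNG) that a linear model will take literally.

```python
import pandas as pd

# Toy rows (hypothetical) using the same three categorical columns
toy = pd.DataFrame({
    "Fuel_Type": ["Petrol", "Diesel", "CNG"],
    "Seller_Type": ["Dealer", "Individual", "Dealer"],
    "Transmission": ["Manual", "Automatic", "Manual"],
})

# Same integer codes as in the article, returned as a new dataframe
encoded = toy.assign(
    Fuel_Type=toy["Fuel_Type"].map({"Petrol": 0, "Diesel": 1, "CNG": 2}),
    Seller_Type=toy["Seller_Type"].map({"Dealer": 0, "Individual": 1}),
    Transmission=toy["Transmission"].map({"Manual": 0, "Automatic": 1}),
)
print(encoded)

# Alternative, not used in the article: one indicator column per fuel type
one_hot = pd.get_dummies(toy, columns=["Fuel_Type"])
print(one_hot.columns.tolist())
```

For the two-category columns the two approaches carry the same information, so the difference only matters for Fuel_Type.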

X = salim_car_dataset.drop(['Car_Name','Selling_Price'],axis=1)
Y = salim_car_dataset['Selling_Price']

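This block of code separates the features from the label. X holds every column except Car_Name and Selling_Price, and Y holds the Selling_Price column that the model will learn to predict.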
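A quick way to see what this separation produces is to run it on a tiny stand-in dataframe. The rows below are invented for illustration; only the column names come from the dataset.

```python
import pandas as pd

# Hypothetical rows; Fuel_Type is already integer-encoded as in the article
toy = pd.DataFrame({
    "Car_Name": ["car_a", "car_b", "car_c"],
    "Fuel_Type": [0, 1, 0],
    "Selling_Price": [3.35, 4.75, 7.25],
})

# drop(..., axis=1) removes columns and returns a new dataframe; toy itself is unchanged
X = toy.drop(["Car_Name", "Selling_Price"], axis=1)
Y = toy["Selling_Price"]

print(X.columns.tolist())
print(Y.name, len(Y))
```

Car_Name is dropped because it is free text that the linear model cannot use directly, and Selling_Price is dropped from X because it is the value being predicted.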

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.1, random_state=2)

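The train_test_split function is then called to divide the dataset into a training set and a test set. test_size = 0.1 holds out 10% of the rows for testing, and random_state=2 fixes the shuffle so that the split is reproducible.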
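To see what this split means in numbers, the sketch below uses placeholder arrays with the dataset's row count (301) and 7 feature columns, which is what remains of the 9 columns after dropping Car_Name and Selling_Price. The array values are arbitrary; only the shapes matter here.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data with the same shape as X and Y in the article; values are arbitrary
X = np.arange(301 * 7).reshape(301, 7)
Y = np.arange(301)

X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.1, random_state=2)
print(X_train.shape, X_test.shape)

# Repeating the call with the same random_state reproduces exactly the same split
_, X_test_again, _, _ = train_test_split(X, Y, test_size=0.1, random_state=2)
print(np.array_equal(X_test, X_test_again))
```

The test set size is rounded up, so 10% of 301 rows gives 31 test rows and leaves 270 for training.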

lass_reg_model = Lasso()

This line of code creates an instance of the Lasso regression model with its default settings.

lass_reg_model.fit(X_train,Y_train)

This line of code trains the model on the training set (X_train and Y_train).
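Lasso is linear regression with an L1 penalty on the coefficients, and in scikit-learn the strength of that penalty is the alpha parameter (1.0 when Lasso() is called with no arguments, as in the article). The sketch below uses synthetic data, not the car dataset, to illustrate how a larger alpha shrinks coefficients and can set some of them exactly to zero. The data-generating formula and the alpha values are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Synthetic data: only the first two of five features actually influence y
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

strong = Lasso(alpha=1.0).fit(X, y)   # scikit-learn's default penalty strength
weak = Lasso(alpha=0.01).fit(X, y)    # much lighter penalty

print("alpha=1.0 :", strong.coef_.round(2))
print("alpha=0.01:", weak.coef_.round(2))
```

On real data the features are on very different scales, so the effect of a given alpha is less tidy than in this example; trying a few alpha values is a reasonable next experiment.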

training_data_prediction = lass_reg_model.predict(X_train)

This line of code predicts on the Training data.

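error_score = metrics.r2_score(Y_train, training_data_prediction)
print("R squared score : ", error_score)

This block of code computes and displays the R squared score (the coefficient of determination) on the training data. Despite the variable name error_score, this is a goodness-of-fit measure rather than an error, so values closer to 1 are better. The R squared score is given as 0.8427856123435794.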
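The value returned by metrics.r2_score is the coefficient of determination, 1 - SS_res / SS_tot, where SS_res is the sum of squared prediction errors and SS_tot is the sum of squared deviations of the actual values from their mean. As a minimal sketch with made-up numbers, the snippet below computes it by hand and compares it with scikit-learn's result. It also computes the mean absolute error, which is an error measure in the usual sense (lower is better).

```python
import numpy as np
from sklearn import metrics

# Made-up actual and predicted prices, purely for illustration
y_true = np.array([3.0, 5.0, 7.5, 10.0])
y_pred = np.array([2.5, 5.5, 7.0, 11.0])

ss_res = np.sum((y_true - y_pred) ** 2)         # squared error left after prediction
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of always predicting the mean
r2_manual = 1 - ss_res / ss_tot

r2_sklearn = metrics.r2_score(y_true, y_pred)
mae = metrics.mean_absolute_error(y_true, y_pred)

print("R squared (manual) :", r2_manual)
print("R squared (sklearn):", r2_sklearn)
print("Mean absolute error:", mae)
```

A score of 1 means perfect predictions, and a score of 0 means the model does no better than always predicting the mean price.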

plt.scatter(Y_train, training_data_prediction)
plt.xlabel("Actual Price")
plt.ylabel("Predicted Price")
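plt.title("Actual Prices vs Predicted Prices")
plt.show()

This block of code plots the actual prices against the predicted prices for the training data. The closer the points lie to a straight diagonal line, the better the predictions match the actual prices.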
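A scatter plot like this is easier to judge with a reference line showing where perfect predictions would fall. The helper below is a hypothetical convenience function, not part of the article's notebook, and the numbers passed to it are made up. It draws the same scatter plot with a dashed y = x diagonal.

```python
import matplotlib.pyplot as plt


def plot_actual_vs_predicted(actual, predicted, title="Actual Prices vs Predicted Prices"):
    """Scatter actual against predicted values and add a y = x reference line."""
    fig, ax = plt.subplots()
    ax.scatter(actual, predicted)
    # Dashed diagonal: a perfect prediction would sit exactly on this line
    low = min(min(actual), min(predicted))
    high = max(max(actual), max(predicted))
    ax.plot([low, high], [low, high], linestyle="--", color="gray")
    ax.set_xlabel("Actual Price")
    ax.set_ylabel("Predicted Price")
    ax.set_title(title)
    return ax


# Made-up numbers purely to exercise the helper
ax = plot_actual_vs_predicted([1.0, 4.5, 7.25, 10.0], [1.4, 4.0, 7.9, 9.1])
plt.show()
```

In the notebook, the same helper could be called with Y_train and training_data_prediction, or with Y_test and test_data_prediction.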

test_data_prediction = lass_reg_model.predict(X_test)

This line of code predicts on the test data.

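error_score = metrics.r2_score(Y_test, test_data_prediction)
print("R squared score : ", error_score)

This block of code computes and displays the R squared score on the test data, i.e. on cars the model did not see during training. The R squared score is given as 0.8709167941173195. Because only 10% of the rows were held out, the test set is small and this score can shift noticeably with a different random split.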

plt.scatter(Y_test, test_data_prediction)
plt.xlabel("Actual Price")
plt.ylabel("Predicted Price")
plt.title("Actual Prices vs Predicted Prices")
plt.show()

This block of code plots the actual prices against the predicted prices for the test data.

That's it for this project. Be sure to like, share and keep the discussion going in the comment section. The .ipynb file containing the full code can be found here.