Detecting Fake News Using Deep Learning
Fake news has been spreading at an ever greater scale and has generated more and more misinformation, which has resulted in riots, violence and even the deaths of innocent people. To address this major issue it becomes necessary to use computational tools such as 'CountVectorizer', 'TfidfVectorizer' and many more, which can help us curb the spread and budding of hoaxes.
With the emergence of the technological revolution, and with it technologies such as radio, television and the internet, traditional information media such as printed newspapers were pushed aside, and the world opened up to new ways of learning about events, with much of the news now coming from these new sources.
Everything from innocent-looking social media posts to well-renowned web pages has fallen into the trap of spreading false news.
Project Implementation
Dataset: The dataset used for this Python project has dimensions 7764 x 4.
The first column identifies the news item, the second column is the title, the third column is the text, and the fourth column holds labels denoting whether the news is REAL or FAKE.
Libraries Used: NumPy, Pandas, scikit-learn
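As a sketch, loading the data and holding out a test set could look like the following. The tiny inline DataFrame is a stand-in with the same columns as the real 7764 x 4 dataset (in the actual project it would come from reading the CSV file), and the split ratio and random seed are illustrative assumptions:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in frame mimicking the dataset's title / text / label columns;
# the real project would load the full 7764 x 4 CSV with pd.read_csv
df = pd.DataFrame({
    "title": ["Aliens land", "Budget passes", "Miracle cure", "Court ruling"],
    "text": ["aliens landed in city", "government passes budget",
             "miracle cure found", "court rules on appeal"],
    "label": ["FAKE", "REAL", "FAKE", "REAL"],
})

# Hold out part of the articles for evaluation (ratio is an assumption)
x_train, x_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.25, random_state=7)
print(len(x_train), len(x_test))  # 3 1
```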
Text data requires special preprocessing before machine learning or deep learning algorithms can be applied to it. Various techniques are widely used to convert the data into a form that is ready for modeling.
The data preprocessing steps outlined below are applied to both the headlines and the news articles.
Stop Word Removal
We start by removing stop words from the available text data. Stop words are common words in a language that carry little content of their own.
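A minimal sketch of this step, here using scikit-learn's built-in English stop-word list (one possible choice; other projects use NLTK's list instead):

```python
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS

def remove_stop_words(text):
    # Keep only the tokens that are not in the stop-word list
    return " ".join(w for w in text.split()
                    if w.lower() not in ENGLISH_STOP_WORDS)

print(remove_stop_words("The senator was seen at the rally"))
```

Words such as "the", "was" and "at" are dropped, while content words like "senator" survive.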
Punctuation Removal
Punctuation in natural language provides grammatical context to a sentence, but individual punctuation marks, such as commas, might not add much value to understanding its meaning, so we strip them out.
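One straightforward way to do this in plain Python is with `str.translate` and the standard library's `string.punctuation` table:

```python
import string

def remove_punctuation(text):
    # Delete every ASCII punctuation character (commas, periods, colons, ...)
    return text.translate(str.maketrans("", "", string.punctuation))

print(remove_punctuation("Breaking: aliens, reportedly, landed!"))
# Breaking aliens reportedly landed
```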
Stemming
Stemming is a technique that removes prefixes and suffixes from a word, leaving the stem. Using stemming we can reduce inflectional forms, and sometimes derivationally related forms, of a word to a common base form.
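To illustrate the idea, here is a deliberately tiny suffix stripper; a real project would use a proper stemmer such as NLTK's `PorterStemmer` or `SnowballStemmer`, which handle far more cases:

```python
def simple_stem(word):
    # Toy stemmer: strip a few common English suffixes, keeping at
    # least three characters of stem (NOT a substitute for Porter)
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([simple_stem(w) for w in ["spreading", "reported", "hoaxes"]])
# ['spread', 'report', 'hoax']
```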
TF-IDF Vectorizer
The technique Term Frequency - Inverse Document Frequency (TF-IDF) can be used for feature extraction. Term frequency and inverse document frequency are the two components of TF-IDF: term frequency measures the local importance of a word by how often it occurs within a document, while IDF highlights signature words that do not appear often across the rest of the corpus.
For the model we have used a Passive Aggressive classifier. We fit it on tfidf_train and y_train, then predict on the test set produced by the TfidfVectorizer and calculate the accuracy with accuracy_score() from sklearn.metrics.
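The steps above can be sketched as follows; the tiny synthetic corpus and the `max_iter=50` setting are assumptions for illustration, with variable names mirroring the prose:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score

# Toy stand-in corpus; the real project trains on the full dataset
texts = ["aliens landed in city", "government passes budget",
         "miracle cure found", "court rules on appeal"]
labels = ["FAKE", "REAL", "FAKE", "REAL"]

vectorizer = TfidfVectorizer(stop_words="english", max_df=0.7)
tfidf_train = vectorizer.fit_transform(texts)

# Initialize and fit the Passive Aggressive classifier
pac = PassiveAggressiveClassifier(max_iter=50)
pac.fit(tfidf_train, labels)

# Vectorize unseen text with the SAME fitted vectorizer, then predict
tfidf_test = vectorizer.transform(["aliens found in court"])
pred = pac.predict(tfidf_test)
print(pred)
```

In the real project, `accuracy_score(y_test, pac.predict(tfidf_test))` on the held-out set gives the reported figure.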
Passive Aggressive algorithms are online learning algorithms. Such an algorithm remains passive when a sample is classified correctly, and turns aggressive in the event of a misclassification, updating and adjusting its weights.
Unlike most algorithms it does not converge to a fixed model; its purpose is to make just enough of an update to correct the loss on each sample.
We got an accuracy of 92.82% with this model. Finally we also printed the confusion matrix to gain insight into the numbers of true and false positives and negatives.
The confusion matrix gave us 589 true positives, 587 true negatives, 42 false positives and 49 false negatives.
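As a quick sanity check, the reported accuracy can be recomputed directly from these four counts:

```python
# Accuracy = correct predictions / all predictions,
# using the counts from the confusion matrix above
tp, tn, fp, fn = 589, 587, 42, 49
accuracy = (tp + tn) / (tp + tn + fp + fn)
print(round(accuracy * 100, 2))  # 92.82
```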
1.) High accuracy.
2.) People can now distinguish between hoax and genuine news.
3.) This can minimize the spread of fake news.
4.) This will expose fake news sources.
Fake news can be detected with the help of Python. Using a political dataset we achieved around 92% accuracy.