Text Classification with Sentiment Analysis using BERT

Sentiment analysis, the art of understanding and extracting sentiments or emotions from text, plays a crucial role in Natural Language Processing (NLP). In this blog post, we'll explore the process of training a sentiment analysis model that classifies text into two categories: positive and negative sentiments. We'll be using the powerful BERT model, a pre-trained transformer-based model, to perform this classification task.


Word Cloud Visualization

Before we dive into building our sentiment analysis model, let's explore the most frequently occurring words in our dataset. These words often provide valuable insights into the topics and sentiments expressed by our reviewers.

The dataset consists primarily of German-language reviews of a public transport navigation app, so words such as 'Bahn', 'Deutsche Bahn', and 'Verspätung' occur frequently. To better understand users' sentiments and experiences, you can also generate a separate word cloud for each sentiment class and see which words are most associated with positive and with negative reviews (not included here).

Now, let's visualize these high-frequency words for both classes using a word cloud:

You can find the code used to generate the word cloud here.
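Independently of the linked code, the counting step behind a word cloud can be sketched with the standard library alone. The sample reviews and the tiny stop-word list below are made up for illustration and are not taken from the actual dataset.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real project would use a fuller one.
STOPWORDS = {"die", "der", "das", "und", "ist", "hat", "mit", "nicht", "wieder"}


def word_frequencies(texts, stopwords=STOPWORDS):
    """Count how often each non-stop-word occurs across all texts."""
    counts = Counter()
    for text in texts:
        # \w+ also matches German umlauts in Python 3 strings.
        tokens = re.findall(r"\w+", text.lower())
        counts.update(t for t in tokens if t not in stopwords and len(t) > 2)
    return counts


# Hypothetical example reviews, not real data.
reviews = [
    "Die Bahn hat wieder Verspätung",
    "Verspätung und Ausfall, die App ist nicht aktuell",
    "Die App der Bahn ist super",
]
freqs = word_frequencies(reviews)
print(freqs.most_common(3))
```

A frequency dictionary like this can typically be handed to a plotting library; the `wordcloud` package, for instance, offers `WordCloud().generate_from_frequencies(freqs)`.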

Introduction to BERT

BERT, short for Bidirectional Encoder Representations from Transformers, is a transformer-based language model developed by Google. It is pre-trained on large amounts of unlabeled text and can then be fine-tuned with comparatively little labeled data, which has made it highly effective for a wide range of NLP tasks, including sentiment analysis.

You can find the pre-trained BERT model we'll be using on the Hugging Face Model Hub here.

Data Preparation

Before diving into model building, we need to clean and prepare our dataset. In this project, we're using a dataset of user feedback. The preparation steps are: lowercasing the text, removing special characters and URLs, handling missing values, and balancing the dataset so that the training set contains a roughly equal number of positive and negative reviews. Balancing matters because a model trained on skewed data tends to favor the majority class. The data preparation can be fully reproduced with the following script.
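The steps above can be sketched roughly as follows. This is not the linked script but a minimal standard-library illustration; the function names, the regular expressions, and the choice to balance by downsampling the larger class are all assumptions.

```python
import random
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
# Anything that is neither a word character nor whitespace counts as "special" here.
SPECIAL_RE = re.compile(r"[^\w\s]")


def clean_text(text):
    """Lowercase, drop URLs and special characters, collapse whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = SPECIAL_RE.sub(" ", text)
    return re.sub(r"\s+", " ", text).strip()


def prepare(rows, seed=42):
    """Clean (text, label) rows and downsample to equal class sizes.

    Labels are assumed to be 0 (negative) and 1 (positive).
    """
    by_label = {0: [], 1: []}
    for text, label in rows:
        if text is None or label is None:
            continue  # handle missing values by dropping the row
        text = clean_text(text)
        if text:
            by_label[label].append((text, label))
    n = min(len(by_label[0]), len(by_label[1]))
    rng = random.Random(seed)
    balanced = rng.sample(by_label[0], n) + rng.sample(by_label[1], n)
    rng.shuffle(balanced)
    return balanced
```

Downsampling discards data from the larger class; oversampling the smaller class or weighting the loss are alternatives worth considering when data is scarce.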

Tokenization and Dataset Loading

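BERT does not operate on raw strings: each text must first be split into subword tokens from the model's vocabulary, mapped to integer IDs, and padded or truncated to a fixed length. We'll use the transformers library to tokenize our text data and create train and test datasets (see the tokenization step here).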
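To make the tokenization step less abstract, here is a toy sketch of the greedy longest-match-first subword splitting (WordPiece) that BERT tokenizers use, followed by the mapping to IDs with padding and an attention mask. The vocabulary is hypothetical and tiny; in practice the tokenizer shipped with the pre-trained model (for example via `AutoTokenizer.from_pretrained`) does all of this for you.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Split one word into subword pieces, always taking the longest match."""
    pieces, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while start < end:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces carry a ## prefix
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return [unk]  # no piece fits: the whole word becomes unknown
        pieces.append(match)
        start = end
    return pieces


def encode(text, vocab, max_length=10):
    """Return (input_ids, attention_mask), truncated and padded to max_length."""
    tokens = ["[CLS]"]
    for word in text.lower().split():
        tokens.extend(wordpiece(word, vocab))
    tokens = tokens[: max_length - 1] + ["[SEP]"]
    ids = [vocab[t] for t in tokens]
    pad = max_length - len(ids)
    return ids + [vocab["[PAD]"]] * pad, [1] * len(ids) + [0] * pad


# Hypothetical toy vocabulary; a real BERT vocabulary has tens of thousands of entries.
toy_vocab = {tok: i for i, tok in enumerate(
    ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "die", "bahn", "ver", "##spät", "##ung", "hat"]
)}

input_ids, attention_mask = encode("Die Bahn hat Verspätung", toy_vocab)
print(wordpiece("verspätung", toy_vocab))
print(input_ids, attention_mask)
```

The attention mask tells the model which positions hold real tokens (1) and which are padding (0).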

Model Building and Training

Now comes the exciting part: building and training our sentiment analysis model using BERT! We'll use the transformers library to load the pre-trained BERT model and fine-tune it for our specific task. Feel free to reproduce the model training step, either with the data provided here or with your own data, and to tune the hyperparameters until you reach the classification accuracy you need.
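As a rough idea of what fine-tuning involves, the sketch below runs a plain PyTorch training loop over a BERT sequence-classification model. It is not the training code referenced in this post: the checkpoint name `bert-base-german-cased`, the hyperparameters, and the helper names are assumptions, and a real run would add a validation split and ideally a GPU.

```python
import random

MODEL_NAME = "bert-base-german-cased"  # assumption: swap in the checkpoint you actually use


def batches(items, size):
    """Yield consecutive mini-batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def fine_tune(texts, labels, epochs=2, batch_size=16, lr=2e-5, seed=0):
    """Fine-tune for binary sentiment (0 = negative, 1 = positive)."""
    # Heavy dependencies are imported lazily so the helper above works without them.
    import torch
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)

    examples = list(zip(texts, labels))
    rng = random.Random(seed)
    model.train()
    for _ in range(epochs):
        rng.shuffle(examples)  # new batch composition every epoch
        for batch in batches(examples, batch_size):
            enc = tokenizer([t for t, _ in batch], padding=True, truncation=True,
                            max_length=128, return_tensors="pt")
            target = torch.tensor([y for _, y in batch])
            loss = model(**enc, labels=target).loss  # cross-entropy over the two classes
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return tokenizer, model
```

The learning rate, batch size, number of epochs, and maximum sequence length are the hyperparameters most commonly adjusted first.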

Model Evaluation and Metrics

To evaluate the performance of our sentiment analysis model, we rely on a valuable tool known as the confusion matrix, which provides a detailed view of the model's classification results (see the figure below).

In this matrix, we have two main classes: "Negative" and "Positive." On the diagonal, we see the values that indicate correct classifications, while off-diagonal values represent misclassifications. Let's look into the details of each part:

  • True Negatives (TN): These are the cases where the model correctly identified text with a negative sentiment. In our matrix, 75% of the initial negative text is classified as negative, indicating that the model accurately captured negative sentiment.

  • False Positives (FP): These are the cases where the model incorrectly classified text as positive when it was actually negative. In our matrix, 25% of the initial negative text was misclassified as positive. This is an important metric, as it indicates the model's tendency to sometimes overlook negative sentiment.

  • True Positives (TP): These are the cases where the model correctly identified text with a positive sentiment. In our matrix, 85% of the initial positive text is classified as positive, showcasing the model's strong ability to capture positive sentiment.

  • False Negatives (FN): These are the cases where the model incorrectly classified text as negative when it was actually positive. Because the percentages are given per true class, each row of the matrix sums to 100%, so the remaining 15% of the initial positive text was misclassified as negative. Minimizing false negatives matters for sentiment analysis, because unrecognized positive sentiment means missing valuable insights.
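The four cells described above can be computed from true and predicted labels in a few lines. This is a minimal standard-library sketch with made-up labels; the function names are illustrative, and libraries such as scikit-learn provide an equivalent `confusion_matrix` function.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for labels 0 (negative) and 1 (positive)."""
    matrix = [[0, 0], [0, 0]]
    for truth, pred in zip(y_true, y_pred):
        matrix[truth][pred] += 1  # row = true class, column = predicted class
    return matrix


def normalize_rows(matrix):
    """Turn counts into per-true-class rates, so that every row sums to 1."""
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in matrix]


# Hypothetical labels, only to show the mechanics.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

counts = confusion_counts(y_true, y_pred)
rates = normalize_rows(counts)
print(counts)
print(rates)
```

Row normalization is what produces percentages "of the initial negative text" and "of the initial positive text" as quoted in the list above.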

Evaluating the model's performance goes beyond these individual cells. From them we can derive further statistics such as accuracy, precision, recall, F1-score, and the area under the ROC curve (AUC-ROC). Together these metrics give a more complete view of how well the model performs; they are omitted from this post for now. The approach for calculating and visualizing the confusion matrix can be fully reproduced by following the documentation provided here.
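For reference, most of these statistics follow directly from the four confusion-matrix cells. The sketch below uses hypothetical counts of 100 reviews per class, chosen only to mirror the percentages quoted above (with false negatives taken as the 15% that remains when each row sums to 100%); they are not the actual evaluation counts.

```python
def classification_metrics(tn, fp, fn, tp):
    """Derive accuracy, precision, recall and F1 for the positive class."""
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of predicted positives that are right
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of true positives that were found
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 100 negative and 100 positive reviews.
metrics = classification_metrics(tn=75, fp=25, fn=15, tp=85)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

AUC-ROC cannot be derived from these four counts alone, since it needs the model's predicted scores for each example rather than only the final labels.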

Conclusion

In this post, we've explored the process of building a sentiment analysis model using BERT, from data preparation to model training, evaluation and results visualisation. Sentiment analysis is a powerful tool with applications in social media monitoring, customer feedback analysis, and much more.

By following the steps outlined in this post, you can easily apply sentiment analysis to your own text data and gain valuable insights. Feel free to check out the Hugging Face Model Hub for more pre-trained models and resources.

I hope you found this blog post informative and helpful for your NLP projects. Stay tuned for more exciting NLP adventures!