Let’s face it, people are spending more time on social media than ever before. In 2018, around 2.65 billion people were using social media worldwide, a number expected to grow to 3.1 billion by 2021.

And you’re technically on social media right now reading this article 😉

Most posts are harmless depictions of life: status updates, pictures of friends or food, the occasional meme, and so on.

We give these posts a like, maybe a comment, then keep on scrolling.

But sometimes people post about more personal topics and show signs that things in their life are not going well. These posts may consist of words that convey loneliness, hopelessness, irritability, or hostility.

Showing behaviors on social media that are out of character, posting about trouble sleeping or eating, or withdrawing from everyday activities can also be signs that a person is struggling emotionally. They could even be thinking about suicide or self-harm.

“Often, people who intend to harm themselves will make an explicit statement to their social media circle or to a specific individual in a text — they say they want to die or kill themselves, talk about shooting or cutting themselves, or wonder if things would be better if they didn’t wake up. Other people may send a goodbye message that sounds permanent.” — Dr. Schabbing, director of Psychiatric Emergency Services for OhioHealth.

So, what do you do if a friend expresses warning signs of depression or suicide on social media?

Platforms like Twitter and Facebook have forms that you can submit when reporting suicidal ideation, and they encourage you to reach out to your friend or someone else for help.

But what if the message was posted a couple of hours ago? It seems like no one has viewed the post, and time has gone by. Has your friend harmed themselves? Is it 2 hours too late?

This is a situation that is time-sensitive, meaning every minute counts. However, intervention isn’t possible until the post is, well, viewed and reported.

Manually reported, that is…

What if we could make it so posts like this are automatically reported? In this way, intervention could happen before it’s too late.

And this is where natural language processing is key 🔑

Natural Language Processing & LSTMs

Natural language processing, or NLP for short, is a type of artificial intelligence that analyzes large amounts of human language data, rather than the typical numerical data. It can be thought of as the intersection between computer science and linguistics.

With NLP, we can make predictions about human language data, like if the text conveys happiness or sadness, using these things called recurrent neural networks (RNNs).

One type of RNN that’s great at making predictions with human language data is the Long Short-Term Memory (LSTM) network, because it’s really good at remembering things, like words in a sentence, for a long period of time.

LSTM architecture

LSTMs take in 3 inputs: long term memory, short term memory, and a current event.

The inputs go through a series of gates to produce a new long term memory, short term memory, and prediction.

There are 4 gates in our LSTM: the forget gate, learn gate, use gate, and remember gate.

Each of these gates can be explained in simple words along with some math.

Forget Gate

The long term memory first enters the forget gate and everything irrelevant that we don’t care about is forgotten.

In math terms, the long term memory is multiplied by the forget factor, f:

f = σ(W_f · [STM, E] + b_f)

where W_f is the weight, STM is the short term memory, E is the current event, b_f is the bias term, and σ is the sigmoid function.

Once the forget factor is computed, it is multiplied by the long term memory in the forget gate, giving LTM · f.

Learn Gate

The learn gate combines the current event and short term memory to learn new information from the current event. Irrelevant information is ignored.

The new information, N, combines the short term memory and the current event:

N = tanh(W_n · [STM, E] + b_n)

where W_n is the weight, STM is the short term memory, E is the current event, b_n is the bias term, and tanh is the hyperbolic tangent function.

Once N is computed, we multiply it by i, the ignore factor:

i = σ(W_i · [STM, E] + b_i)

where W_i is the weight, STM is the short term memory, E is the current event, b_i is the bias term, and σ is the sigmoid function.

After multiplying N and i, our LSTM has learned the new information from the current event, E.

Remember Gate

In the remember gate, the results from the forget and learn gates are simply added together to form the new long term memory: LTM_new = LTM · f + N · i.

Use Gate

The use gate, or output gate, takes the useful information from the long term memory and short term memory to produce the output, which also becomes the new short term memory.

In math terms, the output is U multiplied by V:

U = tanh(W_u · (LTM · f) + b_u)

where W_u is the weight, LTM is the long term memory, f is the forget factor, b_u is the bias term, and tanh is the hyperbolic tangent function.

V = σ(W_v · [STM, E] + b_v)

where W_v is the weight, STM is the short term memory, E is the current event, b_v is the bias term, and σ is the sigmoid function.

The product U · V is both the output and the new short term memory. And with that, we have finally followed our inputs ➡️ outputs.
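To tie everything together, here is a minimal NumPy sketch of a single LSTM step using the gate equations above. It is purely illustrative: the weight matrices W_* and biases b_* are placeholders, and [STM, E] just means concatenating the two vectors.

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def lstm_step(ltm, stm, event, p):
    """One LSTM step using the four gates described above.
    p is a dict of weight matrices (W_f, W_n, W_i, W_u, W_v) and bias vectors."""
    x = np.concatenate([stm, event])   # [STM, E]

    # forget gate: decide what to drop from the long term memory
    f = sigmoid(p["W_f"] @ x + p["b_f"])

    # learn gate: new information N, scaled by the ignore factor i
    N = np.tanh(p["W_n"] @ x + p["b_n"])
    i = sigmoid(p["W_i"] @ x + p["b_i"])

    # remember gate: new long term memory
    new_ltm = ltm * f + N * i

    # use gate: new short term memory, which is also the output
    U = np.tanh(p["W_u"] @ (ltm * f) + p["b_u"])
    V = sigmoid(p["W_v"] @ x + p["b_v"])
    new_stm = U * V

    return new_ltm, new_stm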

While LSTMs definitely aren’t intuitive, their applications are virtually endless. In my project, I used a technique called sentiment analysis.

Sentiment Analysis to Identify At-Risk Users

Sentiment analysis is simply the classification of emotions based on text. While it’s more commonly used to classify things like product reviews for companies, I used it to identify depressed and suicidal users on social media to possibly prevent self-harm or suicide.

The recurrent neural network (RNN) architecture I used

The basic architecture I used can be seen above. The first layer is an embedding layer, which is necessary because of the vast number of words in the data’s vocabulary. The embedding layer acts as a “lookup table” for the words, reducing the number of dimensions in the data.
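As a toy illustration of the lookup-table idea (the sizes here are made up for demonstration, not the ones from my project):

import torch
import torch.nn as nn

# a tiny vocabulary of 10 words, each mapped to a 4-dimensional vector
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4)

# a "tokenized" Tweet, where each word has already been replaced by its integer id
tweet = torch.tensor([[2, 5, 7]])

# looking up the vectors for those ids turns 3 word ids into a (1, 3, 4) tensor
vectors = embedding(tweet)
print(vectors.shape)  # torch.Size([1, 3, 4])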

The next layer is the LSTM, which gathers information about the sequence of the words. I actually used 2 LSTM layers because, most of the time, adding more layers allows the network to learn more complex relationships.

Finally, the LSTM outputs go to a sigmoid output layer. The sigmoid function squishes the LSTM output between 0 and 1, where 1 = at-risk and 0 = safe. We only care about the very last sigmoid output to predict whether the user is at-risk: if it is below 0.5, the user is predicted to be safe, whereas if it is above 0.5, the user is predicted to be at risk.
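In code, that last step is just rounding the sigmoid output (a toy example with made-up numbers):

import torch

# hypothetical sigmoid outputs from the network for a batch of 3 Tweets
sig_out = torch.tensor([0.12, 0.86, 0.47])

# rounding applies the 0.5 threshold: 1 = at-risk, 0 = safe
preds = torch.round(sig_out)
print(preds)  # tensor([0., 1., 0.])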

To obtain the data, I web scraped Twitter for Tweets that showed signs of depression or suicide, based on the criteria outlined in the DSM-V:

Normal Tweet: a random, everyday Tweet

At-Risk Tweet: a Tweet showing symptoms of depression

After web scraping, I first had to preprocess the Tweets. The preprocessing steps, sketched in code after the list, included:

  • Getting rid of punctuation and making all Tweets lowercase
  • Stemming all of the words, or removing the suffixes and affixes of words (for example, “flying” → “fly”)
  • Tokenizing all of the words in the vocabulary, by creating a dictionary where each word maps to a number
  • Padding or truncating all Tweets to a specific length to deal with very short or long Tweets
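A rough sketch of what these steps can look like in code is below. The helper name, the choice of NLTK’s PorterStemmer, and the seq_length of 30 are illustrative assumptions, not necessarily the exact details of my pipeline.

from string import punctuation
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()

def preprocess(tweet, vocab_to_int, seq_length=30):
    # lowercase the Tweet and strip punctuation
    tweet = tweet.lower()
    tweet = "".join(ch for ch in tweet if ch not in punctuation)

    # stem each word to remove suffixes and affixes
    words = [stemmer.stem(word) for word in tweet.split()]

    # tokenize: map each word to its integer id (unknown words map to 0)
    tokens = [vocab_to_int.get(word, 0) for word in words]

    # left-pad with zeros or truncate so every Tweet has the same length
    return [0] * max(seq_length - len(tokens), 0) + tokens[:seq_length]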

I then defined the RNN using PyTorch:

import torch.nn as nn

class SentimentRNN(nn.Module):
    """
    defines the architecture of the RNN
    """
    def __init__(self, vocab_size, output_size, embedding_dim, hidden_dim, n_layers, drop_prob=0.5):
        """
        initializes the model by creating the layers
        """
        super(SentimentRNN, self).__init__()

        self.output_size = output_size
        self.n_layers = n_layers
        self.hidden_dim = hidden_dim

        # creates the embedding and LSTM layers
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_dim, n_layers,
                            dropout=drop_prob, batch_first=True)

        # creates the dropout layer
        self.dropout = nn.Dropout(0.3)

        # creates the linear and sigmoid layers
        self.fc = nn.Linear(hidden_dim, output_size)
        self.sig = nn.Sigmoid()

    def forward(self, x, hidden):
        """
        performs a forward pass of the model on an input and hidden state
        """
        batch_size = x.size(0)

        # embeddings and lstm outputs
        x = x.long()
        embeds = self.embedding(x)
        lstm_out, hidden = self.lstm(embeds, hidden)

        # stacks up the lstm outputs
        lstm_out = lstm_out.contiguous().view(-1, self.hidden_dim)

        # applies dropout and the fully-connected layer
        out = self.dropout(lstm_out)
        out = self.fc(out)

        # applies the sigmoid function
        sig_out = self.sig(out)

        # reshapes so that batch_size is the first dimension
        sig_out = sig_out.view(batch_size, -1)
        sig_out = sig_out[:, -1]  # keeps only the last sigmoid output per Tweet

        # returns the last sigmoid output and the hidden state
        return sig_out, hidden

    def init_hidden(self, batch_size):
        ''' initializes hidden state '''
        # creates two new tensors with sizes n_layers x batch_size x hidden_dim,
        # initialized to zero, for the hidden state and cell state of the LSTM
        weight = next(self.parameters()).data

        # train_on_gpu is a boolean flag defined outside the class
        if train_on_gpu:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_().cuda())
        else:
            hidden = (weight.new(self.n_layers, batch_size, self.hidden_dim).zero_(),
                      weight.new(self.n_layers, batch_size, self.hidden_dim).zero_())

        return hidden
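As a sanity check, a model like this can be instantiated and run on a dummy batch of Tweets like so. The hyperparameters below are illustrative placeholders, not necessarily the values I settled on.

import torch

train_on_gpu = torch.cuda.is_available()

vocab_size = 20000 + 1   # +1 to account for the 0 padding token
output_size = 1          # a single sigmoid output: at-risk vs. safe
embedding_dim = 400
hidden_dim = 256
n_layers = 2

net = SentimentRNN(vocab_size, output_size, embedding_dim, hidden_dim, n_layers)

# a dummy batch of 4 Tweets, each padded/truncated to 30 tokens
inputs = torch.randint(0, vocab_size, (4, 30))

if train_on_gpu:
    net = net.cuda()
    inputs = inputs.cuda()

hidden = net.init_hidden(batch_size=4)
sig_out, hidden = net(inputs, hidden)
print(sig_out.shape)  # torch.Size([4]), one prediction per Tweet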

And after lots of errors and epochs, I was able to get the RNN to 73% accuracy, meaning it classified Tweets correctly as at-risk or safe 73% of the time!
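A training loop for a model like this typically looks something like the sketch below, continuing from the instantiation above. Here, train_loader is a stand-in for whatever DataLoader yields batches of 50 (padded Tweets, 0/1 labels), and the loss, optimizer, and gradient-clipping values are common choices rather than the exact ones used in my project.

import torch.nn as nn
import torch.optim as optim

criterion = nn.BCELoss()
optimizer = optim.Adam(net.parameters(), lr=0.001)

for epoch in range(4):
    hidden = net.init_hidden(batch_size=50)

    for inputs, labels in train_loader:
        # (if training on a GPU, move inputs and labels to the GPU here)
        # detach the hidden state so gradients don't flow across batches
        hidden = tuple(h.data for h in hidden)

        net.zero_grad()
        sig_out, hidden = net(inputs, hidden)

        loss = criterion(sig_out, labels.float())
        loss.backward()

        # clip gradients to help avoid exploding gradients in RNNs
        nn.utils.clip_grad_norm_(net.parameters(), 5)
        optimizer.step()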

However, many questions remain about how something like this could be implemented.

Future and Questions

One of the biggest questions is, what would actually happen if a user was identified by the neural network as at-risk? Would 911 be called?

The short answer is no. Depending on the severity of the post and the confidence of the neural network, the user could be given resources by the social media provider to get help, rather than 911 being called. A professional employee could then review the post identified as at-risk and determine if further action is needed.

User privacy is also a concern that would need to be addressed: where is the line between protecting user privacy and potentially saving a life? Important questions like these would need to be fleshed out before implementation, but hopefully one day, technology like this will be able to provide help to the people who need it the most.


Heyo! 👋 I’m Mikey, a student who’s super passionate about the intersections between exponential technologies, neuroscience, and mental health. My goal is to revolutionize diagnosis and treatment of neurological and psychiatric disorders. However, I’m pretty much interested in everything. Shoot me an email at mike.s.taylor101@gmail.com if you would like to further discuss this article, or just chat!
