Summary: A vital aspect of understanding language is identifying context. Our Social Media Scanning product is built with context in its AI-based mind. Social media posts can range from simple and innocuous to mysterious and dangerous, and understanding the intentions behind them begins with contextualization.
The Alert Identification 10-part blog series provides insight into the inner workings of our social media scanning product. Brought to you by our Data Science team.
Without our realizing it, the human brain is incredibly adept at analyzing context. We carry a great deal of knowledge about pop culture, sports, irony, and humor, then apply that contextual information to determine the meaning of a sentence.
Consider these sentences, which are harmless given their context:
Kill the whole squad for #ApexChampion
My boyfriend is going to shoot me outside of school 📸😃
I swear Antonio Brown is tryna kill me
I loved it when Harry shot up all the horcruxes
I am so terrified right now, someone help! I hate spiders.
A lot can be conveyed with a small number of characters. Recognizable names, emojis, or hashtags add a vast amount of contextual meaning. Every day, our brains collect new signals related to current events and popular culture.
Social Sentinel’s AI requires visibility into these signals, as well. Our Data Science team continually seeds our machine learning models with information to decipher as many contextual signals as possible.
Let’s explore some of the hurdles of using machine learning and natural language processing to clarify the context of popular culture.
Some movies contain violent plotlines. When discussing them on social media, authors often use violent or threatening terms. Because the authors are merely discussing entertainment, these posts are not categorized as threatening.
The following example posts contain potentially harmful or threatening language. However, other signals indicate the authors are simply discussing movies.
Maybe Natasha found out the only way to finish Thanos is shooting him in the face
Zac Efron as Ted Bundy... I'd be ok with being murdered by him
She didn’t have to kill him! He has a pregnant wife #Runaways
Our Linguistic Analysts research all upcoming movie releases to arm our system with the ability to identify such signals.
TV shows require even more research because they’re typically released episodically over time rather than as a single, complete film. Some popular shows like 13 Reasons Why, How to Get Away With Murder, and American Horror Story contain content that could potentially trigger harm language filters.
However, few shows generated as much social media chatter with highly alarming verbiage as Game of Thrones. Its scale and complexity presented a unique challenge since it focused on 45 primary characters across eight seasons.
I know it’s going to be a bloodbath on #GOT! Who do you think will die!?
You raped her. You murdered her. You killed her children! #DemThrones
Why is Tyrion alive?! I would have killed him right off the bat. Blood for blood.
We identified 15 associated contextual signals for the character Daenerys Targaryen alone, drawn from her titles and various misspellings of her name.
Diligent research ensured our system never missed the same signal twice. After each episode, our Linguistic Analysts combed through processed posts to uncover previously unknown contextual signals.
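To make the idea of titles and misspellings concrete, here is a minimal sketch of approximate alias matching. It is not our production system: the alias list, the `find_character` helper, and the 0.8 similarity threshold are all assumptions chosen for illustration, and the matching relies only on Python's standard `difflib` module.

```python
import re
from difflib import SequenceMatcher

# Hypothetical aliases chosen for illustration; not the product's real signal list.
CHARACTER_ALIASES = {
    "daenerys targaryen": [
        "daenerys",
        "daenerys targaryen",
        "khaleesi",
        "mother of dragons",
    ],
}


def find_character(post, aliases=CHARACTER_ALIASES, threshold=0.8):
    """Return the first character whose alias approximately appears in the post, else None."""
    words = re.findall(r"[a-z']+", post.lower())
    for character, names in aliases.items():
        for name in names:
            size = len(name.split())
            # Slide a window with the alias's word count across the post.
            for start in range(len(words) - size + 1):
                window = " ".join(words[start:start + size])
                # ratio() is 1.0 for identical strings and drops as they diverge.
                if SequenceMatcher(None, window, name).ratio() >= threshold:
                    return character
    return None


print(find_character("Danaerys is going to burn them all #DemThrones"))
```

The misspelling "Danaerys" still resolves to the character because it is close enough to a known alias. A lower threshold catches more misspellings but also more unrelated words, so any real threshold would need tuning against reviewed posts.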
Gamers can act out violently in virtual worlds, then boast about their accomplishments. These posts are usually not harmful or threatening, but that’s not always clear until we properly account for the contextual signals.
Fighting for the third kill. Modern Warfare FTW.
You kill my team, I come back for you. And I kill you… dead #COD
#ApexLegends: 5 shots, 3 kills in less than 30 seconds
Similar to movies, when a major release approaches, our Linguistic Analysts prime our machine learning models with curated signals that help indicate when authors discuss video games.
Some authors hide behind social media’s semi-anonymous structure to direct negative comments at others.
We see most of this heated language pointed toward celebrities and politicians involved with cultural movements (#MeToo, for example). These posts are not typically specific threats. Instead, they are emotionally charged comments from authors who have a strong opinion about a public figure’s situation.
The content often contains potentially harmful language, which could result in an actionable Alert if our statistical models weren’t prepared.
Sports events could generate false positives because many slang terms have alternative, harmful meanings (e.g., bomb, shoot). Additionally, some authors use intense language when discussing their fantasy sports teams and players.
Tyree Jackson is about to start shooting up boards here, people!
Somebody call the police... because I just MURDERED my #fantasy draft
We're gonna get raped by Bama today
To reduce false positives, our Data Science team ensures our models are prepared with various signals related to teams, players, and mascots. They also strengthen the AI to recognize general sports patterns like scores, stats, or positions.
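As a rough illustration of what "general sports patterns" could look like, the sketch below checks a post against a few regular expressions for scores, stats, positions, and fantasy sports. The pattern list and the `sports_signals` helper are hypothetical and deliberately tiny; they are not our actual models, which are not described in this post.

```python
import re

# Hypothetical patterns for illustration only; a production list would be far larger.
SPORTS_PATTERNS = {
    # Scores such as "34-17" or "3 - 1".
    "score": re.compile(r"\b\d{1,3}\s*-\s*\d{1,3}\b"),
    # A number followed by a common stat unit, e.g. "312 yards".
    "stat": re.compile(
        r"\b\d+\s+(?:yards|points|rebounds|touchdowns|assists)\b", re.IGNORECASE
    ),
    # A few position names and abbreviations.
    "position": re.compile(r"\b(?:QB|WR|quarterback|linebacker|goalie|pitcher)\b"),
    # The word or hashtag "fantasy".
    "fantasy": re.compile(r"(?<!\w)#?fantasy\b", re.IGNORECASE),
}


def sports_signals(post):
    """Return the names of the sports patterns found in the post, in table order."""
    return [name for name, pattern in SPORTS_PATTERNS.items() if pattern.search(post)]


print(sports_signals("Somebody call the police... because I just MURDERED my #fantasy draft"))
print(sports_signals("Our QB threw for 312 yards in a 34-17 win"))
```

Patterns this simple misfire easily (the score pattern also matches "5-10 minutes", for example), so in practice they would serve as one signal among many rather than as a verdict.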
Emojis carry a great deal of context and can completely change the meaning of a sentence. Translating them is rarely a straightforward task.
My boyfriend is going to kill me for this one 😆😂
🏀Haven’t been shooting in a while, tonight is the night 🏀
Need help - about to have a mental breakdown 🤣😜🤣
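One simple way to picture emoji as contextual signals is a lookup table that maps each emoji to a hint about tone or topic. The table and the `emoji_hints` helper below are assumptions made up for this example, not a description of how our AI interprets emojis.

```python
# Hypothetical emoji-to-hint table for illustration; real usage varies by author and community.
EMOJI_HINTS = {
    "😆": "playful",
    "😂": "playful",
    "🤣": "playful",
    "😜": "playful",
    "😃": "playful",
    "📸": "photography",
    "🏀": "sports",
}


def emoji_hints(post):
    """Return the set of hints contributed by the emojis found in the post."""
    return {EMOJI_HINTS[ch] for ch in post if ch in EMOJI_HINTS}


print(emoji_hints("My boyfriend is going to shoot me outside of school 📸😃"))
print(emoji_hints("Need help - about to have a mental breakdown 🤣😜🤣"))
```

The second post shows why a table like this cannot decide anything on its own: the emojis hint at a playful tone, yet the words may still describe real distress, so the hint should only be weighed alongside the rest of the post.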
Identifying non-literal language, like irony and sarcasm, is equally complex.
- Irony usually states the opposite of what is intended for effect.
- Sarcasm, while employing irony, is insincere speech with derogatory language.
Since both are prevalent on social media, you could see:
This life thing is killing me /s lol
Someone called me sir and now I want to kill myself
My sister has a stomach flu so I’m gonna kms
Ironic or sarcastic language may seem harmless, but it may also be indicative of more serious social-emotional behaviors. Our goal is to identify alert-worthy content for you, so we err on the side of caution when delivering actionable Alerts.
Solving the Problem
Content included in a social media post can range from innocent to legitimately dangerous. Understanding the author’s intent begins with understanding context. The examples above highlight just some of the contextual signals our Data Science team uses to seed our machine learning models. Sometimes the signals are straightforward. Sometimes they require more complicated logic to identify the inconsequential content without mistakenly missing a legitimate actionable Alert.
By preparing our system before known events and continually analyzing incoming content for patterns, we remove the vast majority of noise from harmless posts. However, since we heavily weigh the risk of missing a valid actionable Alert, you may see posts you later determine to be not harmful.
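The trade-off between noise and missed Alerts can be sketched with a standard cost-sensitive decision rule. Assume, hypothetically, that a model outputs a probability that a post is harmful, and that a missed Alert is judged several times more costly than a false positive. Alerting is then worthwhile whenever the expected cost of staying silent exceeds the expected cost of alerting. The cost values and function names below are illustrative assumptions, not our actual configuration.

```python
def alert_threshold(cost_false_positive, cost_missed_alert):
    """Probability above which alerting has the lower expected cost.

    Alerting costs (1 - p) * cost_false_positive in expectation, while staying
    silent costs p * cost_missed_alert, so we alert when
    p >= cost_false_positive / (cost_false_positive + cost_missed_alert).
    """
    if cost_false_positive <= 0 or cost_missed_alert <= 0:
        raise ValueError("costs must be positive")
    return cost_false_positive / (cost_false_positive + cost_missed_alert)


def should_alert(probability_harmful, cost_false_positive=1.0, cost_missed_alert=9.0):
    """Decide whether to alert; the default costs are hypothetical."""
    return probability_harmful >= alert_threshold(cost_false_positive, cost_missed_alert)


# With a missed Alert weighted 9x a false positive, the threshold drops to 0.1,
# so a post scored at 0.2 is delivered even though it is probably harmless.
print(alert_threshold(1.0, 9.0))
print(should_alert(0.2))
```

With equal costs the threshold would be 0.5 and the same post would be filtered out, which is why weighting missed Alerts heavily means some harmless posts reach you.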
Coming up soon: False Positives.
A vital and challenging aspect of understanding language is identifying the context.
Discussion of popular culture over social media is often filled with language that could be misconstrued as potentially harmful.