Summary: Slang, including internet slang and slang used by younger generations, constantly evolves. The second article in this ten-part series, brought to you by our Data Science and Library teams, describes how Social Sentinel continually updates its solution so that our Social Media Scanning product keeps up with rapid shifts in popular language.
The 10-part Alert Identification blog series, brought to you by our Data Science team, provides insight into the inner workings of our social media scanning product.
The science of linguistics is much more complicated than it first appears. It is the study of how people actually communicate, rather than how people should communicate. For instance, we can see that the “correct” grammar, spelling, and punctuation conventions used in an academic essay do not necessarily apply on social media. One specific area of ever-evolving linguistic change, primarily in younger demographics -- and especially on social media -- is slang.
How slang spreads
In years past, terms and phrases took much longer to spread globally from their source and into mainstream culture because people were far less connected, especially digitally. If we take a look at terms from the world of music (as seen in the timeline below), we can chart parallels between improvements in technology and the adoption of these terms as part of our vernacular.
The word jazz, for example, took four years to move from the U.S. to other parts of the world in the early twentieth century. In contrast, more recent slang terms can spread worldwide within days or even hours, a change that can be attributed solely to technology.
In today’s world of ubiquitous global connectivity, slang can propagate nearly instantaneously. Emerging slang terms form in ways similar to how our language has evolved over thousands of years. Linguists categorize the origins of slang in several ways:
- Coinage: emergent slang words that are either not derived directly from current language or alter the previous meaning of a word
Examples: Bye Felicia, ratchet, slay, twerk
- Semantic Derivation: slang words derived from common words with meanings that are inspired by the original definition
Examples: extra, swerve, basic, salty
- Acronym: letters that represent words in a phrase
Examples: lol, imu, tfw, goat
- Compounding: a word composed of two separate words
Examples: f*ckboy, traphouse, greaseball, cyberbully
- Blending: the fusion of two or more words
Examples: frenemy, bromance, phubbing, hangry
- Truncation: a shortened version of a word or phrase
Examples: sav, sup, finna, sus
Slang in our social media scanning product
Social Sentinel’s analysts are responsible for using both their linguistic and research expertise to continually enhance our slang libraries. Two driving forces motivate us to stay current with ever-changing slang.
1) Reducing false positives
Familiarity with the slang in a piece of written text can often change its interpretation. For example:
"I'm gonna kill him el oh elz"
At first glance, this sentence might seem harmful. However, knowing that el oh elz can translate to lol or laughing out loud changes the presumed meaning. Our system may still consider the post potentially harmful (you can read about the Complexity of Harmful Language in our previous blog post), but translating the included slang is an important factor. For this reason, we must continuously update our lexicon of slang terms that indicate sarcasm or humor.
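To make the idea of translating slang concrete, here is a minimal Python sketch. It is purely illustrative and is not a description of our actual system: the variant patterns, the `HUMOR_MARKERS` set, and the function names are all hypothetical, chosen only to show how a stylized spelling such as el oh elz might be mapped to a canonical token and then recognized as a humor signal.

```python
import re

# Hypothetical mapping from stylized spellings to a canonical slang token.
# These patterns are illustrative assumptions, not entries from a real library.
VARIANTS = {
    r"\bel oh el[zs]?\b": "lol",
    r"\blo+l[zs]?\b": "lol",
    r"\blmf?ao+\b": "lmao",
}

# Canonical tokens assumed (for this sketch) to signal humor or sarcasm.
HUMOR_MARKERS = {"lol", "lmao", "jk"}


def normalize(text: str) -> str:
    """Lowercase the text and rewrite known variants to their canonical form."""
    out = text.lower()
    for pattern, canonical in VARIANTS.items():
        out = re.sub(pattern, canonical, out)
    return out


def humor_markers(text: str) -> set[str]:
    """Return the humor markers found in the text after normalization."""
    tokens = re.findall(r"[a-z']+", normalize(text))
    return HUMOR_MARKERS.intersection(tokens)


post = "I'm gonna kill him el oh elz"
print(normalize(post))      # i'm gonna kill him lol
print(humor_markers(post))  # {'lol'}
```

A humor marker on its own would not clear a post. In a design like this it would be one signal among several, which is why the lexicon has to be kept up to date.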
Additionally, understanding slang helps our system differentiate between threatening and non-threatening usages of the same word. One example of a term that is potentially harmful but also has many non-threatening slang meanings is bomb.
For this single word, Webster’s Dictionary lists 15 distinct definitions. Urban Dictionary lists 160! Our machine learning models must take into account all of the lexical forms that imply bomb is being used in a non-threatening context. Understanding these linguistic patterns helps our system confidently determine the text is not harmful.
Since words like bomb and shoot can be used in a threatening context, we cannot simply ignore these potential signals. Instead, we must research the lexical patterns associated with each usage, and diligently keep these forms updated as usage evolves.
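As a rough illustration of what researching lexical patterns can look like, the Python sketch below checks a post containing bomb against a few hand-written patterns for non-threatening usage. Everything here is an assumption made for the example: the patterns, the label names, and the `classify_bomb` function are hypothetical, and a real system would rely on far richer context than a handful of regular expressions.

```python
import re

# Illustrative patterns for non-threatening uses of "bomb".
# These are assumptions for the sketch, not a real rule set.
BENIGN_PATTERNS = [
    # "this pizza is the bomb", "that show was so bomb"
    r"\b(is|was|are|were|looks?|tastes?|sounds?) (so |the |da )?bomb\b",
    # "I bombed my exam"
    r"\bbombed (the|my|that|this) (test|exam|quiz|interview|audition)\b",
    # "photobomb", "bath bomb", "glitter bomb"
    r"\b(bath|photo|glitter) ?bomb",
]


def classify_bomb(text: str) -> str:
    """Label a post by how the word 'bomb' appears to be used."""
    lowered = text.lower()
    if "bomb" not in lowered:
        return "not_present"
    if any(re.search(pattern, lowered) for pattern in BENIGN_PATTERNS):
        return "likely_benign"
    # No benign pattern matched, so the signal is kept rather than ignored.
    return "needs_review"


print(classify_bomb("This pizza is the bomb"))      # likely_benign
print(classify_bomb("I bombed my exam"))            # likely_benign
print(classify_bomb("there is a bomb in the gym"))  # needs_review
```

The default is deliberately conservative: a post is labeled benign only when a known pattern matches, so new or unrecognized usages still surface for review.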
2) Identifying harmful content
Sometimes, the slang term or phrase itself is the concerning element within a sentence. Our Sentinel Search™ Library is enormous and ever-evolving, but these excerpts can provide a glimpse into the categories of harmful slang we make sure to understand.
- Related to Eating Disorders
Ana - Slang for Anorexia, sometimes discussed as if the disease were a person. For example, ‘My friend Ana.’
Mia - Similar to Ana, but representing Bulimia.
Ed or Ednos - Similar to Ana, but representing any general eating disorder.
- Related to Self-Harm
Sue or Sui - Slang for suicide. Similar to Ana and Mia, Sue can be discussed as if it were a person.
Kms or Kys - Acronyms for kill myself or kill yourself. Many times these acronyms are used in a hyperbolic or sarcastic sense; however, sometimes the phrase is used literally. This is where context becomes important (a future article will describe analyzing context in more detail).
Cutting, styro, tw, blue whale - Terms related to self-mutilation that could indicate a serious issue.
- Related to Violence Against Others
Smoke, waste, murk, clip, pop the trunk - When used as transitive verbs, or action words directed at a person or object, these words can be threatening. For example, ‘I will smoke you.’
Chopper, ak, hammer, gat, cuete, 22, 45 - These are all nouns that could represent a weapon. For example, ‘I have a 45 in school.’
- Related to Dangerous Subcultures
Shooter fandom - Some mass shooters are inspired by previous events and publish idolizing content that could signal future danger. This fandom language can be blatant, but at other times it uses slang that is more inconspicuous. For example, a post praising Reb, VoDkA, Arlene, or natural selection could indicate alarming Columbine fandom.
Extremism - Hate speech can precede violent events. A body of text that includes phrases like Great Replacement, Might is Right, boot party, pepe, or 14/88 might be considered harmful.
Incel - Incels comprise an online subcommunity of involuntary celibates. This community has been associated with numerous mass murders, beginning with the Isla Vista shooting in 2014. Its members have an extensive vocabulary, including many terms that could be used in a threatening way.
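The categories above can be pictured as a lookup structure plus a few grammatical patterns. The Python sketch below is a simplified illustration under stated assumptions, not our Sentinel Search™ Library: the category labels, the `Hit` class, the target pronoun list, and `find_hits` are hypothetical names, and only a handful of the terms listed above are included.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    term: str
    category: str


# A few terms from the examples above; the category labels are hypothetical.
LEXICON = {
    "ana": "eating_disorder",
    "mia": "eating_disorder",
    "ednos": "eating_disorder",
    "kms": "self_harm",
    "kys": "self_harm",
}

# Verbs treated as concerning only when directed at a person (transitive use).
VIOLENT_VERBS = ("smoke", "waste", "murk", "clip")
TARGETS = ("you", "u", "him", "her", "them")  # assumed list of targets

VERB_PATTERN = re.compile(
    r"\b(" + "|".join(VIOLENT_VERBS) + r")\s+(" + "|".join(TARGETS) + r")\b"
)


def find_hits(text: str) -> list[Hit]:
    """Return lexicon matches and verb-plus-target matches found in the text."""
    lowered = text.lower()
    tokens = re.findall(r"[a-z0-9]+", lowered)
    hits = [Hit(tok, LEXICON[tok]) for tok in tokens if tok in LEXICON]
    hits += [Hit(m.group(1), "violence") for m in VERB_PATTERN.finditer(lowered)]
    return hits


print(find_hits("I will smoke you"))      # [Hit(term='smoke', category='violence')]
print(find_hits("I need a smoke break"))  # []
```

A plain lookup like this would also flag someone whose name happens to be Ana or Mia, which is one reason a match is best treated as a prompt for contextual analysis and not as a conclusion.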
Keeping abreast of the evolving language used online and by the younger generation is a difficult task. It involves constant research as well as expertise and knowledge of linguistic structure. Social Sentinel’s Data Science team can do this research for you to elevate only the potentially actionable content.
Next in our series: Clarifying Context.
With the help of technology, slang terms have the potential to spread further and faster than ever before.
Staying ahead of emerging slang trends requires deft research and linguistic expertise.
Teaching machines to understand the multiple meanings of a single slang term (like bomb) is a substantial challenge.
Decoding harmful slang content can unlock dangers hiding in plain sight.