Kenyans have regional dialects – even on Twitter

With the rapid spread of social media in the last two decades, the world has never felt smaller. Many people refer to the internet as the global village.

Kenyans born after 1999 have lived in a world where there is widespread adoption of digital technology. This generation of young people has grown up with cellular communication, internet connectivity, and smartphones.

As the digital natives turn 18 years old this year and join social media, they will lend their lingo to conversations.

But even though many Kenyans are now part of one huge web community with a shared online lingo, they still maintain traits attributable to their locality.

The use of social media, mostly through short-form messages, has changed the way we communicate across different ages.

Of particular interest in this post is how Kenyans use Twitter and how different dialects are developing from that usage.

A collection of 5.3 million tweets, gathered over three non-consecutive months from 2,804 users in 72 towns and cities in Kenya, forms the basis for this study. Sampling followed the locations indicated by users on their profiles when they contributed to a trending topic. The resultant dataset may contain implicit bias – nevertheless, the number of tweets under study followed the population distribution in the country. Hence, the top 10 towns by number of tweets were Nairobi, Mombasa, Nakuru, Eldoret, Kisumu, Kiambu, Thika, Kakamega, Kisii and Meru.

New world, new words

A dialect is a variety of a language – it may result from regional differences within a country, social differences or even occupation. Indicators of a dialect include variations in spelling, derivatives of a word and, in some cases, the use of alternate words. Let's get started with the new words.

Before we begin, a note on method: our approach is probabilistic, characterising how word frequencies change with geography and age. A word with a strong geographical association is one that someone from the region is more likely to use than someone from other areas. The probability score controls for towns with more users and more tweets.
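The article does not give the exact formula behind the probability score, so the following is only a minimal sketch of one way such a score could be built. It assumes the score for a town is the share of that town's users who used the word at least once; the function name and the sample tweets are hypothetical and not taken from the study.

```python
from collections import defaultdict


def regional_word_probability(tweets, word):
    """Return, per town, the share of users who used `word` at least once.

    `tweets` is an iterable of (user, town, text) tuples. Counting users
    rather than tweets keeps prolific accounts and large towns from
    dominating the score (an assumed way of controlling for volume).
    """
    target = word.lower()
    users_in_town = defaultdict(set)    # every user seen in each town
    users_with_word = defaultdict(set)  # users who used the word there
    for user, town, text in tweets:
        users_in_town[town].add(user)
        # Naive whitespace tokenisation; punctuation is not stripped.
        if target in text.lower().split():
            users_with_word[town].add(user)
    return {
        town: len(users_with_word[town]) / len(users)
        for town, users in users_in_town.items()
    }


# Hypothetical sample data, not taken from the study.
sample_tweets = [
    ("u1", "Machakos", "selection ital kabisa"),
    ("u1", "Machakos", "ital ital"),
    ("u2", "Machakos", "habari ya leo"),
    ("u3", "Nairobi", "traffic leo"),
    ("u4", "Nairobi", "ital sana"),
    ("u5", "Nairobi", "good morning"),
    ("u6", "Nairobi", "niko job"),
]

scores = regional_word_probability(sample_tweets, "ital")
print(scores)
```

In this toy data one of two Machakos users and one of four Nairobi users use the word, so Machakos scores higher even though Nairobi has more users and more tweets.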

The first word out of the 'probability machine' is ital. A total of 379 tweets from 63 users contained this term. A regional probability score shows Runyenjes and Machakos having the highest usage of the word, followed by Kinoo, Kiambu, and Ongata Rongai. The median age on Twitter of the people using it is one year – this means the term is used mostly by people who have been on Twitter for a year or less and who come from the areas mentioned above.

The word ital as used in the tweets means okay or perfect. The origin of the word is in the Rastafari movement, where it refers to food that follows Rastafarian teachings – comparable to kosher for Jews and halal for Muslims. The Rastafarians derived the word from the English term vital, with the v removed to signify unity with nature. In Kenya, it transcends food, and it's now a general term among young people meaning something great. Example tweet below:

"@CharraDeejay @_shideh Matasia locked selection ital kichwa kichwa #JAHMROCKDOBA"

The above tweet contains the word ital, but there's another term next to it: kichwa kichwa. This term appears sparingly in other tweets; it means bumping one's head to the music. Therefore, the above tweet translates to: I am tuning in from Matasia, the music selection is great, and I am bumping my head to it. The word ital is common among Ghetto Radio listeners, and perhaps the probability score is a proxy for the regional popularity of that station.

Pretty please

It is pleasant to be polite. A word that adds politeness to a sentence is please. Numerous variations of the word, such as pliz, plz and pls, make up part of most short-hand communication. Among the many abbreviations of the word please, a new variation is pse. In the dataset obtained, the word only appears in tweets from Eldoret, Kapenguria and Nairobi. The probability score is highest in Kapenguria, followed by Nairobi, and finally Eldoret. Excluding Nairobi (which is a cosmopolitan city), the word pse forms part of a regional dialect of the westernmost region of Kenya. Example: "RT @edwarmi: @aina_akin hi Tade, pse help us to get Willy's piece to your pan-African networks! Tks: https://t.co/Kv0vPj4kmS"

Outside Kenya, the abbreviation pse is more widely used in Uganda. Given Kapenguria's closeness to the Uganda border, it is plausible that the use of pse in the town is influenced by relatives and friends across the border. Unlike the word ital, the median age on Twitter of people using pse is seven years. The 'second cohort' on Twitter came online between the years 2009 and 2011 (about seven years ago). The word may have been coined during this period and thus forms a sociolect (social dialect), a variation in language use due to social grouping.

A younger crowd on Twitter prefers plz when being courteous. The median age on Twitter for plz is five years. Perhaps an exciting variation of the word please is pris – it takes a humorous meaning by adding sarcasm to the word plis. The newly minted word is common among older Twitter users (median age on Twitter is eight years). Example: "Tugawane pension pris RT @dee_tosha: Time to retire. Goodnight guys."
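The 'median age on Twitter' figures quoted for ital, pse, plz and pris can be illustrated with a short sketch. It assumes account age is measured in whole years between the year a user joined and a reference year (2017 is assumed here); the helper name and the join years below are hypothetical.

```python
from statistics import median


def median_account_age(word_users, joined_year, reference_year):
    """Median number of years on Twitter among the users of a word."""
    ages = [reference_year - joined_year[user] for user in word_users]
    return median(ages)


# Hypothetical join years, not taken from the study.
joined_year = {"a": 2016, "b": 2016, "c": 2017, "d": 2015, "e": 2010}

# Hypothetical set of users who tweeted a given word.
word_users = ["a", "b", "c", "d", "e"]

word_median_age = median_account_age(word_users, joined_year, reference_year=2017)
print(word_median_age)
```

The median is used rather than the mean so that a single long-standing account (user "e" in this toy data) does not pull the figure upwards.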

A map of dialects

In the dataset, we seek to find more distinctive words that are a modification of commonly used words. Combined usage of these words shapes distinct dialects. These words include ayam for I am, chach for church, wl for will, ts for this, issa for it is a, nd for and, et cetera. Some of the words are used exclusively in specific regions, while others are used to differing extents in particular areas. We therefore identify 40 such words and measure the extent of their use in different regions. The map below shows the difference in use of the 40 words with the most popular word in an area highlighted.

From the map above, Nairobi, Mombasa, and Kisumu share a dialect that is distinct from the rest of the country. Consequently, new words are likely to emanate from these cities. An example is the word zo, which is used to mean so. Its usage is only present in tweets from people in Nairobi, Mombasa and Kisumu. A further distinction from other parts of the country arises from words with low utilisation in the three major cities. A good example is the Kiswahili word eti: it and its variation ati are in use all over the country. However, a new spelling, aty, has come into use among people in Nakuru, Eldoret, Kiambu, Kakamega, Bungoma, Busia and Makueni. This new variant of the word eti has no usage in tweets from Mombasa and Kisumu, with Nairobi having a small probability score.
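The idea of measuring how far each variant is used in each town, and of spotting words confined to a few towns, can be sketched as below. This is an illustration under stated assumptions rather than the method used for the map: it uses raw counts with naive whitespace tokenisation, only a handful of the 40 words, and hypothetical function names and sample tweets.

```python
from collections import Counter, defaultdict

# A small subset of the variant words mentioned in the article.
VARIANTS = {"zo", "aty", "lokt", "ikam", "nd", "ts"}


def variant_use_by_town(tweets, variants):
    """Count occurrences of each variant word per town.

    `tweets` is an iterable of (town, text) tuples.
    """
    counts = defaultdict(Counter)
    for town, text in tweets:
        for token in text.lower().split():
            if token in variants:
                counts[town][token] += 1
    return counts


def towns_using(counts, word):
    """Set of towns in which `word` appears at least once."""
    return {town for town, counter in counts.items() if counter[word] > 0}


def most_popular_variant(counts):
    """Most frequent variant word in each town."""
    return {town: counter.most_common(1)[0][0] for town, counter in counts.items()}


# Hypothetical sample data, not taken from the study.
sample_tweets = [
    ("Nairobi", "zo what nd then"),
    ("Nairobi", "zo true"),
    ("Mombasa", "zo sweet"),
    ("Nakuru", "aty what"),
    ("Nakuru", "aty nd aty"),
    ("Kiambu", "door is lokt"),
]

counts = variant_use_by_town(sample_tweets, VARIANTS)
zo_towns = towns_using(counts, "zo")
top_words = most_popular_variant(counts)
print(zo_towns, top_words)
```

Raw counts favour towns with more tweets, so a fuller version would normalise by the number of users or tweets per town before comparing regions.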

Across the country, people who state their Twitter location as Kiambu are likely to have the most eccentric dialect. The expression lokt, meaning locked, has been used only in tweets from Kiambu. Over and above that, tweeps from Kiambu entirely avoid the use of the X-factors (xo, xaxa, xema, xana). For "to come", they prefer ikam and completely disregard ikuom.

NB: Retweets were not used in the analysis

*The author is a data scientist