sa-MO-a or SA-MO-a? Evidence from the data

Ireland are playing Samoa in rugby tomorrow, and I had expected that Samoa would be the word of the week. Unfortunately, the pre-match build-up has been overshadowed by Typhoon Hagibis, which threatens to banjax the integrity of the tournament, including Scotland’s attempt to knock Japan out of it. Regardless, I have prepared some data to help you all know how to pronounce the name of this Pacific Island nation.


Irish people generally say sa-MO-a (səˈməʊə) but the powers that be (#NewZealanders) call it SAH-MO-a (ˈsɑ:ˌməʊə), a pattern that is similar to the compound words lawnmower and carjacker but very odd in a base word. Nonetheless, the presence of two possible pronunciations is very helpful for people who want to show off their cultural awareness by squeezing out weird-sounding words while, like, scoffing a croissant. But for the rest of us, there is no harm in getting some facts on the matter.

The Horses’ mouths

If you want to hear the word Samoa from the horses’ mouths, you can hear these two speakers of the Samoan language pronouncing their country’s name at (All the words in the world, pronounced!).

Understandably, Kiwis pronounce Samoa much like the Samoans because the Pacific Islanders have had a major demographic influence on New Zealand (cf. Bundee Aki’s dining patterns this week), and hence you would expect some Samoan influence on New Zealand English. Nevertheless, there are limits to this, as a stress pattern has to already occur in the language for it to be adopted. You can’t expect English speakers to introduce a totally new kind of pronunciation. This happens with Irish names in England, where the locals simply don’t have the vocal resources to say Peter O’Mahony. *

This image is missing information. To view the rest, click on my Tableau public page.

Two syllables (shown in red)

The tree map shows the most common syllabic patterns of English words. Almost half of all English words have two syllables (shown in red) and most of these have one stressed syllable followed by one unstressed syllable, as in SCOT-land, ENG-land, IRE-land, CHI-na and FI-ji. Of course there can be a little variation across English speakers, because Americans may have three syllables in I-yer-land while fictional drunken Brits may have three syllables in Eng-er-land. The other bunch of bisyllabic words (Greenland, Japan, Bhutan and Taiwan) are interesting but not relevant for today.

Monosyllables (orange)

Monosyllables, exemplified by France in the orange box, are the simplest group in the graph. There is only one way that you can have a single primary stress: Spain, Wales, Greece, Laos etc. Now, there do exist words with no stress, as in the function words (is, to, the) in the sentence SHEILA is GOing to the SHOPS. This is a hard-working but small group of words which don’t register on the graph because the dataset (the Carnegie Mellon University Pronouncing Dictionary) contains over 130,000 word forms and names. The next job for me will be to merge a full frequency wordlist with the current dataset and do a similar graph. Undoubtedly, monosyllabic words would then feature much more heavily.
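For anyone who wants to try the counting at home, here is a minimal sketch of how a stress pattern can be read off a dictionary entry. It assumes the CMU dictionary's layout of a word followed by phonemes, where each vowel phoneme ends in a stress digit (0, 1 or 2). The sample lines are hand-typed for illustration and have not been checked against the real file, and the function names are my own invention.

```python
from collections import Counter

# Hand-typed sample lines in the CMU dictionary's "WORD  PHONEMES" layout.
# They are illustrative only and not verified against the real dictionary.
SAMPLE_LINES = [
    "FRANCE  F R AE1 N S",
    "JAPAN  JH AH0 P AE1 N",
    "CHINA  CH AY1 N AH0",
    "CANADA  K AE1 N AH0 D AH0",
    "KOREA  K AO0 R IY1 AH0",
]


def stress_pattern(phonemes):
    """Join the stress digit that ends each vowel phoneme, e.g. '100'."""
    return "".join(p[-1] for p in phonemes if p[-1].isdigit())


def count_patterns(lines):
    """Tally how many entries share each stress pattern."""
    counts = Counter()
    for line in lines:
        _word, *phonemes = line.split()
        counts[stress_pattern(phonemes)] += 1
    return counts


counts = count_patterns(SAMPLE_LINES)
for pattern, n in counts.most_common():
    print(pattern, n)
```

Running the same tally over the full dictionary file, rather than five sample lines, is roughly what sits behind the charts in this post.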

Three syllables (grey-green)

Among tri-syllabic words, Australia (010) and Canada (100) are neck and neck at almost 10% of all words each. Other examples include Korea and Malaysia as against Italy, Germany, Hungary, India, Africa, etc. Mercifully, the language is not as haphazard as it first appears — thus there are rules for deciding which pattern to choose. If the second syllable has a long vowel (ko-REE-a) or a short vowel + consonant (ju-MAN-ji) then it goes like Australia. Those are called heavy syllables. Otherwise, it’s like Canada (with a short or zero vowel). Unfortunately, English spelling may not mark this clearly, and this results in variable pronunciations such as U-rin-al and u-RI-nal.
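The heavy-syllable rule above can be sketched in a few lines. This is a toy version under loud assumptions: the syllables are described by hand (nothing here works out vowel length or syllable boundaries from spelling), the names `Syllable` and `trisyllable_stress` are made up for the example, and it only chooses between the Australia and Canada patterns, ignoring secondary stress.

```python
from dataclasses import dataclass


@dataclass
class Syllable:
    text: str
    long_vowel: bool = False  # e.g. the REE in ko-REE-a
    closed: bool = False      # short vowel + consonant, e.g. the MAN in ju-MAN-ji

    @property
    def heavy(self):
        """A syllable is heavy if it has a long vowel or ends in a consonant."""
        return self.long_vowel or self.closed


def trisyllable_stress(syllables):
    """Return '010' if the middle syllable is heavy, otherwise '100'."""
    if len(syllables) != 3:
        raise ValueError("this sketch only handles three-syllable words")
    return "010" if syllables[1].heavy else "100"


korea = [Syllable("ko"), Syllable("ree", long_vowel=True), Syllable("a")]
jumanji = [Syllable("ju"), Syllable("man", closed=True), Syllable("ji")]
canada = [Syllable("ca"), Syllable("na"), Syllable("da")]

for name, word in [("korea", korea), ("jumanji", jumanji), ("canada", canada)]:
    print(name, trisyllable_stress(word))
```

The hard part in real life is deciding whether the middle syllable is heavy, which is exactly where English spelling lets us down.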

The next group, Uruguay (102), are shaped rather similarly to Canada except that they finish in heavy syllables. Their pronunciation may depend on where the word occurs within a speech phrase. For example, Netherlands may alternate between the two patterns depending on speed or emphasis in the speech. Next up is CAL-CUT-ta, which is HEAVY-HEAVY-LIGHT, and thus has two stressed syllables. The next question is which of the two stressed syllables is bigger?

English word stress

This is where things get tricky. Before we try to understand tri-syllabic words, it makes sense to explore the smaller range of possibilities for bisyllabic words. English syllables can have three states: PRIMARY STRESS (1), unstressed (0), or SECONDARY STRESS (2). However, every word must have at least one primary stress, so there are only five possible shapes:

01 – ja-PAN, pe-RU, bra-ZIL
10 – SCOT-land, CHI-na, FI-ji
11 – TAI-WAN
12 – THAI-LAND
21 – BHU-TAN

If you need convincing, then say each word five times in a row:
Scotland. Scotland. Scotland. Scotland. Scotland.
Thailand. Thailand. Thailand. Thailand. Thailand.
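The count of five shapes for two-syllable words can also be checked mechanically. The sketch below assumes that "must have one primary stress" means at least one, since the 11 shape (Taiwan) has two. The function name `possible_shapes` is just for illustration.

```python
from itertools import product


def possible_shapes(n_syllables):
    """List every stress shape of the given length with at least one primary stress."""
    return [
        "".join(shape)
        for shape in product("012", repeat=n_syllables)
        if "1" in shape
    ]


print(possible_shapes(2))
```

These are only the shapes that are possible in principle. Which ones English actually uses, and how often, is a question for the data.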

Do they all matter?

All five of these word shapes appear in the bar chart below, which also shows the 20 most common stress patterns of English. This time bisyllabic words are in red and tri-syllables in Irish-jersey green (I will update this asap but I wanted to publish this in time for people’s commute home on Friday).

We can now do the same thing for tri-syllabic words, where we now see 18 different options (again in bold), but only six of these occur in the top 20:

010 au-STRA-lia
100 CA-na-da
102 U-ru-GUAY

120 SAH-MO-a
201 MO-zam-BIQUE
210 CAL-CUT-ta

We still have to decide which version of Samoa to choose. If we follow the little algorithm from earlier, we can count syllables backwards from the end. If the second syllable from the end is heavy (long vowel or short vowel + consonant) then that syllable should be stressed. This reduces the options to 010 (Australia), 210 (Calcutta) or 120 (the New Zealand pronunciation of Samoa). However, in order to be able to have a 210 or 120 pronunciation, you have to be able to have a long ɑ: vowel and that is simply not possible in most Irish accents. If you add that to the fact that 010 words are at least six times as common as either 210 or 120 words, then it makes sense for Irish people to settle on sa-MO-a, just like au-STRA-lia.

And if you need more evidence to boost your claim, then look at this list of words ending in ‘oa’, and you’ll see that Rocky Balboa is the candidate most like Samoa.


*There is, however, no excuse for rhyming Donnchadh with Moniker.

