Introduction

This was one of my last papers of college, just now converted into a mediocre blog-style post! No promises about the quality of this page. Few cultural items have permeated childhood as deeply as the Disney Princess. From Snow White to Moana, Disney Princesses seem ever-present in the zeitgeist, occupying a role in our lives that seems independent of time and place.

From its start in the early 1920s as an animation studio to its status today as a massive multimedia conglomerate spanning sports, entertainment, theme parks, and more, Disney has shaped popular culture in almost every way. In this paper, I will examine the relationship between Disney Princess names and the names of newborn babies in the United States.

While there’s fair debate about who is and who isn’t a Disney Princess, for this exploration we’ll consider those with semi-conventional names, whose modern status comes from the Disney films of which they are a part, and whose movies were released in the last 50 years. That includes Ariel, Belle, Jasmine, Pocahontas, Mulan, Tiana, Rapunzel, Merida, Anna, Elsa, and Moana. This means we are excluding Snow White, Cinderella, Aurora (Sleeping Beauty), and Tinkerbell.

Methodology

We’ll examine baby naming tendencies over time with respect to the key release dates associated with each princess.

Date | Movie Release | Associated Princess(es)
November 13, 1989 | The Little Mermaid | Ariel
November 22, 1991 | Beauty and the Beast | Belle
November 25, 1992 | Aladdin | Jasmine
June 23, 1995 | Pocahontas | Pocahontas
June 19, 1998 | Mulan | Mulan
December 11, 2009 | The Princess and the Frog | Tiana
November 24, 2010 | Tangled | Rapunzel
June 22, 2012 | Brave | Merida
November 27, 2013 | Frozen | Anna & Elsa
November 23, 2016 | Moana | Moana

To refine my exploration, I settled on this question: does the release of a Disney princess movie change how often the respective princess’s name is given, comparing a set number of years before the release year with the same number of years after?
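To make the "set number of years before and after" idea concrete, here is a minimal Python sketch of one way to pick the comparison windows. The function name and the rule itself (take the widest pair of equal-length windows that fits in the 1979-2017 data, with the release year counted in the "after" group) are my assumptions for illustration. This rule matches the windows used in the Ariel, Belle, and Frozen tables, but not the Tiana table, which splits at 2008.

```python
from typing import NamedTuple


class Windows(NamedTuple):
    before: range  # years strictly before the release year
    after: range   # the release year and the years that follow


def comparison_windows(release_year: int, first_year: int = 1979,
                       last_year: int = 2017) -> Windows:
    """Return the widest pair of equal-length year windows around a release.

    Assumption: the release year itself belongs to the "after" group.
    """
    years_before = release_year - first_year
    years_after = last_year - release_year + 1
    width = min(years_before, years_after)
    if width < 1:
        raise ValueError("release year leaves no room for a comparison window")
    return Windows(
        before=range(release_year - width, release_year),
        after=range(release_year, release_year + width),
    )


w = comparison_windows(1989)  # The Little Mermaid
print(w.before[0], w.before[-1], w.after[0], w.after[-1])  # 1979 1988 1989 1998
```

For Frozen (2013) the same rule is limited by the end of the data instead and gives 2008-2012 and 2013-2017.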

Data

Data exploration began with deciding which data sets would provide the necessary information in a form suited to this kind of analysis. Many naming data sets contained only the 2000 most popular names in a given year, but eventually I found the full US Social Security Administration data set on baby names, which includes over 32,000 names in many years.

One challenge to note with naming data is that when a name has been given between 1 and 4 times in a given year, it is reported as 0 to maintain anonymity. For most names the counts were orders of magnitude larger than 4, so this did not present an issue, but in some cases it created real uncertainty in the data.

After converting the data into an accessible form, I counted the occurrences of the names Ariel, Belle, Jasmine, Pocahontas, Mulan, Tiana, Rapunzel, Merida, Anna, Elsa, and Moana in each year of Social Security card data from 1979 through 2017. At first glance, this showed some clear trends over time: some names didn’t even exist in the years preceding a movie release, and then became quite common. To visualize this, I plotted name count against year for each princess name over the entire data set.
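The counting step can be sketched in a few lines of Python. This is not the original processing code, just an illustration under stated assumptions: each year's data is a comma-separated text of `name,sex,count` rows with no header (the layout of the SSA national files), only rows marked `F` are counted (my assumption; pass `sex=None` to count all rows), and a name missing from a year is recorded as 0, which is how the 1-4 occurrence case shows up. The function name `count_names` and the sample rows are made up.

```python
import csv
import io
from typing import Dict, Iterable, Optional

PRINCESS_NAMES = ("Ariel", "Belle", "Jasmine", "Pocahontas", "Mulan", "Tiana",
                  "Rapunzel", "Merida", "Anna", "Elsa", "Moana")


def count_names(yearly_text: Dict[int, str],
                names: Iterable[str] = PRINCESS_NAMES,
                sex: Optional[str] = "F") -> Dict[str, Dict[int, int]]:
    """Return {name: {year: count}}, with 0 for years where a name is absent."""
    wanted = set(names)
    counts = {name: {year: 0 for year in yearly_text} for name in wanted}
    for year, text in yearly_text.items():
        for row in csv.reader(io.StringIO(text)):
            if len(row) != 3:
                continue  # skip blank or malformed lines
            name, row_sex, count = row
            if name in wanted and (sex is None or row_sex == sex):
                counts[name][year] += int(count)
    return counts


# Made-up rows for illustration only, not real SSA figures.
sample = {
    2011: "Emma,F,30\nAnna,F,12\n",
    2012: "Emma,F,28\nAnna,F,11\nMerida,F,7\n",
}
counts = count_names(sample, names=("Anna", "Merida"))
print(counts["Merida"])  # {2011: 0, 2012: 7}
```

In practice the dictionary would be built by reading one file per year from 1979 through 2017.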

For most names (Ariel, Belle, Jasmine, Tiana, Anna, and Elsa), this resulted in charts showing gradual changes in the name over time, with some showing moderate to steep peaks near the movie release date. These plots are shown below.

However, while the above plots are interesting in that a quick glimpse suggests ANOVA may give interesting results, a few other plots stood out far more, and some stood out so much that I’ve decided to exclude their names from the analysis.

The first interesting plots are for the names Merida (from Disney’s Brave, released in 2012) and Mulan (from Mulan, released in 1998).

If we look at the sections before and after the asterisk, we see that Mulan and Merida were essentially not in use as names until the movies. The occurrences for Mulan are small in general, and this is a case where the reporting of 1-4 occurrences as 0 may make the change look more drastic than it really is. Merida, however, goes from having virtually no naming occurrences to often over one hundred.

Some names gave data that could not be processed well and were therefore removed from the study. These were the data sets associated with the names Pocahontas, Rapunzel, and Moana. For Rapunzel and Pocahontas, this was simply because so few babies had ever been given either name that nearly all years were reported as zero. Oddly, however, the two names saw their first counts larger than 4 in 2016 and 2017 respectively, something that would take a good deal of speculation to explain.

The other name removed from the analysis is Moana. While the plot clearly shows a large spike around the movie’s release, the movie is so recent (released in 2016) that only one year of data exists after the release year.

Results

To analyze whether the release of a Disney princess movie changes the naming rate of the respective princess’s name in a set number of years before and after the release year, I set up a pair of null and alternative hypotheses for each name (Ariel, Belle, Jasmine, Tiana, Anna, and Elsa), performed a one-way analysis of variance (ANOVA), and calculated descriptive statistics.

Ariel (The Little Mermaid)

For analysis of Ariel, I selected the following hypotheses:

H0: The mean yearly count of the name Ariel is the same in the 10 years before and the 10 years after the release.

H1: The mean yearly count of the name Ariel differs between the 10 years before and the 10 years after the release.

The data is summarized by this table:

Groups | Count | Sum | Average | Variance
1979-1988 | 10 | 5222 | 522.2 | 61774.4
1989-1998 | 10 | 27588 | 2758.8 | 1492813.956

Each group spans 10 years because only 10 years of data precede The Little Mermaid’s release (this subset of the data begins in 1979). The data summary alone already suggests that ANOVA may give interesting results.

Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 25011897.8 | 1 | 25011897.8 | 32.17816178 | 0.00002218743622 | 4.413873312
Within Groups | 13991295.2 | 18 | 777294.1778
Total | 39003193 | 19

From this we see an extremely low p-value of 0.00002218743622, well below alpha = .05, so we reject the null hypothesis and conclude that the yearly count of the name Ariel differs between the 10 years before and the 10 years after The Little Mermaid was released.
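The ANOVA table above can be reproduced from the summary table alone, since the between-group and within-group sums of squares depend only on each group's count, mean, and sample variance. Here is a minimal Python sketch of that calculation. The function and class names are my own, and it computes the F statistic only; the p-value and F crit would additionally need the F distribution (for example from a statistics library).

```python
from typing import NamedTuple, Sequence, Tuple


class AnovaRow(NamedTuple):
    ss_between: float
    ss_within: float
    df_between: int
    df_within: int
    f_stat: float


def anova_from_summary(groups: Sequence[Tuple[int, float, float]]) -> AnovaRow:
    """One-way ANOVA from (count, mean, sample variance) for each group."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    # Between groups: spread of the group means around the grand mean.
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    # Within groups: each sample variance times its degrees of freedom.
    ss_within = sum((n - 1) * var for n, _, var in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return AnovaRow(ss_between, ss_within, df_between, df_within, f_stat)


# Values taken from the Ariel summary table.
ariel = anova_from_summary([(10, 522.2, 61774.4), (10, 2758.8, 1492813.956)])
print(round(ariel.f_stat, 2))  # 32.18
```

The same function can be pointed at any of the other summary tables in this post as a check on the reported F values.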

Belle (Beauty and the Beast)

For analysis of Belle, I selected the following hypotheses:

H0: The mean yearly count of the name Belle is the same in the 12 years before and the 12 years after the release.

H1: The mean yearly count of the name Belle differs between the 12 years before and the 12 years after the release.

The data is summarized by this table:

Groups | Count | Sum | Average | Variance
1979-1990 | 12 | 146 | 12.16666667 | 19.96969697
1991-2002 | 12 | 674 | 56.16666667 | 1021.606061

Each group spans 12 years because only 12 years of data precede Beauty and the Beast’s release (this subset of the data begins in 1979). The ANOVA results are again interesting.

Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 11616 | 1 | 11616 | 22.30466659 | 0.0001033025434 | 4.300949462
Within Groups | 11457.33333 | 22 | 520.7878788
Total | 23073.33333 | 23

From this we see an extremely low p-value of 0.0001033025434, well below alpha = .05, so we reject the null hypothesis and conclude that the yearly count of the name Belle differs between the 12 years before and the 12 years after Beauty and the Beast was released.

Jasmine (Aladdin)

For analysis of Jasmine, I selected the following hypotheses:

H0: The mean yearly count of the name Jasmine is the same in the 13 years before and the 13 years after the release.

H1: The mean yearly count of the name Jasmine differs between the 13 years before and the 13 years after the release.

The data is summarized by this table:

Groups | Count | Sum | Average | Variance
1979-1991 | 13 | 56546 | 4349.692308 | 15347612.73
1992-2004 | 13 | 126211 | 9708.538462 | 1374329.269

Each group spans 13 years because only 13 years of data precede Aladdin’s release (this subset of the data begins in 1979). The ANOVA results are again interesting.

Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 186662008.7 | 1 | 186662008.7 | 22.32539841 | 0.00008356544002 | 4.259677214
Within Groups | 200663304 | 24 | 8360971
Total | 387325312.7 | 25

From this we see an extremely low p-value of 0.00008356544002, well below alpha = .05, so we reject the null hypothesis and conclude that the yearly count of the name Jasmine differs between the 13 years before and the 13 years after Aladdin was released.

Tiana (The Princess and The Frog)

For analysis of Tiana, I selected the following hypotheses:

H0: The mean yearly count of the name Tiana is the same in the 10 years before and the 10 years after the release.

H1: The mean yearly count of the name Tiana differs between the 10 years before and the 10 years after the release.

The data is summarized by this table:

Groups | Count | Sum | Average | Variance
1998-2007 | 10 | 7837 | 783.7 | 21671.78889
2008-2017 | 10 | 5989 | 598.9 | 30309.43333

Each group spans 10 years, limited this time by the amount of data available after the release (the data ends in 2017). Note that the groups split at 2008, so the later group begins the year before the December 2009 release of The Princess and the Frog. The ANOVA results are again interesting.

Source of Variation | SS | df | MS | F | P-value | F crit
Between Groups | 170755.2 | 1 | 170755.2 | 6.569880149 | 0.01955313901 | 4.413873312
Within Groups | 467831 | 18 | 25990.61111
Total | 638586.2 | 19

From this we see a p-value of 0.01955313901, below alpha = .05, so we reject the null hypothesis and conclude that the yearly count of the name Tiana differs between the 10 years before and the 10 years after The Princess and the Frog was released. Unlike the earlier names, the average yearly count fell, from 783.7 to 598.9.

Anna and Elsa (Frozen)

For analysis of Anna and Elsa, I selected the following hypotheses:

H0_A: The mean yearly count of the name Anna is the same in the 5 years before and the 5 years after the release.

H1_A: The mean yearly count of the name Anna differs between the 5 years before and the 5 years after the release.

H0_E: The mean yearly count of the name Elsa is the same in the 5 years before and the 5 years after the release.

H1_E: The mean yearly count of the name Elsa differs between the 5 years before and the 5 years after the release.

The data is summarized by this table:

Name | Groups | Count | Sum | Average | Variance
Anna | 2008-2012 | 5 | 31664 | 6332.8 | 514477.7
Anna | 2013-2017 | 5 | 25477 | 5095.4 | 213767.3
Elsa | 2008-2012 | 5 | 2366 | 473.2 | 2890.7
Elsa | 2013-2017 | 5 | 3263 | 652.6 | 82040.8

Each group spans 5 years, limited this time by the amount of data available after the movie’s release (the data ends in 2017). The ANOVA results are shown below.

Name | Source of Variation | SS | df | MS | F | P-value | F crit
Anna | Between Groups | 3827896.9 | 1 | 3827896.9 | 10.51266236 | 0.01183635874 | 5.317655063
Anna | Within Groups | 2912980 | 8 | 364122.5
Anna | Total | 6740876.9 | 9
Elsa | Between Groups | 80460.9 | 1 | 80460.9 | 1.894724572 | 0.2059634674 | 5.317655063
Elsa | Within Groups | 339726 | 8 | 42465.75
Elsa | Total | 420186.9 | 9

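From this we see a low p-value of 0.01183635874 for Anna and a much higher p-value of 0.2059634674 for Elsa (alpha = .05). We therefore reject the null hypothesis for Anna but fail to reject it for Elsa. The yearly count of the name Anna differs between the 5 years before and the 5 years after Frozen was released (the average fell from 6332.8 to 5095.4), while the rise in the average for Elsa, from 473.2 to 652.6, is not statistically significant over this window.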
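A quick way to double-check each decision is to apply the same rule to every name at once: reject the null hypothesis when the p-value is below alpha, which is equivalent to F exceeding F crit. The short Python sketch below does this with the values copied from the ANOVA tables in this post; the dictionary and function names are my own.

```python
ALPHA = 0.05  # the F crit values in the tables correspond to this alpha

# name: (F, p-value, F crit), copied from the ANOVA tables in this post
RESULTS = {
    "Ariel": (32.17816178, 0.00002218743622, 4.413873312),
    "Belle": (22.30466659, 0.0001033025434, 4.300949462),
    "Jasmine": (22.32539841, 0.00008356544002, 4.259677214),
    "Tiana": (6.569880149, 0.01955313901, 4.413873312),
    "Anna": (10.51266236, 0.01183635874, 5.317655063),
    "Elsa": (1.894724572, 0.2059634674, 5.317655063),
}


def decide(f_stat: float, p_value: float, f_crit: float) -> str:
    """Apply the alpha = .05 decision rule to one row of results."""
    reject = p_value < ALPHA
    # The p-value rule and the F crit rule must agree for the same test.
    assert reject == (f_stat > f_crit)
    return "reject H0" if reject else "fail to reject H0"


decisions = {name: decide(*row) for name, row in RESULTS.items()}
for name, decision in decisions.items():
    print(f"{name}: {decision}")
```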

Future Work

The results suggest that Disney princess movies have a real impact on American baby names, and that future work in this space would be quite interesting, for example quantifying the correlation and accounting for other variables (other key dates, generational shifts, nine-month delays).
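One possible starting point for that quantification, which was not part of the original analysis, is eta squared: the share of the total variation in yearly counts that is explained by the before/after split, computed as SS between groups divided by SS total. The Python sketch below uses the sums of squares from the ANOVA tables in this post. Treat it as a rough effect-size measure under the same window choices, not as a finished analysis.

```python
# name: (SS between groups, SS total), copied from the ANOVA tables in this post
SUMS_OF_SQUARES = {
    "Ariel": (25011897.8, 39003193.0),
    "Belle": (11616.0, 23073.33333),
    "Jasmine": (186662008.7, 387325312.7),
    "Tiana": (170755.2, 638586.2),
    "Anna": (3827896.9, 6740876.9),
    "Elsa": (80460.9, 420186.9),
}


def eta_squared(ss_between: float, ss_total: float) -> float:
    """Fraction of total variation explained by the before/after grouping."""
    return ss_between / ss_total


effect_sizes = {name: eta_squared(*ss) for name, ss in SUMS_OF_SQUARES.items()}
# Print from the largest effect to the smallest.
for name, eta in sorted(effect_sizes.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name}: {eta:.2f}")
```

Eta squared says nothing about direction, so it would need to be read alongside the group averages.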

Conclusion

As it stands, for five of the six Disney princess names examined (all but Elsa), there is a statistically significant change in naming rates around the movie release. While statistical analysis has not been done to characterize that relationship, graphically we see in most cases a notable increase in names after the associated movie is released, although the averages for Tiana and Anna fell. In some cases, names even go from 1-4 occurrences to over 100.

Sources

https://funmoneymom.com/Disney-Princess-list/

https://www.ssa.gov/oact/babynames/

https://www.imdb.com/