Not just a language with white faces: Analysing #taalmonument on Instagram using machine learning


Introduction
Since the late 19th century, Afrikaans 'was constructed as a "white language," with a "white history" and "white faces"' (Willemse 2017). Because the Afrikaner-dominated National Party (NP) carried out its policy of racial segregation (apartheid) in South Africa from 1948 to 1994, Afrikaans also became associated with apartheid. In particular, the 1976 Soweto riots, which were to a large extent an expression of opposition to Afrikaans as a medium of education, turned the focus of anti-apartheid resistance towards Afrikaans: 'This rebellion stigmatized or "further stigmatized" Afrikaans, because the apartheid policy and its application caused injustice and increased a negative attitude towards Afrikaners and Standard Afrikaans' (Steyn 2014:418).1 Afrikaans is still associated with the apartheid government and related concepts such as oppression and the restriction of freedom, which has led to resentment towards the language amongst a large proportion of the South African population (Van Zyl & Rossouw 2016:310). Recent protests at South African university campuses (e.g. #AfrikaansMustFall and #OpenStellenbosch) saw black students mobilising to remove Afrikaans as a language of tertiary education, arguing that it remained a barrier to education, offered an unfair advantage to white students, perpetuated racial segregation and alienated black students. This hostility towards Afrikaans can also be seen in the conduct of African National Congress (ANC) officials, in particular Gauteng MEC for Education, Panyaza Lesufi, and Minister of Higher Education, Blade Nzimande, who have made numerous statements against Afrikaans (Friedman 2019; Steward 2014). Nzimande, for instance, called the private Afrikaans-only tertiary education institution, Akademia, 'racist' because of its language policy (Steward 2014), whilst Lesufi made similar comments about Sol-Tech (Friedman 2019).
Such views ignore the fact that the majority of Afrikaans speakers (60%) are not white people (Willemse 2017) but have become commonplace in South Africa nevertheless.
The monument to the Afrikaans language, the Afrikaanse Taalmonument (Afrikaans Language Monument), is likewise associated with apartheid by some, and there have been calls to dismantle the Taalmonument in the interest of nation-building (Smith 2013:124; Van Zyl & Rossouw 2016:310). Groenewald (2018:230) calls the ANC regime 'antagonistic to the language that the monument valorises', and hence hostility towards the monument itself can be expected.
1. Author's own translation from the original Afrikaans: 'Hierdie opstand het Afrikaans gestigmatiseer of "verder gestigmatiseer," want die apartheidsbeleid en die toepassing daarvan het onreg veroorsaak en 'n negatiewe gesindheid teenoor Afrikaners en Standaardafrikaans laat toeneem'.
The current study investigates posts made with the hashtag #taalmonument on the social media platform Instagram. As Instagram posts constitute a voluntary association with this monument in the public sphere, the objective of the current study is to determine whether only white people (the race associated with Afrikaans) voluntarily associate themselves with this monument, or whether people of other races do the same and, if so, to what extent. To this end, we construct our own annotated corpus of images and then develop, train and evaluate our own machine learning image classifier, which is also benchmarked against an internationally recognised dataset. We also make suggestions for future research.

Background to the Taalmonument
The first proposal to erect a monument to Afrikaans was made at a commemoration of the founding of the [...] 1975 (De Vaal-Senekal et al. 2018; Van Zyl & Rossouw 2016:298).
A monument to Afrikaans will inevitably be placed in the racialised discourse that is associated with this language. Although Afrikaans is currently associated with white people and apartheid, this was not always the case: when the Genootskap van Regte Afrikaners was founded in 1875, most Afrikaans speakers were not white people, and Afrikaans was often referred to as a hotnotstaal (Groenewald 2018:228). As Willemse reminds us, 'Afrikaans also has a "black history" rather than just the known hegemonic apartheid history inculcated by white Afrikaner Christian national education, propaganda and the media'. Throughout the apartheid years [...] speak Afrikaans as a first language (Smith 2013:133). Nevertheless, the Afrikaner is generally seen as a white nation (Senekal 2019) (note that Afrikaner and Afrikaans-speaking are two different labels, the former generally denoting an ethnic group and the latter a linguistic group).
In light of this association between white people and Afrikaans, calls for the preservation of Afrikaans are often seen as an attempt to maintain segregation and 'white privilege' (see e.g. Pilane 2015).
The Taalmonument [...] (Smith 2013:146). From the beginning, then, the Taalmonument has aimed at shedding the stigma of Afrikaans being a language reserved for white people. However, with the Soweto riots occurring just a year after the opening of the Taalmonument, this attempt at making Afrikaans more inclusive seems to have had little effect.
Today, the Taalmonument still aims at inclusivity: 'The ATM strives for all South Africans to appreciate Afrikaans. In this spirit, the ATM works hard to encourage and support Afrikaans among the youth and non-mother-tongue speakers' (De Vaal-Senekal et al. 2018:198; see also Van Zyl & Rossouw 2016:311; Smith 2013:138).
This effort to broaden the appeal of the Taalmonument and the museum should lead to a diverse collection of visitors. In the contemporary world, visitors to monuments and museums often share their visits with others on social media platforms, such as Instagram, which provides the opportunity to analyse social media posts to obtain a better understanding of who visits monuments and why. The following section provides a short background on Instagram.

Instagram
Founded in 2010, Instagram quickly became a major social media platform. Currently, Instagram has around a billion users worldwide each month and 500 million users each day, with over 50 billion photos uploaded to date (Aslam 2020) [...] and Instagram (Qwerty 2017:12). Instagram is a photo-based platform that allows only photo and video posts; that is, it does not support text-only posts of the kind found on Facebook and Twitter.
Instagram is, however, not representative of the entire population of a country as Instagram users tend to be younger (Anderson & Jiang 2018;Aslam 2020;Duncan 2016). This is particularly relevant in the current study, as people who visit the Taalmonument and post pictures of their visits later will probably be from a younger generation that is less tied to a first-hand experience of apartheid and the NP. Note, however, that we do not have access to users' ages.
To investigate whether only white people or people of different races associate themselves with the Taalmonument on Instagram, we first had to train a model to distinguish between different races. The following section provides a background to machine learning for racial classification, after which we discuss the specific methods we used.

Machine learning for image classification
Machine learning is a subfield of artificial intelligence (AI) and was developed from the 1960s onwards (Kononenko 2001; Michie 1968), in particular through the works of Rosenblatt (1962), Nilsson (1965) and Hunt, Martin and Stone (1966). The field gained ground over the past two decades because of the big data revolution (Jordan & Mitchell 2015:256), leading Jordan and Mitchell (2015:260) to claim that 'machine learning is likely to be one of the most transformative technologies of the 21st century'.
A large amount of recent research has been directed towards identifying race in images using machine learning (Fu, He & Hou 2014; Trivedi & Amali 2017; Vo, Nguyen & Le 2018). Although the concept of race is a contentious issue, particularly as the term is often used interchangeably or confused with ethnicity (see, e.g. Bartlett 2001; Collins 2004; Markus 2008), Fu et al. (2014:2483) define the difference between race and ethnicity simply: 'race refers to a person's physical appearance or characteristics, while ethnicity is more viewed as a culture concept, relating to nationality, rituals and cultural heritages, or even ideology'. We prefer this simple distinction between race and ethnicity and focus the rest of our discussion on race. Racial classification is in one sense a highly controversial topic, because it carries the baggage of the Population Registration Act (Union of South Africa 1950) that, along with other apartheid-era legislation, led to racial discrimination and human rights abuses in South Africa before 1994. In another sense, racial classification is not controversial in contemporary South Africa: Broad-Based Black Economic Empowerment (BBBEE), as well as the discourse around white monopoly capital, transformation, white privilege and land expropriation, assumes racial categories. Despite the abolition of statutory racial categories in South Africa during the final years of apartheid, racial categories have persisted in the South African census and in public discourses. Most university staff have also experienced being obliged to indicate their race on administrative forms, with racial categories reminiscent of the Population Registration Act (Union of South Africa 1950) (white, black, coloured and Indian people, and other).
We would therefore like to emphasise that we trained a model to conduct racial classification because the discourse on Afrikaans and the Taalmonument is already racialised; the irony of deracialising this discourse is that we first need to be able to distinguish between races to ascertain whether visitors to the Taalmonument who post about their visits afterwards on Instagram belong to one or various races.
A variety of racial classification methods using machine learning have been proposed. Fu et al. (2014:2487) note 'statistically significant variances in facial anthropometric dimensions between all race groups', which 'pave the way of anthropometry-based automatic race recognition'. The question is what to measure. There is a common misconception that race is defined by skin colour (as exemplified by referring to people as 'white' or 'black'), and numerous efforts have been made to use skin colour to differentiate between races, but Fu et al. (2014:2485) argue that 'skin color is such a variable visual feature within any given race that it is actually one of the least important factors in distinguishing between races'. A second view holds that 'physical characteristics such as hairshaft morphologic characteristics and craniofacial measurements are viewed as significant indicators of race belongings' (2014:2485), whilst another method compares the eyes of subjects; Fu et al. (2014:2490) note 'Statistically significant race differences in retinal geometric characteristics', which have been reported in several studies. We opted for a more holistic approach by extracting whole faces and training a model to learn to which race each face belongs, as discussed below. Depending on the criteria and level of analysis, there are between three and 200 races (Coon 1962). Fu et al. (2014:2485) distinguish between seven races, which cover about 95% of the world population: African/African American, Caucasian, East Asian, Native American/American Indian, Pacific Islander, Asian Indian and Hispanic/Latino. These seven races, of course, exclude coloured people.
In adapting racial classifications for the South African context, we initially used the classifications suggested by Jan Raats, whose classification was used by the NP government through the Population Registration Act (James 2012; Union of South Africa 1950) and can still be found on administrative forms in South Africa today. These categories distinguish between four races: white, black, coloured and Asiatic people (we substitute the more politically acceptable term 'black' for his classification of 'bantu'). However, the difference between Indian and Asian people is so striking that we decided to split the Asiatic category into Asian and Indian people.
People of a mixed-race origin pose a significant challenge to existing facial recognition models (Fu et al. 2014:2502). This predicts that there will be difficulty in classifying South Africans, who have been mixing for the past 350 years, and especially the coloured population. Afrikaners, although generally considered white people, are also not exclusively Caucasian in their genetic makeup (Erasmus, Klingenberg & Greeff 2015; Greeff 2007; H. Heese 1979, 1984; J. Heese 1971).
Our experiments confirmed Fu et al.'s (2014:2502) assertion: we encountered substantial difficulty in distinguishing between white, coloured and black faces. When all five categories were included, we failed to move beyond an accuracy level of 70%, regardless of how we refined our model. We therefore simplified our racial categories to a binary classification, white or black, as the objective of the current study is in any case to determine whether only white people associate themselves with the Taalmonument or whether people of other races do the same, regardless of which race those people belong to.
The following section describes how the model was constructed and trained.

Model training

#modelsofinstagram dataset
A random sample of images was downloaded from Instagram to collect sufficient training data for the construction of a classifier. Images placed on Instagram are already annotated to some degree by being posted with a hashtag, but the hashtag indicates to which discourse the image belongs and not necessarily what the content of the image is. A picture with the hashtag #europeans could, for instance, show an African slave, as Europeans are known for slavery, but the hashtag does not indicate that the content of the image is a black African. We experimented with various hashtags that could be used to construct a labelled dataset, but the usefulness of these hashtags differed considerably across races: whilst #blackmodels and #indianmodels collected images of people belonging to these races, #whitemodels had a very limited selection of images and #colouredmodels created problems because of the different meanings associated with the term. Hashtags such as #san, #european and #sotho did not return a meaningful number of relevant images. The hashtag #afrikaner delivered a considerable number of irrelevant images, again partly because the term carries different meanings in different languages. We eventually decided to use a single hashtag, #modelsofinstagram, and used an annotator to classify people according to race. The annotator is in his late thirties and thoroughly familiar with racial categories in a South African context.
After the annotator had labelled the images, we began work on developing an image classifier. To classify an image according to race, we first needed to perform face detection and extract a face from an Instagram image, because this reduced the amount of noise in an image. For automatic face detection, we used opencv-python 4.2.0.34 (Heinisuo 2020), a package that provides Python bindings for the OpenCV image processing library. OpenCV includes an implementation of the cascade classifier face detection algorithm (Viola & Jones 2001) and provides the CascadeClassifier class, which allowed us to create a cascade classifier for face detection. A cascade, in machine learning terms, is an approach in which a detection function is trained from numerous positive and negative images, that is, images that do and do not contain the target object. This allows the classifier to detect objects (such as faces) in images. As a result, OpenCV allows us to extract faces from images, regardless of how many faces there are in a single image.
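The idea behind a cascade can be illustrated with a minimal sketch. In the formulation of Viola and Jones (2001), a candidate image window is passed through a sequence of increasingly demanding stages and is discarded as soon as one stage rejects it, so that most non-face regions are eliminated cheaply. The stage functions and thresholds below are toy assumptions chosen for illustration; they are not the trained stages that OpenCV ships, nor the configuration used in this study.

```python
from typing import Callable, List, Sequence, Tuple

# A stage pairs a scoring function with a threshold (both hypothetical here).
Stage = Tuple[Callable[[Sequence[float]], float], float]


def passes_cascade(window: Sequence[float], stages: List[Stage]) -> bool:
    """Return True only if the window clears every stage, in order."""
    for score, threshold in stages:
        if score(window) < threshold:
            return False  # rejected early; later stages are never evaluated
    return True


# Toy stages over a flat list of pixel intensities in [0, 1].
toy_stages: List[Stage] = [
    (lambda w: sum(w) / len(w), 0.2),  # cheap first test: mean brightness
    (lambda w: max(w) - min(w), 0.3),  # second test: contrast
]

candidate_windows = [
    [0.0, 0.1, 0.0, 0.1],  # too dark, fails stage 1
    [0.5, 0.5, 0.5, 0.5],  # no contrast, fails stage 2
    [0.1, 0.9, 0.2, 0.8],  # clears both stages
]
detections = [w for w in candidate_windows if passes_cascade(w, toy_stages)]
```

In a trained detector each stage is itself a learned classifier over image features, but the control flow (early rejection, acceptance only after the final stage) is the same.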
Using the face detection classifier, we were able to successfully detect and extract 3534 faces (2129 that were annotated as white people and 1405 that were annotated as black people) from our training dataset, using a confidence factor of 0.98 (in other words, we only allowed OpenCV to extract a face if it was 98% certain that what it had identified was a face). We then randomly selected images from each class to create the training and testing datasets. Table 1 shows the number of images we used for the training and validation of the model.
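A random selection per class can be sketched as follows. This is a minimal illustration only: the 20% test fraction, the fixed seed and the file names are assumptions introduced here and are not the values used in the study (the actual set sizes are those reported in Table 1).

```python
import random
from typing import Dict, List, Tuple

Labelled = List[Tuple[str, str]]  # (file name, label) pairs


def split_per_class(
    items_by_label: Dict[str, List[str]],
    test_fraction: float = 0.2,  # assumed value, not stated in the article
    seed: int = 42,              # fixed seed so the split is reproducible
) -> Tuple[Labelled, Labelled]:
    """Shuffle each class separately and hold out a fraction of it for testing."""
    rng = random.Random(seed)
    train: Labelled = []
    test: Labelled = []
    for label, items in items_by_label.items():
        shuffled = items[:]  # copy, so the caller's list is left untouched
        rng.shuffle(shuffled)
        n_test = int(len(shuffled) * test_fraction)
        test += [(name, label) for name in shuffled[:n_test]]
        train += [(name, label) for name in shuffled[n_test:]]
    return train, test


# Hypothetical file names; the class sizes match the extracted faces.
faces = {
    "white": [f"white_{i}.jpg" for i in range(2129)],
    "black": [f"black_{i}.jpg" for i in range(1405)],
}
train_set, test_set = split_per_class(faces)
```

Splitting each class separately keeps the class proportions of the full dataset in both subsets, which matters here because the two classes are not the same size.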

UTKFace dataset
We also wanted to check our annotator's classification by benchmarking our classifier against an internationally recognised dataset. The UTKFace dataset by Zhang, Song and Qi (2017) is a large face dataset with a long age span (subjects of between 0 and 116 years old) and consists of over 20 000 images with annotations in terms of age, gender and race. We applied an age filter (18-65 years old) to the dataset, resulting in 17 655 images, as the #modelsofinstagram facial images were extracted from Instagram users who will most likely fall within this age range, as will the images posted with #taalmonument. From these images, we then randomly selected only images of white people (race = 0) and black people (race = 1) to create the training and testing datasets. We did not filter on gender, because we wanted our classifier to function across genders. Table 2 shows the training and validation sets we used from the UTKFace dataset.
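The age and race filters can be sketched as below. The sketch assumes the file-naming convention documented for UTKFace, in which each file name begins with age, gender and race fields separated by underscores; the helper names and the example file names are hypothetical.

```python
from typing import Iterable, List, NamedTuple, Optional, Tuple


class FaceRecord(NamedTuple):
    age: int
    gender: int
    race: int
    filename: str


def parse_utkface_name(filename: str) -> Optional[FaceRecord]:
    """Parse 'age_gender_race_timestamp...' names; return None if malformed."""
    parts = filename.split("_")
    if len(parts) < 4:
        return None  # a field is missing, so the labels cannot be trusted
    try:
        age, gender, race = (int(p) for p in parts[:3])
    except ValueError:
        return None
    return FaceRecord(age, gender, race, filename)


def select_for_benchmark(
    filenames: Iterable[str],
    min_age: int = 18,
    max_age: int = 65,
    races: Tuple[int, ...] = (0, 1),  # 0 = white, 1 = black
) -> List[FaceRecord]:
    """Keep records inside the age range whose race label is one of `races`."""
    records = (parse_utkface_name(name) for name in filenames)
    return [
        r for r in records
        if r is not None and min_age <= r.age <= max_age and r.race in races
    ]


sample = [
    "25_1_0_20170116.jpg",  # kept
    "70_0_1_20170116.jpg",  # outside the age range
    "30_0_2_20170116.jpg",  # race label not selected
    "40_1_1_20170116.jpg",  # kept
    "39_1_20170116.jpg",    # malformed: one field missing
]
selected = select_for_benchmark(sample)
```

Gender is parsed but deliberately not used as a filter, in line with the decision to let the classifier function across genders.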

Data augmentation
As both datasets consist of a relatively small number of training examples, one can inadvertently introduce overfitting into a model. Overfitting occurs when a model learns the noise instead of the signal of the training data and consequently will not generalise well from training data

Deep learning models
For our machine learning classifiers, we made use of convolutional neural networks (CNNs) (Goodfellow, Bengio & Courville 2016:326). These classifiers are a specialised kind of neural network for processing data that have a grid-like topology. Such data can be a one-dimensional (1-D) grid such as time-series data or a two-dimensional (2-D) grid of pixels such as image data. Convolutional neural networks have been used successfully in applications such as facial recognition and more recently in natural language processing. Examples of CNN image recognition models are MobileNet by Howard et al. (2017), Levi and Hassner's (2015) age and gender recognition model and Campos, Jou and Giró-i-Nieto's (2017) image sentiment recognition model.
Convolutional neural networks consist of a series of convolutional and pooling layers, and all CNN models have a similar architecture. The architecture of CNN models is shown in Figure 1, which is adapted from Dertat (2017).
As the name convolutional neural network indicates, the neural network model employs a mathematical operation called a convolution. A convolution is a specialised kind of linear operation, and a CNN uses convolution instead of general matrix multiplication in at least one of its layers (Goodfellow et al. 2016:327). After a convolution operation, the network performs pooling to reduce the dimensionality. This reduces the number of training parameters and, as a result, also shortens the training time.
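Both operations can be illustrated on a small grid. The following is a minimal pure-Python sketch, not the implementation used in the study: the convolution is computed without padding (so the output is smaller than the input, unlike the same-padded layers of our models), and the kernel and input values are arbitrary illustrative numbers.

```python
from typing import List

Matrix = List[List[float]]


def convolve2d(image: Matrix, kernel: Matrix) -> Matrix:
    """Slide the kernel over the image without padding or kernel flipping,
    which is how convolution is usually computed in CNN layers."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool2d(feature_map: Matrix, size: int = 2) -> Matrix:
    """Non-overlapping max pooling; height and width must be divisible by size."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]), size)
        ]
        for r in range(0, len(feature_map), size)
    ]


image = [
    [1, 0, 2, 1, 0],
    [0, 1, 3, 0, 1],
    [2, 1, 0, 1, 2],
    [1, 0, 1, 3, 0],
    [0, 2, 1, 0, 1],
]
kernel = [[1, 0], [0, 1]]              # adds each pixel to its lower-right neighbour

features = convolve2d(image, kernel)   # 5 x 5 input gives a 4 x 4 feature map
pooled = max_pool2d(features)          # 2 x 2 pooling halves each spatial dimension
```

The pooling step keeps only the largest value in each 2 × 2 block, which is why the height and width are halved whilst the number of feature maps stays the same.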
The most common type of pooling is max pooling, which is the type of pooling we use in our classifiers. This enabled the models to reduce the input to the pooling layer (e.g. 32 × 32 × 10 dimensionality) to a 16 × 16 × 10 feature map as illustrated in Figure 2 (adapted from Dertat 2017). [...] For the classification block, there were two fully connected layers with 512 units on top of the convolution blocks that were activated by a relu activation function. In deep learning neural networks, the activation function is responsible for transforming the summed weighted input to a node into the activation of the node or output for that node (Brownlee 2019). Popular activation functions include sigmoid (or logistic), tanh (hyperbolic tangent) and relu (rectified linear units). We opted for relu as it allows for backpropagation of errors to train our deep learning models (Goodfellow et al. 2016:226). In total, there were 10 904 097 trainable parameters.
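The three activation functions named above, and the way an activation transforms the summed weighted input of a node, can be sketched as follows. The function `node_output` and the example weights are illustrative assumptions, not part of our models.

```python
import math
from typing import Callable, Sequence


def relu(x: float) -> float:
    """Rectified linear unit: passes positive inputs, zeroes out the rest."""
    return max(0.0, x)


def sigmoid(x: float) -> float:
    """Logistic function: squashes any real input into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x: float) -> float:
    """Hyperbolic tangent: squashes any real input into the interval (-1, 1)."""
    return math.tanh(x)


def node_output(
    inputs: Sequence[float],
    weights: Sequence[float],
    bias: float,
    activation: Callable[[float], float] = relu,
) -> float:
    """Apply the activation to the summed weighted input of a single node."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


hidden = node_output([1.0, 2.0], [0.5, -1.0], bias=0.25)              # relu(-1.25)
output = node_output([1.0, 2.0], [0.5, -0.25], 0.0, activation=sigmoid)  # sigmoid(0)
```

Because sigmoid maps its input to a value between 0 and 1, it is a natural choice for the output node of a binary classifier, whereas relu is commonly used in the hidden layers.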
The second CNN model consists of four convolution blocks (3 × 3 filter) with the same padding and a max pool layer (2 × 2 filter) in each of them resulting in 12 layers. For the classification block, there was a single fully connected layer with 512 units on top of the convolution blocks that were activated by a relu activation function. In total, there were 7 595 809 trainable parameters.
The third CNN model was a scaled-down version of the original VGG16 model proposed by Simonyan and Zisserman (2014). The original VGG16 model consists of five convolution blocks (3 × 3 filter) with a max pool layer (2 × 2 filter) in each of them where the '16' refers to 16 layers that have weights. Our model consists of four convolution blocks (3 × 3 filter) with the same padding and a max pool layer (2 × 2 filter) in each of them resulting in 14 layers that have weights. For the classification block, there were two fully connected layers with 512 units on top of the convolution blocks that were activated by a relu activation function. In total, there were 12 790 433 trainable parameters.
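The trainable-parameter totals reported for the three models are sums of per-layer counts, which follow from standard formulas for convolutional and fully connected layers. The sketch below shows those formulas only; the layer sizes passed to them are hypothetical, because the number of filters per block and the input image size are not stated here, so the results are not expected to add up to the reported totals.

```python
def conv2d_params(kernel_h: int, kernel_w: int, in_channels: int, filters: int) -> int:
    """Each filter has kernel_h * kernel_w * in_channels weights plus one bias."""
    return (kernel_h * kernel_w * in_channels + 1) * filters


def dense_params(in_units: int, out_units: int) -> int:
    """One weight per input-output pair plus one bias per output unit."""
    return (in_units + 1) * out_units


# Hypothetical layer sizes, for illustration only.
first_conv = conv2d_params(3, 3, in_channels=3, filters=32)  # 3 x 3 filters on RGB input
hidden_dense = dense_params(512, 512)                        # one 512-unit layer feeding another
output_unit = dense_params(512, 1)                           # single sigmoid output node
```

Max pooling layers contribute no trainable parameters, so only the convolutional and fully connected layers enter the total.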
All three models output class probabilities for a binary classification, using the sigmoid activation function in the output layer. We made use of the Adam optimiser and a binary cross entropy loss function. Adam is an adaptive learning rate optimisation algorithm specifically designed for deep learning (Kingma & Ba 2017). As we are using a binary classifier, our loss function is also binary and uses cross entropy to measure how far our prediction for each image was from the true value (0 or 1). The loss function then averages these class-wise errors to obtain the final loss (Peltarion 2020). We also experimented with dropout, a regularisation technique used to reduce the overfitting of a network. Dropout takes a fractional number as its input value, in the form such as 0.
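The binary cross entropy loss described above can be written out as a short sketch. This is the textbook formula averaged over a batch, not the library implementation used in our experiments; the clipping constant `eps` is an assumption added to avoid taking the logarithm of zero.

```python
import math
from typing import Sequence


def binary_cross_entropy(
    y_true: Sequence[int],
    y_prob: Sequence[float],
    eps: float = 1e-7,  # assumed clipping constant
) -> float:
    """Mean of -[y * log(p) + (1 - y) * log(1 - p)] over all samples."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # keep p strictly inside (0, 1)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


labels = [1, 0, 1, 0]
confident_and_right = binary_cross_entropy(labels, [0.9, 0.1, 0.8, 0.2])
confident_and_wrong = binary_cross_entropy(labels, [0.1, 0.9, 0.2, 0.8])
```

The loss is small when the predicted probabilities lie close to the true labels and grows rapidly when the model is confidently wrong, which is what drives the optimiser towards well-calibrated outputs.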

Testing the models
We trained the three CNN models on both datasets. As the classifier is a binary classifier (only two labels, i.e. white people or black people), we report the precision, recall, F1 and accuracy as the evaluation metrics used to assess the performance of the CNN models. Precision is the ability of a classifier not to label a sample as positive if it was negative.
Recall is the ability of the classifier to find all the positive samples. Accuracy is the proportion of samples that were correctly classified, whilst F1 is the harmonic mean of precision and recall. As the training of the models took a substantial amount of time, we did not train using n-fold cross-validation. Cross-validation is a resampling technique to evaluate machine learning models on a limited dataset, where n (in n-fold) refers to the number of groups into which a given dataset is split. Instead, we made use of Model Checkpoint and Early Stopping. Model Checkpoint monitors a specific quantity of the model (we used val_loss, or validation loss), and Early Stopping stops the training process of the model if there has been no improvement in validation loss after a set number of epochs. An epoch is one complete pass of the learning algorithm through the training dataset. We set the maximum number of epochs at 100 and allowed the model to stop after 10 epochs if there was no improvement in validation loss. After the training was completed, we tested each model with both the #modelsofinstagram (n = 800) and UTKFace (n = 1000) testing datasets. Table 3 provides the test evaluation metrics of each model and dataset.
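The evaluation metrics and the early stopping rule can both be sketched in a few lines. This is an illustration under stated assumptions rather than the code used in the study: the labels, predictions and validation losses are made-up values, and `best_epoch_with_early_stopping` is a hypothetical helper that mimics the patience rule instead of calling a deep learning library.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy, treating label 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def best_epoch_with_early_stopping(val_losses: Sequence[float], patience: int = 10) -> int:
    """Return the 1-based epoch with the lowest validation loss, stopping once
    `patience` consecutive epochs have passed without improvement."""
    best_loss = float("inf")
    best_epoch = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch


metrics = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 0, 1, 1, 0])
stopped_at = best_epoch_with_early_stopping(
    [0.9, 0.7, 0.6, 0.65, 0.66, 0.64, 0.1], patience=3
)
```

In the second example, training halts before the final (lower) loss is ever reached, which shows the trade-off of early stopping: it saves training time at the risk of missing a later improvement.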
From the testing of the models, CNN Model 2 trained on our own #modelsofinstagram dataset performed the best. When examining the model during testing, we noted a test loss of 0.106758 with an accuracy of 0.97. The model reached its optimal validation loss value at n = 33 epochs. In other words, on the test dataset, our model predicted which of the two racial categories a face belongs to with 97% accuracy. With the model created, trained and evaluated, we could now apply it to a dataset of images downloaded with the hashtag #taalmonument, as discussed in the following section.

Data gathering
Before we could investigate the race of people that posted with the hashtag #taalmonument, we first had to download all posts tagged with this hashtag. Posts were downloaded using the application, InstaBro, on 14 February 2020. The first post was made on 01 July 2012, meaning that the dataset spans over 7 years. There were 2988 photos posted with this hashtag (#taalmonument) during this period. Note that we could not gather any data about users, including their names, age, location or gender. Importantly, we could only download posts from public profiles, that is, we were not required to follow users in order to include their content in the analysis below. In other words, these posts were made openly, in front of an audience numbering around a billion, which means that the dataset constitutes posts made by people who

Results
To perform predictions on the unlabelled facial image dataset (#taalmonument), we deployed the best-performing classification model (CNN Model 2). First, the unlabelled dataset from Instagram (#taalmonument) was preprocessed, which included scaling the images and extracting human faces. From the 2988 unlabelled photos, 668 human faces were identified and extracted using OpenCV (the rest of the photos were of the monument or the landscape around the Taalmonument). We then passed these facial images to our model as input and received a label as output. Table 4 summarises the results.
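The step from model output to label can be sketched as follows. The sketch assumes that the sigmoid output is thresholded at 0.5 and that an output of 1 corresponds to the 'black' label (mirroring the UTKFace coding of race = 0 for white and race = 1 for black); both are assumptions made for illustration, and the probabilities are made-up values.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def tally_predictions(
    probabilities: Sequence[float],
    threshold: float = 0.5,                        # assumed decision threshold
    labels: Tuple[str, str] = ("white", "black"),  # assumed: output 0, output 1
) -> Dict[str, Tuple[int, float]]:
    """Map each sigmoid output to a label and report (count, percentage) per label."""
    counts = Counter(
        labels[1] if p >= threshold else labels[0] for p in probabilities
    )
    total = len(probabilities)
    return {
        label: (counts[label], round(100 * counts[label] / total, 2))
        for label in labels
    }


example_outputs = [0.1, 0.2, 0.9, 0.4]  # hypothetical model outputs for four faces
summary = tally_predictions(example_outputs)
```

Each percentage is computed over the number of extracted faces rather than the number of downloaded photos, because photos without a detectable face never reach the classifier.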
The following section discusses these results.

Discussion
Census data show that most Afrikaans speakers are not white people, but as noted in the section discussing the background, the Taalmonument and Afrikaans are both accused of being exclusively white phenomena. The results in the previous section, however, show that this is not entirely the case in our study. Of the 668 faces identified from Instagram posts made with the hashtag #taalmonument, 529 (79.19%) were white people and 139 (20.81%) were black people. As we showed our classifier to predict people's race with 97% accuracy, this shows that roughly one in five of the people who chose to associate themselves with the Taalmonument are not white people. The key issue here is voluntary association: whilst people may attend an Afrikaans university based on the geographical location, the availability of transport, limited course options or for other reasons, people who take a photo at a monument and post it to Instagram do so willingly and intentionally. Moreover, taking the time to travel to the monument, taking a picture and posting it on Instagram constitute a significant effort on the part of the user. The fact that a substantial number of people who associate themselves with the Taalmonument are not white shows that this monument does not only garner attention from the white population but rather functions in an inclusive capacity, as intended by Van Wijk.
However, it is unclear why only 20% of the faces we identified are not white people, whilst white people are a minority both in the national population of South Africa and amongst Afrikaans speakers. This over-representation of white people may reflect Instagram user demographics (no data are available on the distribution of Instagram use by race in South Africa) or cultural differences, or it may indicate a proportionally smaller interest in the monument. It may, for instance, be that a smaller proportion of coloured people show an interest in this monument than is the case for white people, but we have no data to explain this skewed distribution and other factors could be at play.
Of course, although the above shows a diverse association of people with the Taalmonument on Instagram, this study did not conduct a representative investigation into attitudes towards the Taalmonument. Such a study of attitudes can better be conducted using a large sample of questionnaires or interviews. However, the above does show that, contrary to claims that it is an 'apartheidsmonument', users on Instagram take the time and effort to publicly associate themselves with this monument even if they are not white people.

Conclusion
This article showed that people who visit the Taalmonument and post about their visits later on Instagram are from various racial backgrounds. Contrary to the racialised discourse on Afrikaans in South Africa, our study shows that not only white people take the time and effort to travel to this monument, take pictures and post about it afterwards on Instagram; in other words, people of other races also voluntarily associate with this monument on a highly public global platform. Our study therefore does not suggest that the Afrikaanse Taalmonument is considered to be a 'white people only' or 'apartheid' monument but rather a monument that has enough significance for people of other races to also take the time and effort to take photos there and post about it on social media.
We only investigated one monument and one factor, namely race. Future studies could apply a similar method to investigate the demographics of visitors who post on social media with relation to other museums and monuments in South Africa, including, for example, considering visitors' age and gender. Social media provides a wealth of data with which to investigate how museums and monuments function in the contemporary world and in people's lives, and much of this opportunity has not been realised in academic research yet.

Ethical consideration
This article followed all ethical standards for carrying out research.

Data availability statement
The data are not publicly available due to privacy restrictions.