As WTS continues its Big Data explorations in partnership with Africa’s Voices Foundation (AVF), we are digging further into the analysis of conversations on the DJB Facebook page. This post (the third in a series; you can read the other posts here and here) reflects on how “digital trace” data can help us better understand how different youth segments (the insiders, professionals, disengaged, disgruntled and disenfranchised) engage in social media conversations on governance.
The past few years have seen an explosion of interest in new “big data” methods developed to understand social behaviour. Many of these methods look at what are called “digital traces”: a treasure trove of data left behind by everyday interactions online and on social media. The Facebook page of DJB is a simple illustration of how rich a source of data these digital traces can be. It is liked and followed by over half a million youth in Kenya. During the past year alone, the 7,000 posts on the page were commented on over 100,000 times by over 40,000 fans. The posts were also liked over 650,000 times by more than 170,000 fans.
Given the large scale of these conversations, how do we make sense of the data? In particular, how can we use social media activity to explain the ways youth engage with politics and governance in Kenya?
Through its previous studies (here and here), Well Told Story has identified five prototypical ways in which Shujaaz fans orient themselves to governance-related topics, i.e., the five youth segments:
- Insiders: youth who are positively engaged in governance and benefitting from this engagement;
- Professionals: the ones who blindly hang around politicians doing the underpaid part of youth tenders;
- Disengaged: those who feel excluded, let down and fed up; they had hopes for devolution but have now given up;
- Disgruntled: the ‘angry’ youth, fighting to regain their place and voice in the community;
- Disenfranchised: predominantly female youth, who ‘don’t even know they were supposed to care’ about politics.
Each of these youth segments manifests different types of behaviour, including different preferences for political topics and different levels of optimism or pessimism about their ability to influence change in Kenya.
In our current study we wanted to explore how these youth segments relate to the actual social media behaviour of DJ B fans. That is, what kinds of social media behaviour would be typical of each youth segment, and what methods could be developed to empirically test the “linkages” between real-life and online behaviours?
To do this, we identified four types of indicators that help categorise the digital traces left behind by fans on DJ B’s Facebook page:
- Content indicators: all the textual content found on the page. This content can be mined for different insights, such as whether the sentiment of the content is positive or negative, whether angry/hateful words or tribal stereotypes are used, what themes/topics are found in the comments, or even how many words are used in comments on average (with more sustained arguments having more words on average). These content indicators can be linked to the youth segments to better understand what different segments discuss and how they discuss it.
- Engagement indicators: all interactions the fans have on the page. This includes the number of comments, reactions and shares they make, and how these interactions relate to different types of topics/content. These engagement indicators can also be linked to the youth segments to better understand their behavioural patterns, i.e., what they do on the page.
- Network indicators: all networks/online relationships among fans. This includes identifying how different fans relate to each other (such as who likes or comments on whom), who the most central actors in the networks are (fans who are most active in commenting across different topics), and how different clusters of fans relate to each other (i.e., whether they are interconnected or isolated). These network indicators can also be linked to the youth segments to better understand whom different segments connect with (or not), and with what level of intensity.
- Activity indicators: all characteristics of online engagements/activities. How often people comment and/or like can be one of the features that help differentiate between youth segments. Similarly, the time of day at which fans are most active could potentially be linked to different youth segments.
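As a rough illustration of how the four indicator types could be derived from raw page activity, here is a minimal Python sketch. The event records, field layout and names such as `fan_indicators` and `evening_share` are assumptions made for this example, not the study's actual data model, and the sample rows are invented.

```python
from collections import defaultdict

# Hypothetical interaction records, invented for illustration:
# (fan_id, post_id, kind, text, hour_of_day, target_fan)
events = [
    ("fan_a", "p1", "comment", "We need better roads in our county", 9, None),
    ("fan_a", "p2", "comment", "Good idea", 21, "fan_b"),
    ("fan_b", "p1", "like", "", 14, "fan_a"),
    ("fan_b", "p2", "like", "", 15, None),
    ("fan_c", "p1", "comment", "Sawa", 22, "fan_a"),
]

def fan_indicators(events):
    """Aggregate simple per-fan indicators from raw interaction events."""
    stats = defaultdict(lambda: {"comments": 0, "likes": 0, "words": 0,
                                 "hours": [], "contacts": set()})
    for fan, _post, kind, text, hour, target in events:
        s = stats[fan]
        if kind == "comment":
            s["comments"] += 1
            s["words"] += len(text.split())
        elif kind == "like":
            s["likes"] += 1
        s["hours"].append(hour)
        if target is not None:
            # Treat a like or reply aimed at another fan as an undirected tie.
            s["contacts"].add(target)
            stats[target]["contacts"].add(fan)

    result = {}
    for fan, s in stats.items():
        n_events = len(s["hours"])
        evening = sum(1 for h in s["hours"] if h >= 18)
        result[fan] = {
            # Content: average comment length in words.
            "avg_words": s["words"] / s["comments"] if s["comments"] else 0.0,
            # Engagement: counts of comments and likes.
            "n_comments": s["comments"],
            "n_likes": s["likes"],
            # Network: number of distinct fans this fan is tied to.
            "degree": len(s["contacts"]),
            # Activity: share of interactions made from 18:00 onwards.
            "evening_share": evening / n_events if n_events else 0.0,
        }
    return result

indicators = fan_indicators(events)
```

Real content indicators such as sentiment or topic would need proper text analysis; word count is used here only because it needs no external library.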
One way to use these indicators to better understand how youth in Kenya relate to governance-related issues and politics is supervised machine learning, i.e., methods used in computational social science in which a subset of a larger dataset is manually labelled based on expert knowledge about the target audience of the study. In our case, we have to classify a subset of DJ B’s Facebook fans and their behaviour according to the five youth segments. Next, we will have to train and test the algorithm so that it infers accurate patterns of “segmented” behaviour from the four types of indicators: content, engagement, network and activity.
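The label, train and test workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the feature tuples and labels are invented, only three of the five segments appear, and a nearest-centroid classifier is swapped in as a simple stand-in for whichever algorithm a real study would use.

```python
import math

# Hypothetical manually labelled subset (invented values):
# features are (n_comments, n_likes, avg_words) per fan.
labelled = [
    ((30, 40, 18.0), "insider"),
    ((28, 35, 20.0), "insider"),
    ((2, 50, 3.0), "disenfranchised"),
    ((1, 45, 2.0), "disenfranchised"),
    ((25, 10, 15.0), "disgruntled"),
    ((27, 12, 14.0), "disgruntled"),
]

# Labelled fans held back from training, used only for testing.
held_out = [
    ((29, 38, 19.0), "insider"),
    ((0, 48, 0.0), "disenfranchised"),
    ((26, 9, 16.0), "disgruntled"),
]

def train_centroids(rows):
    """Compute the mean feature vector (centroid) of each labelled segment."""
    sums, counts = {}, {}
    for features, label in rows:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Assign the segment whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))

def accuracy(centroids, rows):
    """Share of rows whose predicted segment matches the manual label."""
    hits = sum(predict(centroids, features) == label for features, label in rows)
    return hits / len(rows)

centroids = train_centroids(labelled)
test_accuracy = accuracy(centroids, held_out)
```

In practice the features would need scaling so that no single indicator dominates the distance, and the held-out set would have to be far larger for the accuracy figure to mean anything.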
An additional value of machine learning methods is that they can be used to identify which features of the digital trace data (that is, which indicators) are the most relevant for categorising the five youth segments. This can be done through, for instance, decision tree learning methods. A well-known example of a decision tree predicts whether passengers on the Titanic survived the sinking, based on who they were. The algorithm uses different features of the passengers (gender, age, etc.) to build a classification tree that estimates, step by step, the probability of survival of a given passenger.
If we apply this tree-like decision structure to the digital trace data of DJB fans, we can understand what features (or combinations of features) of their social media behaviour are characteristic of the five youth segments, for example:
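To make the idea of "most relevant feature" concrete, the sketch below scores candidate features by how much a split on each one reduces Gini impurity, which is one common criterion in decision tree learning. The passenger rows are invented toy values, not the real Titanic records, and the function names are chosen for this example only.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (0.0 means perfectly pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(rows, feature, target):
    """Reduction in Gini impurity from splitting rows on a categorical feature."""
    parent = gini([row[target] for row in rows])
    groups = {}
    for row in rows:
        groups.setdefault(row[feature], []).append(row[target])
    weighted = sum(len(g) / len(rows) * gini(g) for g in groups.values())
    return parent - weighted

# Invented toy rows for illustration only.
passengers = [
    {"sex": "female", "age_group": "adult", "survived": True},
    {"sex": "female", "age_group": "child", "survived": True},
    {"sex": "female", "age_group": "adult", "survived": True},
    {"sex": "male", "age_group": "child", "survived": True},
    {"sex": "male", "age_group": "adult", "survived": False},
    {"sex": "male", "age_group": "adult", "survived": False},
]

# Score each candidate feature; a tree learner would split on the best one
# first, then repeat the procedure inside each resulting branch.
gains = {f: gini_gain(passengers, f, "survived") for f in ("sex", "age_group")}
best_feature = max(gains, key=gains.get)
```

On these toy rows the split on `sex` gives the larger gain, so it would form the root of the tree. Applied to fans instead of passengers, the same scoring would rank the content, engagement, network and activity indicators by how well they separate the youth segments.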
- The insiders would be more likely to have a central position in the network. They would also actively comment and like across a number of topics and subtopics. The overall sentiment of their conversations would be positive, with little or no use of angry/hateful words or tribal stereotypes. They would also be power fans, insofar as they would actively engage both with governance-related power topics (e.g., ji-activate or infrastructure) and with other topics.
- The professionals would also have a central position in the network, but they would be more clustered around like-minded people. They would also actively comment and like, but across selected topics and subtopics rather than across all topics. The sentiment of their comments would be mostly mixed or negative, depending on whom the comment is targeted at, displaying the more classical inside/outside behaviour of political conflict. Their comments would also be more likely to use angry/hateful words or tribal stereotypes, again depending on the target of the message.
- The disengaged would probably have a less central position in the networks and be part of network “cliques” with like-minded fans. They would also probably have a higher number of reactions and a lower number of comments, even if they engage across different themes and sub-themes. The sentiment of their comments would be mostly positive, and they would not use angry/hateful words or tribal stereotypes.
- The disgruntled would have a moderately or highly central position in the networks and would be connected to a high number of people. They would also have a high number of comments and likes across a variety of topics. The sentiment of their comments, however, would be more negative, and they would make more widespread use of angry/hateful words or tribal stereotypes.
- And finally, the disenfranchised would probably have a higher number of likes but few or no comments. The average word count of their comments would be low, indicating a peripheral engagement with the issue. When they do engage actively, they would engage primarily with topics that are not linked to politics.
This feature extraction approach can also be illustrated by a decision tree that shows how different digital trace indicators can help understand the typical social media behaviour of the five governance segments.
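One possible reading of the hypotheses listed above can be written down as a small hand-coded tree. The sketch below is not the output of a trained model: the field names, the order of the checks and every threshold are assumptions picked for illustration, and a tree learned from labelled data may well look different.

```python
def hypothesised_segment(fan):
    """Map a fan's indicators to a segment using hand-written rules.

    `fan` is a dict with hypothetical keys:
      n_comments         number of comments in the period
      political_share    share of interactions on politics-related posts (0 to 1)
      sentiment          average comment sentiment (-1 negative to +1 positive)
      uses_hostile_words True if angry/hateful words or stereotypes appear
      topic_breadth      number of distinct topics the fan comments on
    All thresholds below are arbitrary placeholders.
    """
    if fan["n_comments"] < 5:
        # Mostly reacts and rarely comments.
        if fan["political_share"] < 0.2:
            return "disenfranchised"   # engages mainly with non-political topics
        return "disengaged"            # present across themes, but only lightly
    if fan["sentiment"] > 0 and not fan["uses_hostile_words"]:
        return "insider"               # active, positive, broad engagement
    if fan["topic_breadth"] < 3:
        return "professional"          # active but focused on selected topics
    return "disgruntled"               # active, negative, across many topics

# Example call with invented values.
example = hypothesised_segment({
    "n_comments": 40, "political_share": 0.6, "sentiment": -0.4,
    "uses_hostile_words": True, "topic_breadth": 6,
})
```

Network indicators such as centrality and clustering are left out to keep the sketch short; adding them would mean extra branches rather than a different structure.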
All of the hypotheses about the segments’ behaviours require further empirical and iterative testing. However, the initial analysis clearly shows that (a) digital trace data can be a rich source of data to understand fans’ behaviours beyond basic counts of reactions and actions; and (b) when combined with qualitative insights and a rigorous research design, digital trace analysis can enable new granular insight into how youth behave on social media, on a scale not usually available from more conventional approaches.