YouTube Chatter: Understanding Online Comments Discourse on Misinformative and Political YouTube Videos

Abstract

Motivation

Data

Channel Selection

Figure 1: The Media Bias/Fact Check (MBFC) categorizations and factual ratings (where applicable) for the fourteen channels we scraped. Categories we defined ourselves are marked [unofficial] for later reference.
Figure 2: Views across all sampled videos for each group of YouTube channels, normalized by the number of videos per group and measured in thousands. Channels are grouped visually by category and factual rating. A chart of views by channel can be found in appendix item 3.

Data Collection

Released Datasets & Code

Datasets

Code

Results

Comment Engagement For Political Content:

Figure 3A: Total CPV, organized by channel, within our dataset.
Figure 3B: Total CPV, by category. Total view counts per channel are provided in our appendix item 3.
Figure 4: (Left) Total comments, including both direct comments and replies, divided by videos per channel. (Right) Total comments divided by videos per channel, averaged over category.
Figure 5A: Average thread length for each channel. Average thread length is the average number of replies on each direct comment over all the videos we scraped for a particular channel.
Figure 5B: Average thread length organized by category.
Figure 6A: Average number of characters in a comment, organized by channel.
Figure 6B: Average comment length, organized by category.
Figure 7A: Channel profanities for our dataset.
Figure 7B: Category profanities for our dataset.
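
The per-channel quantities described in the captions for Figures 4, 5A, and 6A can be sketched roughly as follows. This is a minimal illustration, not the code used for the figures: the record layout (the `channel`, `video`, `parent`, and `text` fields) and the sample comments are hypothetical, and videos with no comments at all would need to be counted separately.

```python
from collections import defaultdict

# Hypothetical comment records; field names are assumptions for illustration.
# "parent" is None for a direct comment and the parent's id for a reply.
comments = [
    {"channel": "A", "video": "v1", "parent": None, "text": "First!"},
    {"channel": "A", "video": "v1", "parent": "c1", "text": "Nope"},
    {"channel": "A", "video": "v1", "parent": "c1", "text": "ok"},
    {"channel": "A", "video": "v2", "parent": None, "text": "Hello"},
    {"channel": "B", "video": "v3", "parent": None, "text": "abc"},
]


def channel_metrics(records):
    """Compute per-channel comment statistics from flat comment records."""
    videos = defaultdict(set)    # distinct videos seen per channel
    direct = defaultdict(int)    # top-level comments per channel
    replies = defaultdict(int)   # replies per channel
    chars = defaultdict(int)     # total characters per channel

    for c in records:
        ch = c["channel"]
        videos[ch].add(c["video"])
        if c["parent"] is None:
            direct[ch] += 1
        else:
            replies[ch] += 1
        chars[ch] += len(c["text"])

    result = {}
    for ch in videos:
        total = direct[ch] + replies[ch]
        result[ch] = {
            # Figure 4: direct comments plus replies, divided by videos
            "comments_per_video": total / len(videos[ch]),
            # Figure 5A: average number of replies per direct comment
            "avg_thread_length": replies[ch] / direct[ch] if direct[ch] else 0.0,
            # Figure 6A: average number of characters per comment
            "avg_comment_chars": chars[ch] / total,
        }
    return result


metrics = channel_metrics(comments)
for channel, values in sorted(metrics.items()):
    print(channel, values)
```

Category-level versions (Figures 5B and 6B) could be obtained by averaging these values over the channels in each category, assuming that is how the category charts were aggregated.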

Predicting the MBFC Category of a Video using Metadata

Figure 8A: A list of the features we used, as well as each one’s importance in the decision tree and random forest models.
Figure 8B: A 2-dimensional t-SNE visualization of the training data.
Figure 8C: Confusion matrices for models with hyper-parameters chosen by 5-fold cross validation. (Top left) Decision tree with maximum depth = 12: test accuracy = 65.7%. (Top right) Random forest with unlimited max depth, 61 estimators: test accuracy = 80.2%. (Bottom left) SVM with C = 100: test accuracy = 62.5%. (Bottom right) SVM with RBF kernel, with gamma = 0.1, C = 175: test accuracy = 79.0%.
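
The idea of choosing a hyper-parameter by 5-fold cross validation, as mentioned in the Figure 8C caption, can be sketched as below. Everything here is an assumption made for illustration: the data is synthetic, and a toy k-nearest-neighbours classifier stands in for the decision tree, random forest, and SVM models named in the caption. Only the fold-and-average selection procedure is the point.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def knn_predict(train, point, k_neighbors):
    """Predict a label by majority vote among the nearest training points."""
    nearest = sorted(
        train,
        key=lambda item: (item[0][0] - point[0]) ** 2 + (item[0][1] - point[1]) ** 2,
    )[:k_neighbors]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def cross_val_accuracy(data, k_neighbors, folds):
    """Average held-out accuracy over the given folds."""
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [data[i] for i in range(len(data)) if i not in held]
        correct = sum(
            knn_predict(train, data[i][0], k_neighbors) == data[i][1]
            for i in held_out
        )
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Synthetic two-class data (hypothetical stand-in for video metadata features).
rng = random.Random(42)
data = [((rng.gauss(0, 1), rng.gauss(0, 1)), "left") for _ in range(25)]
data += [((rng.gauss(6, 1), rng.gauss(6, 1)), "right") for _ in range(25)]

folds = k_fold_indices(len(data), 5)

# Evaluate each candidate hyper-parameter on the same folds and keep the best.
scores = {k: cross_val_accuracy(data, k, folds) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

The selected value would then be used to refit on the full training set, with a separate test set kept aside for the reported accuracy.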

Discussion

References

Misinformation:

Comments analysis:

Edits


UC Berkeley EECS ’19 → Google SWE. Interested in misinformation prevention, tech for social good, and education.

JANNY ZHANG
