YouTube Chatter: Understanding Online Comments Discourse on Misinformative and Political YouTube Videos

Abstract

Motivation

Data

Channel Selection

Figure 1: A graphic depicting the MBFC categorizations and MBFC factual ratings (where applicable) for the fourteen channels we scraped. We made some categories of our own, marked as [unofficial], for later reference.
Figure 2: Total views across all sampled videos for the YouTube channels, normalized by the number of videos per group and grouped visually by category and factual rating. Views are measured in thousands. A chart of views by channel can be found in appendix item 3.
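The normalization described in Figure 2 can be sketched in a few lines. This is only an illustration under assumptions: the rows, channel names, and group names below are hypothetical placeholders, not values from our dataset, and `views_per_video_by_group` is a name made up for this example.

```python
from collections import defaultdict

# Hypothetical sample rows: (channel, group, views) for each sampled video.
# These values are placeholders, not numbers from the released datasets.
videos = [
    ("channel_a", "group_1", 12_000),
    ("channel_a", "group_1", 8_000),
    ("channel_b", "group_1", 40_000),
    ("channel_c", "group_2", 30_000),
    ("channel_c", "group_2", 14_000),
]


def views_per_video_by_group(rows):
    """Return {group: total views / number of videos}, expressed in thousands."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for _channel, group, views in rows:
        totals[group] += views
        counts[group] += 1
    # Divide by the video count so groups with more sampled videos
    # are not favored, then convert to thousands of views.
    return {group: totals[group] / counts[group] / 1000 for group in totals}


normalized = views_per_video_by_group(videos)
print(normalized)
```

Dividing by the number of videos keeps a group with many sampled videos from dominating the chart purely because of its sample size.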

Data Collection

Released Datasets & Code

Datasets

Code

Results

Comment Engagement For Political Content:

Figure 3A: Total CPV, organized by channel, within our dataset.
Figure 3B: Total CPV, by category. Total view counts per channel are provided in our appendix item 3.
Figure 4: (Left) Total comments, including both direct comments and replies, divided by videos per channel. (Right) Total comments divided by videos per channel, averaged over category.
Figure 5A: Average thread length for each channel. Average thread length is the average number of replies on each direct comment over all the videos we scraped for a particular channel.
Figure 5B: Average thread length organized by category.
Figure 6A: Average number of characters in a comment, organized by channel.
Figure 6B: Average comment length, organized by category.
Figure 7A: Channel profanities for our dataset.
Figure 7B: Category profanities for our dataset.
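The per-channel quantities in Figures 4 through 6 (comments per video, average thread length, and average comment length) could be computed along the following lines. This is a minimal sketch, not our actual analysis code: the record layout, channel and category names, and function names are all hypothetical, and the category value is assumed to be an unweighted mean of the per-channel values.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical inputs; none of these names or values come from the released datasets.
VIDEOS_PER_CHANNEL = {"chan_a": 2, "chan_b": 1}
CATEGORY_OF = {"chan_a": "cat_x", "chan_b": "cat_x"}
# (channel, parent_id, text); parent_id is None for a direct (top-level) comment.
COMMENTS = [
    ("chan_a", None, "first"),
    ("chan_a", 1, "reply one"),
    ("chan_a", 1, "ok"),
    ("chan_a", 1, "no"),
    ("chan_a", None, "second"),
    ("chan_b", None, "hello"),
    ("chan_b", 5, "hi"),
]


def channel_metrics(comments, videos_per_channel):
    """Compute comments per video, average thread length, and average comment length."""
    total = defaultdict(int)
    direct = defaultdict(int)
    replies = defaultdict(int)
    chars = defaultdict(int)
    for channel, parent_id, text in comments:
        total[channel] += 1
        chars[channel] += len(text)
        if parent_id is None:
            direct[channel] += 1
        else:
            replies[channel] += 1
    return {
        ch: {
            # Direct comments plus replies, divided by videos scraped for the channel.
            "comments_per_video": total[ch] / videos_per_channel[ch],
            # Average number of replies per direct comment.
            "avg_thread_length": replies[ch] / direct[ch] if direct[ch] else 0.0,
            # Average number of characters per comment.
            "avg_comment_chars": chars[ch] / total[ch],
        }
        for ch in total
    }


def category_average(per_channel, category_of, key):
    """Average one per-channel metric over the channels in each category (unweighted)."""
    grouped = defaultdict(list)
    for ch, metrics in per_channel.items():
        grouped[category_of[ch]].append(metrics[key])
    return {cat: mean(values) for cat, values in grouped.items()}


per_channel = channel_metrics(COMMENTS, VIDEOS_PER_CHANNEL)
by_category = category_average(per_channel, CATEGORY_OF, "comments_per_video")
print(per_channel)
print(by_category)
```

An unweighted mean treats every channel in a category equally; weighting by video count would instead let larger channels dominate, so the choice affects how the category charts should be read.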

Predicting the MBFC Category of a Video using Metadata

Figure 8A: The features we used, along with each feature's importance in the decision tree and random forest models.
Figure 8B: A 2-dimensional t-SNE visualization of the training data.
Figure 8C: Confusion matrices for models with hyperparameters chosen by 5-fold cross-validation. (Top left) Decision tree with maximum depth = 12: test accuracy = 65.7%. (Top right) Random forest with unlimited maximum depth and 61 estimators: test accuracy = 80.2%. (Bottom left) SVM with C = 100: test accuracy = 62.5%. (Bottom right) SVM with RBF kernel, gamma = 0.1, and C = 175: test accuracy = 79.0%.
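For readers less familiar with confusion matrices, the sketch below shows how a test accuracy figure like those in Figure 8C relates to the matrix: it is the sum of the diagonal divided by the total number of test examples. The labels and predictions are hypothetical toy values, not our model outputs, and the helper functions are written from scratch for illustration rather than taken from any library.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Build a matrix where rows are true labels and columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[predicted_label]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of examples on the diagonal, i.e. predicted correctly."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


# Hypothetical category labels and predictions, for illustration only.
LABELS = ["cat_x", "cat_y", "cat_z"]
y_true = ["cat_x", "cat_x", "cat_y", "cat_y", "cat_z", "cat_z", "cat_z", "cat_x"]
y_pred = ["cat_x", "cat_y", "cat_y", "cat_y", "cat_z", "cat_x", "cat_z", "cat_x"]

matrix = confusion_matrix(y_true, y_pred, LABELS)
test_accuracy = accuracy(matrix)
for label, row in zip(LABELS, matrix):
    print(label, row)
print(f"test accuracy = {test_accuracy:.1%}")
```

Off-diagonal cells show which categories get mistaken for one another, which a single accuracy number hides.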

Discussion

References

Misinformation:

Comments analysis:

Edits


UC Berkeley EECS ’19 → Google SWE. Interested in misinformation prevention, tech for social good, and education.

JANNY ZHANG
