YouTube Chatter: Understanding Online Comments Discourse on Misinformative and Political YouTube Videos




Channel Selection

Figure 1: A graphic depicting the MBFC categorizations and MBFC factual ratings (where applicable) for the fourteen channels we scraped. We made some categories of our own, marked as [unofficial], for later reference.
Figure 2: Views across all sampled videos for the YouTube channels, normalized by the number of videos per group. Grouped visually by category and factual rating. Views measured in thousands. A chart of the views by channel can be found in the appendix, item 3.

Data Collection

Released Datasets & Code




Comment Engagement for Political Content

Figure 3A: Total CPV, organized by channel, within our dataset.
Figure 3B: Total CPV, by category. Total view counts per channel are provided in our appendix item 3.
Figure 4: (Left) Total comments, including both direct comments and replies, divided by videos per channel. (Right) Total comments divided by videos per channel, averaged over category.
Figure 5A: Average thread length for each channel. Average thread length is the mean number of replies per direct comment, taken over all the videos we scraped for that channel.
Figure 5B: Average thread length organized by category.
Figure 6A: Average number of characters in a comment, organized by channel.
Figure 6B: Average comment length, organized by category.
Figure 7A: Channel profanities for our dataset.
Figure 7B: Category profanities for our dataset.
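The per-channel and per-category averages described in the captions for Figures 4 and 5 can be sketched as follows. This is a minimal illustration and not the code used for the figures: the record layout (`channel`, `video_id`, `parent_id`) and the function names are hypothetical, and a video with no scraped comments would not be counted in the denominator here.

```python
from collections import defaultdict


def comments_per_video(comments):
    """Total comments (direct + replies) divided by videos, per channel.

    Assumes each record is a dict with hypothetical keys
    'channel', 'video_id', and 'parent_id' (None for a direct comment).
    Only videos that appear in the records are counted.
    """
    totals = defaultdict(int)
    videos = defaultdict(set)
    for c in comments:
        totals[c["channel"]] += 1
        videos[c["channel"]].add(c["video_id"])
    return {ch: totals[ch] / len(videos[ch]) for ch in totals}


def average_thread_length(comments):
    """Mean number of replies per direct comment, per channel."""
    direct = defaultdict(int)
    replies = defaultdict(int)
    for c in comments:
        if c["parent_id"] is None:
            direct[c["channel"]] += 1
        else:
            replies[c["channel"]] += 1
    # Channels with no direct comments are skipped to avoid dividing by zero.
    return {ch: replies[ch] / n for ch, n in direct.items() if n > 0}


def average_by_category(channel_values, channel_to_category):
    """Unweighted mean of a per-channel metric over each category."""
    grouped = defaultdict(list)
    for ch, value in channel_values.items():
        grouped[channel_to_category[ch]].append(value)
    return {cat: sum(vals) / len(vals) for cat, vals in grouped.items()}


# Toy records, invented purely for illustration.
sample = [
    {"channel": "A", "video_id": "v1", "parent_id": None},
    {"channel": "A", "video_id": "v1", "parent_id": "c1"},
    {"channel": "A", "video_id": "v1", "parent_id": "c1"},
    {"channel": "A", "video_id": "v2", "parent_id": None},
    {"channel": "B", "video_id": "v3", "parent_id": None},
]
per_channel_threads = average_thread_length(sample)
per_channel_comments = comments_per_video(sample)
per_category = average_by_category(per_channel_comments, {"A": "cat1", "B": "cat1"})
```

Averaging per channel first and then per category gives every channel equal weight within its category, regardless of how many videos it contributed. Whether the figures weight channels this way is an assumption.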

Predicting the MBFC Category of a Video using Metadata

Figure 8A: A list of the features we used, as well as each one’s importance in Decision Tree and Random Forest.
Figure 8B: A 2-dimensional t-SNE visualization of the training data.
Figure 8C: Confusion matrices for models with hyperparameters chosen by 5-fold cross-validation. (Top left) Decision tree with maximum depth = 12: test accuracy = 65.7%. (Top right) Random forest with unlimited max depth, 61 estimators: test accuracy = 80.2%. (Bottom left) SVM with C = 100: test accuracy = 62.5%. (Bottom right) SVM with RBF kernel, with gamma = 0.1, C = 175: test accuracy = 79.0%.
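The model-selection procedure named in the Figure 8C caption (hyperparameters chosen by 5-fold cross-validation, then evaluated on held-out test data) could look roughly like the scikit-learn sketch below. Everything specific here is an assumption: the data is synthetic, the parameter grids are illustrative values placed around those reported in the caption, and the feature scaling step for the SVM is our own choice. The caption does not state the kernel of the first SVM, so only the RBF variant is sketched.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the video metadata features (not the real dataset).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=6, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Illustrative grids; the real search ranges are not given in the source.
candidates = {
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"max_depth": [4, 8, 12, None]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [31, 61], "max_depth": [12, None]},
    ),
    "svm_rbf": (
        make_pipeline(StandardScaler(), SVC(kernel="rbf")),
        {"svc__C": [1, 100, 175], "svc__gamma": [0.01, 0.1]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # cv=5 selects hyperparameters by 5-fold cross-validation on the training split.
    search = GridSearchCV(estimator, grid, cv=5, scoring="accuracy")
    search.fit(X_train, y_train)
    y_pred = search.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "test_accuracy": accuracy_score(y_test, y_pred),
        "confusion": confusion_matrix(y_test, y_pred),
    }

for name, r in results.items():
    print(name, r["best_params"], round(r["test_accuracy"], 3))
```

Keeping the test split out of the grid search means the reported test accuracy is not inflated by the hyperparameter choice. The accuracies printed for the synthetic data say nothing about the numbers reported in the caption.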




Comments Analysis





UC Berkeley EECS '19 → Google SWE. Interested in misinformation prevention, tech for social good, and education.

