Mapping out the connections of Oscar Winners | by Milan Janosov | Feb, 2024

Towards Data Science

In this short piece, I use public Wikipedia data, Python programming, and network analysis to extract and draw up a network of Oscar-winning actors and actresses.

All images were created by the author.

Wikipedia, the largest free, crowdsourced online encyclopedia, is a tremendously rich data source on many public domains. Many of these domains, from film to politics, have layers of networks underneath that express social phenomena such as collaboration. With the Academy Awards ceremony approaching, I use Oscar-winning actors and actresses as an example of how simple Pythonic methods can turn Wiki pages into networks.

First, let’s take a look at how, for instance, the Wiki list of all Oscar-winning actors is structured:

Wiki list of all Oscar-winning actors

This subpage lists everyone who has ever received an Oscar and has been granted a Wiki profile (most likely, the fans have not missed any actor or actress). In this article, I focus on acting, which is covered by the following four subpages, one each for lead and supporting actors and actresses:

urls = {
    'actor': 'https://en.wikipedia.org/wiki/Category:Best_Actor_Academy_Award_winners',
    'actress': 'https://en.wikipedia.org/wiki/Category:Best_Actress_Academy_Award_winners',
    'supporting_actor': 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actor_Academy_Award_winners',
    'supporting_actress': 'https://en.wikipedia.org/wiki/Category:Best_Supporting_Actress_Academy_Award_winners',
}

Now let’s write a simple block of code that visits each of these four listings and, using the packages urllib and BeautifulSoup, extracts the names of all artists:

from urllib.request import urlopen
import bs4 as bs
import re

# Iterate across the four categories
people_data = []

for category, url in urls.items():

    # Query the name listing page and…
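
As a minimal sketch of the extraction step, assuming that Wikipedia category pages list their members as links inside a div with the class mw-category, the names could be collected roughly as follows. The class CategoryNameParser, the function extract_names, and the sample HTML with its placeholder names are hypothetical and not taken from the original code. The sketch uses Python's built-in html.parser in place of BeautifulSoup so that it runs without extra installs, and it parses a small inline HTML string instead of querying Wikipedia.

```python
from html.parser import HTMLParser


class CategoryNameParser(HTMLParser):
    """Collect link texts inside <div class="mw-category ..."> (assumed layout)."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # > 0 while inside the category listing div
        self.in_link = False  # True while inside a collected <a> element
        self.names = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            # Start counting at the listing div, then track nested divs.
            if self.div_depth > 0 or "mw-category" in classes:
                self.div_depth += 1
        elif tag == "a" and self.div_depth > 0:
            if (attrs.get("href") or "").startswith("/wiki/"):
                self.in_link = True
                self.names.append("")

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1
        elif tag == "a":
            self.in_link = False

    def handle_data(self, data):
        if self.in_link:
            self.names[-1] += data


def extract_names(html, category):
    """Return one record per listed person, tagged with the award category."""
    parser = CategoryNameParser()
    parser.feed(html)
    parser.close()
    return [{"category": category, "name": n.strip()}
            for n in parser.names if n.strip()]


# Hypothetical stand-in for a downloaded category page (placeholder names).
sample_html = """
<div id="mw-pages">
  <div class="mw-category mw-category-columns">
    <div class="mw-category-group"><h3>A</h3>
      <ul><li><a href="/wiki/Example_Person_A">Example Person A</a></li></ul>
    </div>
    <div class="mw-category-group"><h3>B</h3>
      <ul><li><a href="/wiki/Example_Person_B">Example Person B</a></li></ul>
    </div>
  </div>
</div>
<a href="/wiki/Main_Page">Main page</a>
"""

people_data = extract_names(sample_html, "actor")
print(people_data)
```

On a live page, the HTML string would come from the response returned by urlopen(url), decoded to text, with one call to extract_names per entry in urls. Links outside the listing div, such as the navigation link in the sample, are ignored.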


