Web Scraping

In this project, I employed web scraping techniques using Beautiful Soup and Requests in Python to gather and analyze a dataset of the largest companies in Africa by revenue. The primary source of data was Wikipedia, which contains up-to-date listings of corporations across various sectors in Africa.
The project highlights my ability to extract valuable, unstructured data from the web, clean and transform it into a usable format. Using Jupyter Notebook, I efficiently wrote and tested the code to ensure the scraping process was optimized, handling potential challenges such as HTML parsing, handling missing data, and avoiding duplicate records.
After collecting the data, I used Python’s data manipulation libraries, such as Pandas, to clean, organize, and provide initial analysis. The final output delivered a clear dataset that could be used for further analysis, reporting, or visualizations.
This project showcases my proficiency in:
1. Web Scraping: Extracting large datasets from the web using Beautiful Soup and handling diverse HTML structures.
2. Data Cleaning & Transformation: Handling real-world messy data by addressing missing entries, duplicates, and formatting issues.
3. Automation: Automating the data collection process efficiently using Python tools and functions.
This project reinforces my technical skillset in extracting insights from unstructured web data and preparing it for further business analysis, helping organizations access key corporate performance information in Africa.