Python Web Scraping and Data Analysis: A Simple Case Study

Python, with its extensive library support, has become a popular choice for web scraping and data analysis tasks. In this article, we will walk through a simple case study that demonstrates how to scrape data from a website and analyze it using Python. This example will focus on scraping product information from an online retail store and performing basic data analysis.

Step 1: Setting Up the Environment

First, ensure you have Python installed on your machine. Next, you’ll need to install a few libraries that will make the scraping and analysis processes easier. Open your terminal or command prompt and install the following libraries:

```bash
pip install requests beautifulsoup4 pandas
```
  • requests is used to fetch the web page content.
  • beautifulsoup4 is used to parse the HTML content.
  • pandas is used for data analysis.
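To confirm that the installation worked, a short check like the one below can help. It is a minimal sketch that uses only the standard library; the helper name `installed_versions` is made up for this illustration and is not part of any of the three libraries.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


if __name__ == "__main__":
    wanted = ["requests", "beautifulsoup4", "pandas"]
    for name, ver in installed_versions(wanted).items():
        print(f"{name}: {ver or 'not installed'}")
```

If any line reports "not installed", rerun the pip command above in the same environment that runs your script.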

Step 2: Scraping the Data

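In this step, we will scrape product names and prices from an example website. For demonstration purposes, assume each product sits in a div with the class product, with its name in an h3 element and its price in a span with the class price. The URL and this markup are placeholders, so adjust the selectors to match the page you are actually scraping.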

```python
import requests
from bs4 import BeautifulSoup

# Target URL (placeholder)
url = 'https://example.com/products'

# Fetch the content; fail fast on timeouts and HTTP error codes
response = requests.get(url, timeout=10)
response.raise_for_status()
html_content = response.text

# Parse the HTML content
soup = BeautifulSoup(html_content, 'html.parser')

# Extract the data
products = []
for product in soup.find_all('div', class_='product'):
    name = product.find('h3').text.strip()
    price = product.find('span', class_='price').text.strip()
    products.append((name, price))

# Display the scraped data
for name, price in products:
    print(f'Name: {name}, Price: {price}')
```
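Real pages are rarely uniform, and `find()` returns `None` when an element is missing, so the loop above would raise an `AttributeError` on an incomplete listing. The sketch below shows one way to guard against that. The `SAMPLE_HTML` string and the `extract_products` helper are invented for this illustration and only mirror the markup assumed in this step.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mirroring the structure the scraper expects.
# The second product has no price element.
SAMPLE_HTML = """
<div class="product"><h3> Widget </h3><span class="price">$19.99</span></div>
<div class="product"><h3>Gadget</h3></div>
<div class="product"><h3>Gizmo</h3><span class="price">$5.00</span></div>
"""


def extract_products(html):
    """Return (name, price) tuples, skipping listings that lack either field."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for product in soup.find_all("div", class_="product"):
        name_tag = product.find("h3")
        price_tag = product.find("span", class_="price")
        if name_tag is None or price_tag is None:
            continue  # incomplete listing: find() returned None
        rows.append(
            (name_tag.get_text(strip=True), price_tag.get_text(strip=True))
        )
    return rows


print(extract_products(SAMPLE_HTML))
# → [('Widget', '$19.99'), ('Gizmo', '$5.00')]
```

Testing the parsing logic against an inline string like this also lets you adjust the selectors without sending repeated requests to the site.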

Step 3: Analyzing the Data

Now that we have the data, let’s analyze it using pandas. We will create a DataFrame and perform some basic operations.

```python
import pandas as pd

# Convert the list of tuples into a DataFrame
df = pd.DataFrame(products, columns=['Name', 'Price'])

# Convert the price column to a numeric type for analysis.
# regex=False treats '$' as a literal character, not a regex anchor.
df['Price'] = df['Price'].str.replace('$', '', regex=False).astype(float)

# Display the DataFrame
print(df)

# Basic analysis: finding the average price
average_price = df['Price'].mean()
print(f'Average Price: {average_price:.2f}')
```
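Scraped prices often contain thousands separators or placeholder text, which makes a direct `astype(float)` conversion fail. The following is a minimal sketch of a more tolerant cleaning step. The sample rows are made up for this illustration and stand in for the scraped tuples.

```python
import pandas as pd

# Made-up rows standing in for the scraped (name, price) tuples.
sample = [("Widget", "$20.00"), ("Gadget", "$1,250.00"), ("Gizmo", "N/A")]
df = pd.DataFrame(sample, columns=["Name", "Price"])

# Remove "$" and "," as literal characters. As a regular expression,
# "$" is an end-of-string anchor, so regex=False makes the intent explicit.
cleaned = (
    df["Price"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
)

# Values that still cannot be parsed (such as "N/A") become NaN.
df["Price"] = pd.to_numeric(cleaned, errors="coerce")

print(df)

# mean() skips NaN values by default.
print(f"Average Price: {df['Price'].mean():.2f}")

# Basic summary: the most expensive product with a known price.
most_expensive = df.loc[df["Price"].idxmax(), "Name"]
print(f"Most expensive: {most_expensive}")
```

Coercing unparseable values to NaN keeps the remaining rows usable, but it is worth checking how many rows were affected (for example with `df['Price'].isna().sum()`) before trusting the average.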

Conclusion

This simple case study demonstrates how Python can be used for web scraping and basic data analysis. With a few lines of code, we were able to scrape product information from a website and perform simple data analysis, such as calculating the average price. Python’s simplicity and powerful libraries make it an excellent tool for such tasks.

[tags]
Python, Web Scraping, Data Analysis, BeautifulSoup, Pandas, Requests

As I write this, the latest version of Python is 3.12.4