Project information

Big Data Pipeline for Real-Time Stock Market Analysis

Project Description

The stock market plays a pivotal role in the global financial system, facilitating the buying and selling of shares in publicly traded companies. Understanding its dynamics is crucial for investors seeking to make informed decisions. The market generates a constant stream of data, including prices, volumes, news sentiment, and more. This volume of data, often termed “big data,” requires specialized techniques for efficient management, processing, and analysis.

This project analyzes daily stock market data for a set of widely traded US tickers, most of them large technology companies listed on NASDAQ, using Apache Kafka, Apache Spark, and Flask. We calculate 50-day and 200-day Simple Moving Averages (SMAs) to help identify potential buying and selling signals.
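To make the SMA idea concrete, here is a minimal pure-Python sketch, not the project's actual Spark job. It assumes the averages are taken over closing prices and that a signal is read from the short SMA crossing the long SMA (the common "golden cross" / "death cross" rule). Both points are assumptions, since the description does not spell out the rule. The function names `simple_moving_average` and `crossover_signals` are hypothetical.

```python
from typing import List, Optional, Tuple


def simple_moving_average(closes: List[float], window: int) -> List[Optional[float]]:
    """Return the SMA for each day; None until `window` closes are available."""
    if window <= 0:
        raise ValueError("window must be positive")
    sma: List[Optional[float]] = []
    running_sum = 0.0
    for i, price in enumerate(closes):
        running_sum += price
        if i >= window:
            # Drop the close that just fell out of the window.
            running_sum -= closes[i - window]
        sma.append(running_sum / window if i >= window - 1 else None)
    return sma


def crossover_signals(
    closes: List[float], short_window: int = 50, long_window: int = 200
) -> List[Tuple[int, str]]:
    """Assumed rule: 'buy' when the short SMA crosses above the long SMA,
    'sell' when it crosses below. Returns (day_index, signal) pairs."""
    short = simple_moving_average(closes, short_window)
    long_ = simple_moving_average(closes, long_window)
    signals: List[Tuple[int, str]] = []
    for i in range(1, len(closes)):
        if None in (short[i - 1], long_[i - 1], short[i], long_[i]):
            continue  # not enough history for both averages yet
        prev_diff = short[i - 1] - long_[i - 1]
        curr_diff = short[i] - long_[i]
        if prev_diff <= 0 < curr_diff:
            signals.append((i, "buy"))
        elif prev_diff >= 0 > curr_diff:
            signals.append((i, "sell"))
    return signals
```

With the default windows, `crossover_signals(closes)` needs at least 201 daily closes before it can report anything, because both averages must exist on two consecutive days. A crossover is only a candidate signal, in keeping with the "potential" wording above.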

Apache Kafka handles real-time data ingestion, and Apache Spark handles processing. The Spark applications are written in Python with PySpark, and Flask serves the web visualization.

The chosen dataset contains daily stock price records. Fields include Symbol, Date, Open, Close, High, Low, and Volume, among others.

  • Source: Yahoo Finance
  • Tickers: SPY, AAPL, MSFT, TSLA, NVDA, AMZN, GOOG, META, JPM, and GME
  • Time Period: Past 10 years
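As an illustration of the record layout described above, the following sketch parses one CSV row into a typed record and serializes it to JSON bytes, the kind of payload that could be published as a Kafka message value. The class name `DailyBar`, the helpers `parse_row` and `to_message`, the exact CSV column order, and the JSON encoding are all assumptions made for this example. The sample row uses a placeholder symbol and made-up values, not real quotes.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass


@dataclass
class DailyBar:
    """One daily price record (hypothetical name; fields follow the dataset)."""
    symbol: str
    date: str      # kept as an ISO date string, e.g. "2024-01-02"
    open: float
    high: float
    low: float
    close: float
    volume: int


def parse_row(row: dict) -> DailyBar:
    """Convert a CSV row (all strings) into a typed record."""
    return DailyBar(
        symbol=row["Symbol"],
        date=row["Date"],
        open=float(row["Open"]),
        high=float(row["High"]),
        low=float(row["Low"]),
        close=float(row["Close"]),
        volume=int(float(row["Volume"])),  # tolerate values like "1000.0"
    )


def to_message(bar: DailyBar) -> bytes:
    """Serialize a record to UTF-8 JSON bytes (assumed message encoding)."""
    return json.dumps(asdict(bar)).encode("utf-8")


# Placeholder symbol and made-up values, for illustration only.
SAMPLE_CSV = """Symbol,Date,Open,High,Low,Close,Volume
DEMO,2024-01-02,10.0,11.0,9.5,10.5,1000
"""

bars = [parse_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
messages = [to_message(b) for b in bars]
```

Typing the fields at the ingestion boundary keeps string-to-number conversion errors out of the downstream averaging step.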