E Cedric Corro

Background : Mathematics and Data Science
E-mail : euced.corro@gmail.com
LinkedIn : linkedin.com/in/ec-corro/
Location : Makati City, Philippines

View My GitHub Profile

List of Data Science Projects

Exploratory Data Analysis

An Analysis of the Duties and Taxes of Imported Products in 2019

This study explores the importing behavior of the Philippines using 2019 customs data. The data are part of the “Philippine Customs Imports dataset” published by the Bureau of Customs (BoC) on its website.

Playing with the Pokemon data set - exploring the basics of k-NN classification

This study explores how k-nearest neighbor (k-NN) classification can be applied to a data set, using the Pokemon data set obtained from Kaggle.
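As a rough illustration of the idea behind k-NN classification, the sketch below labels a query point by majority vote among its k closest training points, using Euclidean distance. It is a minimal sketch and not the code used in the study: the function name `knn_predict`, the two-feature points, and the labels "A" and "B" are made-up placeholders, not values from the Pokemon data set.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Return the majority label among the k training points nearest to query."""
    # Pair each training point's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    # Count the labels of the k nearest neighbors and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy data (hypothetical): two well-separated groups in a 2-D feature space.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(points, labels, (1.1, 1.0), k=3))
print(knn_predict(points, labels, (5.1, 5.0), k=3))
```

In practice the features are usually scaled first, because Euclidean distance is otherwise dominated by whichever feature has the largest range.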


Big Data Analysis

Do the Doodle: Predicting “Quick, Draw!” drawings based on player’s first drawing stroke

Data preprocessing

Quick, Draw! is an online game developed by Google Creative Lab in which a neural network tries to guess what the player is drawing while the sketch is still in progress. This study asks: “Using machine learning models, can Quick, Draw! sketches be guessed correctly given only the first few points of the first stroke?” The data set contains 36 million Quick, Draw! sketches across 82 categories and has an estimated total size of 53 GB.
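To make "the first few points of the first stroke" concrete, here is a minimal sketch of how such a prefix could be extracted. It assumes each drawing is a list of strokes and each stroke is a pair of coordinate lists `[[x0, x1, ...], [y0, y1, ...]]`, as in the simplified Quick, Draw! files. The function name `first_stroke_prefix` and the default of five points are hypothetical; the study's actual cutoff and preprocessing may differ.

```python
def first_stroke_prefix(drawing, n_points=5):
    """Return up to n_points (x, y) pairs from the start of the first stroke."""
    if not drawing:
        return []
    # The first stroke holds parallel lists of x and y coordinates.
    xs, ys = drawing[0][0], drawing[0][1]
    return list(zip(xs[:n_points], ys[:n_points]))


# Toy drawing (made-up coordinates): two strokes, only the first is used.
drawing = [
    [[0, 10, 20, 30], [5, 5, 15, 25]],
    [[1, 2], [3, 4]],
]

print(first_stroke_prefix(drawing, n_points=3))
```

The resulting list of points could then be flattened into a fixed-length feature vector for a classifier.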

Analyzing Air Quality in India (October 2020) Part 1

This study aims to determine the air quality profile of cities in India, a country whose cities dominate the list of the world’s 30 most polluted cities in IQAir AirVisual’s 2019 World Air Quality Report. The data were obtained from the Registry of Open Data on AWS. The subset used comprises real-time physical air quality measurements from cities in India for October 1-31, 2020. The global OpenAQ dataset used in this study has an estimated total size of 24 GB.

Analyzing Air Quality in India (October 2020) Part 2

This study aims to develop a regression model that predicts the short-term air quality of identified polluted cities in India, a country whose cities dominate the list of the world’s 30 most polluted cities in IQAir AirVisual’s 2019 World Air Quality Report. The data were obtained from the Registry of Open Data on AWS. The subset used comprises real-time physical air quality measurements from cities in India for October to December 2020. The analysis was run on Amazon EMR using the PySpark kernel. The global OpenAQ dataset used in this study has an estimated total size of 160 GB.


Machine Learning

Improving stock price prediction accuracy using technical indicators and machine learning techniques

In this study, stock price data for Jollibee Foods Corporation (stock code: JFC), a company listed on the Philippine Stock Exchange (PSE), from January 2012 to December 2019 were used to develop a model and a method for improving the accuracy of stock price prediction.
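The summary above does not say which technical indicators were used. As one common example of such an indicator, the sketch below computes a simple moving average over a price series. The function name `simple_moving_average`, the window size, and the prices are illustrative assumptions, not values from the JFC data.

```python
def simple_moving_average(prices, window):
    """Return the mean of each consecutive run of `window` prices."""
    if window <= 0:
        raise ValueError("window must be a positive integer")
    # One average per position where a full window of prices is available.
    return [
        sum(prices[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(prices))
    ]


# Made-up closing prices, for illustration only.
closing_prices = [1, 2, 3, 4, 5]

print(simple_moving_average(closing_prices, window=3))
```

Indicators like this are typically added as extra input features alongside the raw prices before a prediction model is trained.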


Deep Learning

Stock price prediction using Reinforcement Learning

In this study, daily closing prices from January 2015 to December 2020 of the Philippine Stock Exchange Composite Index (code: PSEi), which tracks the performance of 30 representative companies listed on the PSE, were used to develop a stock price prediction model based on reinforcement learning. The study was done to explore the fundamental concepts of reinforcement learning in detail.