<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Matt Struble</title>
    <link>https://struble.dev/</link>
    <description>Recent content on Matt Struble</description>
    <generator>Hugo</generator>
    <language>en</language>
    <lastBuildDate>Thu, 16 Oct 2025 11:03:27 -0400</lastBuildDate>
    <atom:link href="https://struble.dev/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>About</title>
      <link>https://struble.dev/about/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/about/</guid>
      <description>&lt;p&gt;My name is Matt Struble, and I am a Machine Learning and Computer Vision engineer.&lt;/p&gt;&#xA;&lt;p&gt;Software first entered my life at a very young age through video games, and I knew from then on that I wanted to create the same wonder and inspiration I experienced growing up. Being able to shape reality always seemed like a distant dream, until my first programming class in high school. There, I realized that I could make my dream a reality, and that I wanted nothing more than to continually focus my skills on creating a lasting impact on the world.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Boost Your Workflow, Not Replace It: How AI Can Enhance Developer Productivity</title>
      <link>https://struble.dev/blog/ai-boost-workflow/</link>
      <pubDate>Tue, 14 Oct 2025 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/blog/ai-boost-workflow/</guid>
      <description>&lt;p&gt;When engineers talk about productivity, we often focus on tools.&#xA;But the real measure of productivity is flow: that uninterrupted rhythm where focus turns into progress.&lt;/p&gt;&#xA;&lt;p&gt;As a Principal Engineer, I’ve spent years refining how I work. My editor is tuned exactly the way I like it, my linting rules are strict but purposeful, and my terminal feels more like home than any commercial IDE. When I’m in that environment, I move faster, think more clearly, and build better.&lt;/p&gt;</description>
    </item>
    <item>
      <title>SageRender</title>
      <link>https://struble.dev/projects/sagerender/</link>
      <pubDate>Thu, 29 Aug 2024 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/projects/sagerender/</guid>
      <description>&lt;p&gt;SageRender is an open-source tool developed at Nike to manage SageMaker pipelines via hierarchical YAML configurations. It enables shared configuration definitions, overrides, and extensibility across multiple pipelines, reducing duplication and improving consistency. This capability was critical in migrating teams from legacy platforms to SageMaker, particularly when managing hundreds of pipelines with similar but slightly different configurations.&lt;/p&gt;&#xA;&lt;p&gt;The tool was designed collaboratively across teams to accelerate migration efforts and provide a &lt;strong&gt;robust, production-ready, open-source solution&lt;/strong&gt; that could scale efficiently.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Interview - Overcoming Imposter Syndrome</title>
      <link>https://struble.dev/blog/deeplearning-ai/</link>
      <pubDate>Fri, 24 Dec 2021 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/blog/deeplearning-ai/</guid>
      <description></description>
    </item>
    <item>
      <title>Deep Learning Photo Aesthetics - Data Pipeline Optimization</title>
      <link>https://struble.dev/blog/dlpa-data-optimization/</link>
      <pubDate>Thu, 17 Dec 2020 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/blog/dlpa-data-optimization/</guid>
      <description>&lt;p&gt;In my &lt;a href=&#34;https://struble.dev/blog/dlpa-data-preprocessing&#34; class=&#34;link--external&#34; target=&#34;_blank&#34; rel=&#34;noreferrer&#34;&gt;last post&lt;/a&gt; I talked briefly about my data preparation pipeline and how I encoded the 200k images into TFRecords. As part of this step I first serialized each image into a tensor before storing it as a TFRecord. Serializing the image as a tensor first is a fairly common step in other tutorials on TFRecord image preparation [&lt;a href=&#34;https://towardsdatascience.com/working-with-tfrecords-and-tf-train-example-36d111b3ff4d&#34; class=&#34;link--external&#34; target=&#34;_blank&#34; rel=&#34;noreferrer&#34;&gt;1&lt;/a&gt;, &lt;a href=&#34;https://medium.com/swlh/using-tfrecords-to-train-a-cnn-on-mnist-aec141d65e3d&#34; class=&#34;link--external&#34; target=&#34;_blank&#34; rel=&#34;noreferrer&#34;&gt;2&lt;/a&gt;, &lt;a href=&#34;https://medium.com/ai-in-plain-english/a-quick-and-simple-guide-to-tfrecord-c421337a6562&#34; class=&#34;link--external&#34; target=&#34;_blank&#34; rel=&#34;noreferrer&#34;&gt;3&lt;/a&gt;, &lt;a href=&#34;https://www.kaggle.com/ryanholbrook/tfrecords-basics&#34; class=&#34;link--external&#34; target=&#34;_blank&#34; rel=&#34;noreferrer&#34;&gt;4&lt;/a&gt;].&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deep Learning Photo Aesthetics - Data Preprocessing</title>
      <link>https://struble.dev/blog/dlpa-data-preprocessing/</link>
      <pubDate>Thu, 19 Nov 2020 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/blog/dlpa-data-preprocessing/</guid>
      <description>&lt;p&gt;During data analysis I found various potential pitfalls in the AVA database that could introduce unwanted biases into the final model. Here is a brief overview as a quick recap:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;There exist photos with political, advertisement, pop culture, or emotional bias.&lt;/li&gt;&#xA;&lt;li&gt;The weighted average scores per photo are strongly centralized around 5/10.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Each of these pitfalls on its own can heavily influence the final model, from making it prefer a certain political ideology to generalizing all of its predictions to 5/10. Both would hurt final performance and prevent what I really want the model to do: tell me with utmost certainty which photo is the best.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Deep Learning Photo Aesthetics - Introduction and Data Analysis</title>
      <link>https://struble.dev/blog/dlpa-intro-data-analysis/</link>
      <pubDate>Mon, 26 Oct 2020 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/blog/dlpa-intro-data-analysis/</guid>
      <description>&lt;h2 id=&#34;introduction&#34;&gt;Introduction&lt;a href=&#34;#introduction&#34; class=&#34;post-heading__anchor&#34; aria-hidden=&#34;true&#34;&gt;#&lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;Whenever I’m with my girlfriend, traveling, or just outside, I tend to take a lot of photos. The problem is that many of the photos are of the same object, just taken from different angles, in different lighting, or with completely different framing. This means later needing to go back and filter through 20 photos of the same rock to trim them down to the one photo that best represents the rock in the moment. The process is tedious, and I know next to nothing about what makes a good photo. This creates a loop of trying to differentiate between nearly identical photos until I give up and push it off until the next day, then the next day, then the next day, leaving me with hundreds of leftover vacation photos.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Strengthening Deep Learning Concepts</title>
      <link>https://struble.dev/blog/strengthening-dl-concepts/</link>
      <pubDate>Tue, 15 Sep 2020 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/blog/strengthening-dl-concepts/</guid>
      <description>&lt;p&gt;A couple of weeks ago, I posted about how the &lt;a href=&#34;https://struble.dev/blog/overcoming-self-doubt&#34; class=&#34;link--external&#34; target=&#34;_blank&#34; rel=&#34;noreferrer&#34;&gt;TensorFlow Certification&lt;/a&gt; helped me overcome my imposter syndrome within the deep learning community. My studying, and as a result my certification, was primarily focused on the high-level implementation of various model architectures within the TensorFlow framework, which meant that even though I was certified, I felt there was still a lot to learn about deep learning.&lt;/p&gt;&#xA;&lt;p&gt;I wanted to know what was truly happening under the hood in deep learning, to understand the low-level algorithms that make up each individual part of a whole model. I wanted to be able to make intelligent choices in my own research and development, and knew that learning the mathematics of deep learning would be the only way to gain that confidence.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Overcoming Self-Doubt in Pursuit of TensorFlow Certification</title>
      <link>https://struble.dev/blog/overcoming-self-doubt/</link>
      <pubDate>Tue, 25 Aug 2020 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/blog/overcoming-self-doubt/</guid>
      <description>&lt;h2 id=&#34;the-plan&#34;&gt;The Plan&lt;a href=&#34;#the-plan&#34; class=&#34;post-heading__anchor&#34; aria-hidden=&#34;true&#34;&gt;#&lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;Near the end of June I decided to finally take the plunge and start seriously working towards pivoting my career to ML and deep learning. Even though I had worked with, and deployed, production ML models in the &lt;a href=&#34;https://struble.dev/projects/heineken-ar&#34;&gt;past&lt;/a&gt;, and had supported ML researchers and analysts in my job for years, I didn’t consider myself an ML practitioner. While I understood different approaches to supervised and unsupervised learning, and how I could use them to perform rudimentary classifications, I was not confident about developing large productionized models with targeted end users. I truly wanted to go back to the roots and learn deep learning from a practical perspective in order to understand industry expectations, with the goal of filling in gaps left over from the more theoretical approach of my master’s degree. Voilà! Enter the TensorFlow Developer Certificate Exam.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Automating Deep Learning Pipeline With NAS</title>
      <link>https://struble.dev/blog/automating-dl-pipeline/</link>
      <pubDate>Sun, 24 May 2020 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/blog/automating-dl-pipeline/</guid>
      <description>&lt;p&gt;The past few days I&amp;rsquo;ve been working on improving my machine learning pipeline in preparation for some upcoming projects. I wanted to create a system that would allow me to easily train a model&#xA;from any thin client, while also preserving a history of work done on a per-project basis.&lt;/p&gt;&#xA;&lt;h2 id=&#34;the-goal&#34;&gt;The Goal&lt;a href=&#34;#the-goal&#34; class=&#34;post-heading__anchor&#34; aria-hidden=&#34;true&#34;&gt;#&lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;I had three main goals in mind for my automated pipeline:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;I wanted to be able to dynamically configure settings on a per-run basis, settings like:&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;Which server to execute the training on&lt;/li&gt;&#xA;&lt;li&gt;How long to let the training session run&lt;/li&gt;&#xA;&lt;li&gt;How frequently to back up the checkpoint directory&lt;/li&gt;&#xA;&lt;li&gt;What TensorFlow version and conda environment to use&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;ol start=&#34;2&#34;&gt;&#xA;&lt;li&gt;Next, I wanted to be able to start training from any device.&lt;/li&gt;&#xA;&lt;li&gt;Lastly, I wanted to back up everything that went into each run.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;This way I would be able to reproduce the exact same results.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;h2 id=&#34;the-solution&#34;&gt;The Solution&lt;a href=&#34;#the-solution&#34; class=&#34;post-heading__anchor&#34; aria-hidden=&#34;true&#34;&gt;#&lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;Enter the Synology NAS. The core function of a NAS is to provide redundant high-capacity storage on a Linux OS, which conveniently allows the execution of custom bash scripts.&#xA;This let me write a series of scripts, each of which satisfied one of the goals above.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Utilizing Document Fingerprinting for Variable String Matching</title>
      <link>https://struble.dev/blog/utilizing-document-fingerprinting/</link>
      <pubDate>Thu, 16 Apr 2020 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/blog/utilizing-document-fingerprinting/</guid>
      <description>&lt;p&gt;A question was posted in a group chat: how many times does my one friend say the phrase “gamers in chat”? At the time I was beginning to dabble with the Discord API, so I took it upon myself to figure out the actual count. The algorithm can be seen in action within my &lt;a href=&#34;https://github.com/mattstruble/gamer-bot&#34; class=&#34;link--external&#34; target=&#34;_blank&#34; rel=&#34;noreferrer&#34;&gt;Discord GamerBot&lt;/a&gt;.&lt;/p&gt;&#xA;&lt;h2 id=&#34;the-problem&#34;&gt;The Problem&lt;a href=&#34;#the-problem&#34; class=&#34;post-heading__anchor&#34; aria-hidden=&#34;true&#34;&gt;#&lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;User-submitted text, especially in a group chat, is variable, prone to spelling mistakes, and all-around unreliable data. A simple string compare will only catch an exact match, missing any of the following potential variations: “gamer in the chat”, “gamers int he chat”, “gamers in this chat?”, “gamers get in chat”.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Analyzing Climate Change Stance Through Twitter Data</title>
      <link>https://struble.dev/projects/analyzing-climate-change/</link>
      <pubDate>Sun, 01 Dec 2019 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/projects/analyzing-climate-change/</guid>
      <description>&lt;p&gt;With 22% of US adults indicating they use Twitter, the platform has become a key stage where the climate change conversation unfolds. As such, this project hoped to understand—and visualize—Americans’ views of climate change as seen through the lens of Twitter.&lt;/p&gt;&#xA;&lt;h3 id=&#34;method&#34;&gt;Method&lt;a href=&#34;#method&#34; class=&#34;post-heading__anchor&#34; aria-hidden=&#34;true&#34;&gt;#&lt;/a&gt;&#xA;&lt;/h3&gt;&#xA;&lt;p&gt;The approach was two-pronged:&lt;/p&gt;&#xA;&lt;ol&gt;&#xA;&lt;li&gt;Develop a multi-layered predictive model trained with labeled data.&lt;/li&gt;&#xA;&lt;li&gt;Create interactive visualizations housed on a dedicated webpage that facilitates comprehension and boosts engagement.&lt;/li&gt;&#xA;&lt;/ol&gt;&#xA;&lt;p&gt;An important distinguishing characteristic of this project is that it aimed to look past the accuracy of an analytical product and relate the sentiment data to demographic characteristics. It also cast a wider net when collecting raw data, incorporating both critical keywords (e.g., “climate change” and “global warming”) and popular hashtags (e.g., #parisagreement and #climatehoax) representing both sides of the conversation.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Heineken® AR Cheers Campaign</title>
      <link>https://struble.dev/projects/heineken-ar/</link>
      <pubDate>Tue, 13 Aug 2019 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/projects/heineken-ar/</guid>
      <description>&lt;p&gt;A web app that blended augmented reality and artificial intelligence to create an interactive user experience for the &lt;a href=&#34;https://www.heineken.com/formula-1&#34; class=&#34;link--external&#34; target=&#34;_blank&#34; rel=&#34;noreferrer&#34;&gt;Heineken® Formula 1&lt;/a&gt; campaign.&#xA;The campaign marked the first time a brand had used web-based AR technology to power a live competition globally.&lt;/p&gt;&#xA;&lt;hr&gt;&#xA;&lt;h2 id=&#34;project-goal&#34;&gt;Project Goal&lt;a href=&#34;#project-goal&#34; class=&#34;post-heading__anchor&#34; aria-hidden=&#34;true&#34;&gt;#&lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;p&gt;In June I was contracted to develop the image recognition component of the &lt;a href=&#34;https://www.justaftermidnight247.com/case-study/heineken-ar-cheers-campaign/&#34; class=&#34;link--external&#34; target=&#34;_blank&#34; rel=&#34;noreferrer&#34;&gt;Heineken® AR Cheers Campaign&lt;/a&gt;.&#xA;I was given six weeks to create the Heineken® logo detection logic, which needed to fit the following criteria:&lt;/p&gt;</description>
    </item>
    <item>
      <title>CV</title>
      <link>https://struble.dev/resume/</link>
      <pubDate>Mon, 01 Jan 0001 00:00:00 +0000</pubDate>
      <guid>https://struble.dev/resume/</guid>
      <description>&lt;p&gt;&lt;a href=&#34;../resume/struble_resume.pdf&#34;&gt;Download full resume in PDF format&lt;/a&gt;&lt;/p&gt;&#xA;&lt;h2 id=&#34;experience&#34;&gt;Experience&lt;a href=&#34;#experience&#34; class=&#34;post-heading__anchor&#34; aria-hidden=&#34;true&#34;&gt;#&lt;/a&gt;&#xA;&lt;/h2&gt;&#xA;&lt;div class=&#34;jr__list jr-work__list&#34;&gt;&#xA;&lt;div class=&#34;jr__item jr-work__item&#34;&gt;&#xA;&lt;div class=&#34;jr__item-meta&#34;&gt;&#xA;&lt;div class=&#34;jr-work__position&#34;&gt;Principal Software Engineer – AI/ML &amp;amp; GenAI&lt;/div&gt;&#xA;&lt;div class=&#34;jr__date-range&#34;&gt;&lt;span&gt;2025-04&lt;/span&gt; &lt;span&gt;-&lt;/span&gt; &lt;span&gt;present&lt;/span&gt;&lt;/div&gt;&#xA;&lt;div class=&#34;jr-work__name&#34;&gt;&lt;a href=&#34;https://nike.com&#34;&gt;Nike&lt;/a&gt;&lt;/div&gt;&#xA;&lt;div class=&#34;jr-work__location&#34;&gt;Boston, MA&lt;/div&gt;&#xA;&lt;/div&gt;&#xA;&lt;div class=&#34;jr__item-content&#34;&gt;&#xA;&lt;p class=&#34;jr-work__summary&#34;&gt;Leading development of Nike’s AI/ML and GenAI platforms with a focus on scalable training, inference, and monitoring infrastructure.&lt;/p&gt;&#xA;&lt;ul class=&#34;jr-work__highlights&#34;&gt;&#xA;&lt;li&gt;Built unified debugging and observability tools that reduced model pipeline failures by 70%.&lt;/li&gt;&#xA;&lt;li&gt;Led fine-tuning and serving of large vision-language models including SAM2, GroundingDino, and CLIP.&lt;/li&gt;&#xA;&lt;li&gt;Created Terraform-based templates that cut GenAI deployment time by 50%.&lt;/li&gt;&#xA;&lt;li&gt;Authored best-practice playbooks adopted across the Nike AI Community of Practice.&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&#xA;&lt;div class=&#34;jr__item jr-work__item&#34;&gt;&#xA;&lt;div class=&#34;jr__item-meta&#34;&gt;&#xA;&lt;div class=&#34;jr-work__position&#34;&gt;Lead Machine Learning Engineer – Analytics &amp;amp; GenAI&lt;/div&gt;&#xA;&lt;div class=&#34;jr__date-range&#34;&gt;&lt;span&gt;2021-02&lt;/span&gt; &lt;span&gt;-&lt;/span&gt; &lt;span&gt;2025-04&lt;/span&gt;&lt;/div&gt;&#xA;&lt;div class=&#34;jr-work__name&#34;&gt;&lt;a href=&#34;https://nike.com&#34;&gt;Nike&lt;/a&gt;&lt;/div&gt;&#xA;&lt;div class=&#34;jr-work__location&#34;&gt;Boston, MA&lt;/div&gt;&#xA;&lt;/div&gt;&#xA;&lt;div class=&#34;jr__item-content&#34;&gt;&#xA;&lt;p class=&#34;jr-work__summary&#34;&gt;Drove engineering excellence and production reliability for Nike’s AI ecosystem.&lt;/p&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;&#xA;&lt;/div&gt;</description>
    </item>
  </channel>
</rss>
