Data Science in Investing: Jed Gore, CFA

Using ChatGPT to generate substitute data for language model classifier training

Code base here: https://github.com/jed-gore/chatgpt_management_transcripts PROBLEM: Back in 2018–2019 I worked on training machine learning models to predict which sentences in company conference call transcripts were likely to be “guidance”, or forward-looking statements. The most challenging part of the project was editing Excel files with 50,000 lines of text and flagging each sentence as “guidance” or “not guidance”. This was done to have a training set that could

How to parse the pmdarima auto_arima Summary() object into discrete values

Sometimes you want to store your model output in a dataframe, for example when running grouped regressions, so that you have multiple regression outputs to review and sort. But while using pmdarima’s auto_arima() function I was surprised to learn there’s no summary2() like the one in statsmodels. Instead you get the same retro block of text that has been familiar to users of Python scikit-learn and R’s glm() for two
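One way to get discrete values out of that block of text is to parse the coefficient table line by line. The sketch below is not the code from the post: parse_coef_table and the column names are hypothetical, the sample text is hand-written in the usual SARIMAX summary layout with made-up numbers, and in real use I would assume the input comes from str(model.summary()).

```python
# Illustrative only: a hand-written excerpt in the layout of a SARIMAX summary.
# The numbers are made up for the example and are not real model output.
SUMMARY_TEXT = """\
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1          0.5000      0.100      5.000      0.000       0.304       0.696
ma.L1         -0.3000      0.120     -2.500      0.012      -0.535      -0.065
sigma2         1.2000      0.150      8.000      0.000       0.906       1.494
==============================================================================
"""

# Hypothetical column names for the six numeric fields in each row.
COLUMNS = ["coef", "std_err", "z", "p_value", "ci_lower", "ci_upper"]


def parse_coef_table(summary_text):
    """Return one dict per coefficient row found in the summary text."""
    rows = []
    in_table = False
    for line in summary_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("coef"):
            in_table = True  # header row found; data rows follow
            continue
        if not in_table:
            continue
        if set(stripped) == {"-"}:
            continue  # dashed ruler under the header
        if not stripped or stripped.startswith("="):
            break  # end of the coefficient table
        parts = stripped.split()
        name, values = parts[0], parts[1:]
        if len(values) != len(COLUMNS):
            continue  # skip anything that is not a plain coefficient row
        row = {"term": name}
        row.update(zip(COLUMNS, (float(v) for v in values)))
        rows.append(row)
    return rows


coef_rows = parse_coef_table(SUMMARY_TEXT)
for row in coef_rows:
    print(row["term"], row["coef"], row["p_value"])
```

The resulting list of dicts can be passed straight to pandas.DataFrame, which is handy when stacking the output of many grouped models.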

SIVB Balance Sheet Visual with BPFH Deal Overlay

Using the Daloopa API to pull SIVB balance sheet data. Code base here: https://github.com/jed-gore/bank_assets_chart I was curious whether it was easy to see what drove the issues at SIVB. I noticed a huge ramp in the balance sheet from late 2021 through mid-2022. It turns out this was in part the acquisition of Boston Private: https://www.svb.com/news/company-news/svb-financial-group-completes-acquisition-of-boston-private2 The deal closed 7/1/2021, so I believe the actual reporting impact would have been the quarter

ARIMA Forecasting Example

ML Ops: see the notebook arima_forecast.ipynb in the MLOps repo. Codebase here: https://github.com/jed-gore/MLOps A test case to begin developing a reusable, scalable ARIMA module. The import used is: from statsmodels.tsa.statespace.sarimax import SARIMAX PROCESS: Using Daloopa to pull data for AMZN (you will need to get your own API key), we isolate Net Sales and difference it twice to remove the trend and seasonality. Our ARIMA looks ok. The ARIMA model looks better than naive
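To make the differencing step concrete, here is a minimal sketch in plain Python. It assumes that "twice" means one seasonal difference (lag 4 for quarterly data) followed by one regular difference; the notebook may instead difference at lag 1 twice. The difference helper and the synthetic series are made up for illustration and are not AMZN data.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic quarterly series (made up): quadratic trend plus a fixed
# pattern that repeats every four quarters.
seasonal_pattern = [5, -3, 2, 20]
series = [t * t + seasonal_pattern[t % 4] for t in range(16)]

# Step 1: the lag-4 difference cancels the repeating quarterly pattern
# and leaves a linear trend (8t - 16).
seasonal_diff = difference(series, lag=4)

# Step 2: the lag-1 difference removes the remaining linear trend.
stationary = difference(seasonal_diff, lag=1)

print(seasonal_diff)
print(stationary)
```

With SARIMAX the same idea is normally expressed through the d term in order=(p, d, q) and the D term in seasonal_order=(P, D, Q, s), so the model does the differencing internally.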

WebCrawler

A base class to inherit from. Codebase here: https://github.com/jed-gore/webcrawler_class Something to save searches, because I always forget the syntax for Beautiful Soup. I’ll be adding to this to handle encoding, error handling, and specific tags. Remember: always scrape responsibly according to Terms of Service. Includes a class WebCrawler with 3 simple methods: get_html_document(), get_links() and get_tables(). Also includes a c.py file to handle certification updates for
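For a feel of the shape of such a class, here is a rough sketch using only the standard library. It is not the code in the repo: the real class uses Beautiful Soup, and the method signatures and the _LinkAndTableParser helper below are my assumptions. Nested tables are not handled.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class _LinkAndTableParser(HTMLParser):
    """Hypothetical helper: collects href values and table cell text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.tables = []  # each table is a list of rows, each row a list of cells
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)
        elif tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


class WebCrawler:
    """Base class sketch; subclass it to add site-specific parsing."""

    def get_html_document(self, url):
        # Network call; check the site's Terms of Service before using it.
        with urlopen(url, timeout=10) as response:
            return response.read().decode("utf-8", errors="replace")

    def get_links(self, html):
        parser = _LinkAndTableParser()
        parser.feed(html)
        return parser.links

    def get_tables(self, html):
        parser = _LinkAndTableParser()
        parser.feed(html)
        return parser.tables


# Usage on an inline document, so nothing is fetched over the network.
sample_html = """
<html><body>
<a href="https://example.com/a">A</a>
<table>
<tr><th>Ticker</th><th>Name</th></tr>
<tr><td>AMZN</td><td>Amazon</td></tr>
</table>
</body></html>
"""
crawler = WebCrawler()
links = crawler.get_links(sample_html)
tables = crawler.get_tables(sample_html)
print(links)
print(tables)
```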

Serialized Python Objects Example

The goal of this project is to develop skills around serializing Python classes to and from both SQL and JSON. Codebase here: https://github.com/jed-gore/serlialized_python_objects/blob/main/README.md Reference: https://marshmallow-sqlalchemy.readthedocs.io/en/latest/ User Story: AS A financial analyst I WANT TO map my companies across various things like KPIS and SECTORS SO THAT I can visualize mappings easily in an app with JSON (and share with JavaScript front end apps) but also maintain a second form database structure on the back end
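The round trip in the user story can be sketched with the standard library alone. This swaps in dataclasses, json and sqlite3 in place of the marshmallow-sqlalchemy approach the project references, so treat it as an illustration of the idea only. The Company class, the table names and the sample values are all hypothetical.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class Company:
    ticker: str
    sector: str
    kpis: list

    def to_json(self):
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload):
        return cls(**json.loads(payload))


def save_company(conn, company):
    """Write one row per company and one row per company/KPI mapping."""
    conn.execute(
        "INSERT INTO company (ticker, sector) VALUES (?, ?)",
        (company.ticker, company.sector),
    )
    conn.executemany(
        "INSERT INTO company_kpi (ticker, kpi) VALUES (?, ?)",
        [(company.ticker, kpi) for kpi in company.kpis],
    )


def load_company(conn, ticker):
    """Rebuild the Python object from the two tables."""
    sector = conn.execute(
        "SELECT sector FROM company WHERE ticker = ?", (ticker,)
    ).fetchone()[0]
    kpis = [
        row[0]
        for row in conn.execute(
            "SELECT kpi FROM company_kpi WHERE ticker = ? ORDER BY rowid",
            (ticker,),
        )
    ]
    return Company(ticker=ticker, sector=sector, kpis=kpis)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE company (ticker TEXT PRIMARY KEY, sector TEXT)")
conn.execute("CREATE TABLE company_kpi (ticker TEXT, kpi TEXT)")

# Made-up values for illustration.
original = Company(ticker="ABC", sector="Example Sector", kpis=["kpi_one", "kpi_two"])
save_company(conn, original)

from_sql = load_company(conn, "ABC")
from_json = Company.from_json(original.to_json())
print(from_sql == original, from_json == original)
```

Keeping the KPI mapping in its own table is what lets the back end stay normalized while the JSON side carries the same mapping as a nested list.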

Canalyst Blog

Canalyst’s blog has some relevant examples of my work doing multiple stock analysis using the Candas data science library I wrote, which is still in use and available on PyPI. The library is called canalyst-candas as an intentional pun on the name of the Python library pandas.

Simple Portfolio Pairs Cointegration Test

One of the most interesting challenges I faced as a portfolio manager after the 2008 GFC was a renewed interest in pairs trading as a risk mitigation strategy. And while we often paired off stocks by region or business type, I was never really sure whether a pair was an actual mean-reverting pair. Fortunately there’s a pretty simple mechanism for this. Cointegration is a measure of the linear relationship between two
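As a rough illustration of the mechanism, here is a plain-Python sketch of the two-step Engle-Granger idea: regress one series on the other, then check whether the residual spread mean-reverts using a Dickey-Fuller style t-statistic with no lag terms. The function names and the synthetic data are made up, and the -3.34 cutoff is the approximate asymptotic 5% critical value for two series with a constant. In practice I would expect most people to reach for statsmodels.tsa.stattools.coint instead.

```python
import random


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def engle_granger_stat(x, y):
    """Return (hedge_ratio, t_stat) for the residual spread of y on x."""
    intercept, slope = ols(x, y)
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    lagged = resid[:-1]
    delta = [resid[t] - resid[t - 1] for t in range(1, len(resid))]
    # Regress the change in the spread on its lagged level (no constant).
    sum_sq = sum(l * l for l in lagged)
    rho = sum(l * d for l, d in zip(lagged, delta)) / sum_sq
    errors = [d - rho * l for l, d in zip(lagged, delta)]
    sigma2 = sum(e * e for e in errors) / (len(errors) - 1)
    t_stat = rho / (sigma2 / sum_sq) ** 0.5
    return slope, t_stat


# Synthetic pair (made up): x is a random walk, y tracks 2*x plus noise,
# so the spread y - 2*x is stationary by construction.
rng = random.Random(0)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + rng.gauss(0, 1))
y = [2.0 * xi + rng.gauss(0, 1) for xi in x]

hedge_ratio, t_stat = engle_granger_stat(x, y)
is_cointegrated = t_stat < -3.34  # approximate 5% critical value (assumption)
print(round(hedge_ratio, 3), round(t_stat, 2), is_cointegrated)
```

A strongly negative t-statistic says the spread gets pulled back toward its mean, which is the property a pairs trade relies on.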

Intra-sector cross-correlation

Codebase here: https://github.com/jed-gore/stock_betas Monitoring cross-correlations inside a sector, to see if there’s enough differentiation for pair trading. As a former PM, I would avoid pair trading a group when correlations were high. This group of processors is sitting right on the mean. The chart shows a 60-day rolling window; the function takes days as a parameter.
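A minimal sketch of the rolling calculation, assuming the sector measure is the average of all pairwise correlations of daily returns over the window. The function names, tickers and returns below are made up for illustration, and the repo's own function may be defined differently.

```python
import random
from itertools import combinations


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def rolling_correlation(returns_a, returns_b, days=60):
    """Correlation over each trailing window of `days` observations."""
    return [
        pearson(returns_a[end - days:end], returns_b[end - days:end])
        for end in range(days, len(returns_a) + 1)
    ]


def mean_pairwise_rolling_correlation(returns_by_ticker, days=60):
    """Average the rolling correlation across every pair in the group."""
    pairs = list(combinations(sorted(returns_by_ticker), 2))
    per_pair = [
        rolling_correlation(returns_by_ticker[a], returns_by_ticker[b], days)
        for a, b in pairs
    ]
    return [sum(values) / len(values) for values in zip(*per_pair)]


# Synthetic daily returns (made up): a shared sector factor plus noise.
rng = random.Random(42)
common = [rng.gauss(0, 0.01) for _ in range(120)]
returns_by_ticker = {
    name: [c + rng.gauss(0, 0.01) for c in common]
    for name in ("AAA", "BBB", "CCC")
}

sector_corr = mean_pairwise_rolling_correlation(returns_by_ticker, days=60)
print(len(sector_corr), round(sector_corr[-1], 3))
```

Comparing the latest value with the long-run average of the same series gives the "sitting right on the mean" type of read.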

SQ starting to trade less like a bitcoin stock and more like a processor

Codebase: https://github.com/jed-gore/stock_betas/blob/main/README.md I recall a couple of years ago I was looking at SQ and how much the stock traded with bitcoin. Apparently they clear bitcoin trades? Page 11 of the 2022 10-K: “Customers can also use Cash App to invest their funds in U.S. listed stocks and exchange-traded funds (“ETFs”) or buy and sell bitcoin”. Now that bitcoin has pulled back a bit (?) I was curious what SQ (Block)