Using ChatGPT to generate substitute data for language model classifier training
Code base here: https://github.com/jed-gore/chatgpt_management_transcripts

PROBLEM

Back in 2018–2019, I worked on training machine learning models to predict which sentences in company conference call transcripts were likely to be “guidance”, or forward-looking statements. The most challenging part of the project was editing Excel files with 50,000 lines of text and flagging each sentence as “guidance” or “not guidance”. This was done to have a training set that could