**CS 638: Large Language Models in Industry**
Kaiser Pister

A UW-Madison Computer Science seminar course exploring recent advances in natural language processing and their application space in industry.

*This course is only available to senior undergraduates.*

[Piazza link](https://piazza.com/wisc/fall2023/cs638)

About
==============================================================

1. This is the second time this course is offered, and it is subject to change. Given how tumultuous NLP currently is, especially in industry, anything I suggest now (in January ’24) might be irrelevant by February.
2. The course will follow what I view as relevant topics in NLP in industry. My background is as a founder and consultant using NLP at startup companies. As such, we will focus on business applications of these topics.
3. The course will be taught using Python.
4. The course will focus on “In Context Learning” and “Prompt Engineering” to the extent that they are scientific methods. We will answer questions such as “Does the order of exemplars in a prompt matter?” and “What tokens drastically change model output?”
5. You will not need a GPU, and you will not be provided with one. However, if you have a GPU with at least 6 GB of RAM, you might have more opportunity to experiment with the projects locally.
6. You will not need to purchase OpenAI or similar credits. We will use Free Tier or local projects only. However, all the work we do will be applicable to paid services, and it might be interesting to run on those platforms if you already have accounts and are undeterred by the cost.
7. This is a one-unit course because I will give one lecture per week. The lecture will be highly interactive and will require participation from all students.
8. The syllabus is under construction, but grading will likely be 50% participation, 50% project.
9. Your project will be to run an experiment from a paper in one of the fields we discuss. It will be a group project.
10. I expect this to be a work-intensive course. You will spend time building so that you understand not only the theory, but also the application of the theory.
11. Capacity is limited, but I do not control it.
12. Background required is low. Work ethic required is high.

Office Hours
==============================================================

- Room: CS 4224
- Days: Monday, Wednesday
- Times: 12:30ish - 3:30ish, excluding our seminar time slot

Sometimes I arrive 30 minutes earlier.

Syllabus
==============================================================

Wednesday Jan 24, 2024: Introduction
  - [slides:intro](slides/0-intro.pdf), [slides:oogabooga](slides/0-oogabooga.pdf)

Wednesday Jan 31, 2024: Data and Tasks
  - [train_gpt](code/1-train_gpt.ipynb)
  - [train_llama](code/1-train_llama.ipynb)
  - [reading: A Fire Upon the Deep](/files/futd.pdf)

Wednesday Feb 7, 2024: Building GPT-2 from scratch
  - [slides](slides/1-datasets.pdf)
  - [using_bert](code/1-using_bert.ipynb)
  - [using_gpt](code/1-using_gpt.ipynb)

Wednesday Feb 14, 2024: Guest Lecturer: My AI Assistant

Wednesday Feb 21, 2024: Chain of Thought & EchoPrompt
  - [slides](slides/2-foundations.pdf), [attention](code/x-attention.ipynb), [tokenize](code/x-tokenize.ipynb), [normalize](code/x-normalize.ipynb)
  - [reading: GPT From Scratch](https://jaykmody.com/blog/gpt-from-scratch/)

Wednesday Feb 28
  - [reading: Chain of Thought](https://arxiv.org/abs/2201.11903)
  - [reading: EchoPrompt](https://arxiv.org/abs/2309.10687)

Wednesday Feb 28, 2024: Datasets
  - [reading: Cargo Cult Science](https://calteches.library.caltech.edu/51/2/CargoCult.htm)
  - [reading: RedPajama](https://github.com/togethercomputer/RedPajama-Data)
  - [reading: Task Contamination](https://arxiv.org/abs/2312.16337)
  - [reading: Benchmark Contamination](https://arxiv.org/abs/2311.04850)