notes, thoughts, and practice of applied machine learning
SVDSmoothing LLM Layers with WeightWatcher

Recently, Microsoft Research published the LASER (“Layer-Selective Rank Reduction”) method in a very popular paper, “The Truth is in There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction.” It got a lot of press (The Verge) because it hints that it may be possible to improve the truthfulness of LLMs...

Tue Feb 13, 2024 11:26
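
The core operation behind LASER is replacing a selected weight matrix with a low-rank approximation built from its largest singular values. Below is a minimal NumPy sketch of that truncated-SVD step, assuming a hypothetical keep_fraction parameter for the fraction of singular components kept. It is a generic illustration, not the LASER code or weightwatcher's SVDSmoothing implementation.

```python
import numpy as np

def rank_reduce(W: np.ndarray, keep_fraction: float) -> np.ndarray:
    """Return a rank-k approximation of W built from its top singular triplets.

    k is `keep_fraction` of the maximum rank min(W.shape), rounded, and at
    least 1. By the Eckart-Young theorem this is the closest rank-k matrix
    to W in Frobenius norm.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    # Thin SVD: singular values come back sorted in descending order.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    k = max(1, int(round(keep_fraction * s.size)))
    # Scale the first k left singular vectors by their singular values,
    # then project back with the first k right singular vectors.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Toy stand-in for one selected layer's weight matrix (hypothetical shape).
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32))
W_low = rank_reduce(W, keep_fraction=0.1)  # keeps round(3.2) = 3 components

rel_err = np.linalg.norm(W - W_low) / np.linalg.norm(W)
print(f"relative Frobenius error: {rel_err:.3f}")
```

The shape and keep_fraction here are arbitrary toy values; as the name “Layer-Selective” suggests, which matrices get reduced is part of the method.
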
Evaluating LLMs with WeightWatcher Part III: The Magic of Mistral, a Story of Dragon Kings

Recently, the Mistral models have taken the LLM world by storm. The Mistral Mixture of Experts (MoE) 8x7B model outperforms other models in its weight class, such as Llama 2 70B and GPT-3.5. Here’s a quick review of its performance on different LLM benchmarks: And even the smaller Mistral 7B model seems to be “punching well above its weight...

Tue Jan 30, 2024 11:19
Evaluating Fine-Tuned LLMs with WeightWatcher Part II: PEFT / LoRA Models

Evaluating LLMs is hard, especially when you don’t have a lot of test data. In the last post, we saw how to evaluate fine-tuned LLMs using the open-source weightwatcher tool. Specifically, we looked at models after the ‘deltas’ (or updates) had been merged into the base model. In this post, we will look at LLMs fine-tuned using Parameter Efficient...

Sun Jan 28, 2024 11:13
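
The ‘deltas’ in this excerpt are low-rank updates. As a minimal sketch, assuming the standard LoRA parametrization (the lora_alpha / r scaling used in the LoRA paper and by default in Hugging Face PEFT), a merged weight is W0 + (lora_alpha / r) · B · A. The function merge_lora, the toy shapes, and the random factors are hypothetical illustrations, not weightwatcher code or the method from the post.

```python
import numpy as np

def merge_lora(W0: np.ndarray, A: np.ndarray, B: np.ndarray,
               lora_alpha: float, r: int) -> np.ndarray:
    """Fold a LoRA update into the base weight: W0 + (lora_alpha / r) * B @ A.

    W0 has shape (d_out, d_in), B has shape (d_out, r), A has shape (r, d_in).
    """
    if A.shape != (r, W0.shape[1]) or B.shape != (W0.shape[0], r):
        raise ValueError("A and B shapes do not match W0 and rank r")
    scaling = lora_alpha / r
    return W0 + scaling * (B @ A)

rng = np.random.default_rng(0)
d_out, d_in, r = 48, 32, 4
W0 = rng.standard_normal((d_out, d_in))  # frozen base weight
# Small random values stand in for trained adapter factors
# (in LoRA training, B is initialized to zero).
A = 0.01 * rng.standard_normal((r, d_in))
B = 0.01 * rng.standard_normal((d_out, r))

W_merged = merge_lora(W0, A, B, lora_alpha=8, r=r)

# The low-rank product B @ A has rank at most r, however large W0 is.
print("rank of B @ A:", np.linalg.matrix_rank(B @ A))
print("relative size of update:", np.linalg.norm(W_merged - W0) / np.linalg.norm(W0))
```

Because B · A has rank at most r, the update is far more constrained than the full d_out × d_in matrix it is added to.
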
Evaluating Fine-Tuned LLMs with WeightWatcher

If you are fine-tuning your own LLMs, you need a way to evaluate them. And while there are over a dozen popular methods to choose from, each of them is biased toward a specific, narrowly scoped measure. None of them can identify potential internal problems in your model, and in the end, you will probably need to design a custom...

Wed Jan 24, 2024 11:09
WeightWatcher new feature: fix_fingers='clip_xmax'

WeightWatcher 0.7 has just been released, and it includes a new and improved advanced feature for analyzing Deep Neural Networks (DNNs) called fix_fingers. To activate it, simply use: details = watcher.analyze(..., fix_fingers='clip_xmax', ...). This will take a tiny bit longer, and will yield more reliable alpha for your model layers,...

Wed Mar 22, 2023 00:07
WeightWatcher 0.7: March 2023

First, let me say thanks to all the users in our great community: we have reached over 93K downloads as of March 2023! The latest release of the open-source weightwatcher tool includes several important advances, including removing the explicit dependence on tensorflow and torch at install time, the ability to process...

Tue Mar 21, 2023 03:04