Robustly improving LLM fairness in realistic settings via interpretability