A substantial body of empirical work documents that deep learning models lack robustness to adversarial examples. Recent theoretical work has proved that adversarial examples are ubiquitous in two-layer networks with sub-exponential width and ReLU or smooth activations, and in multi-layer ReLU networks with sub-exponential width. We present a result of the same type, with no restriction on width and for general locally Lipschitz continuous activations.
More precisely, given a neural network f(⋅;θ) with random weights θ and a feature vector x, we show that an adversarial example x′ can be found with high probability along the direction of the gradient ∇_x f(x;θ). Our proof is based on a Gaussian conditioning technique. Instead of proving that f is approximately linear in a neighborhood of x, we characterize the joint distribution of f(x;θ) and f(x′;θ) for x′ = x − s(x)∇_x f(x;θ).
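As an informal illustration of the perturbation x′ = x − s(x)∇_x f(x;θ), the following minimal sketch builds a two-layer network with random Gaussian weights and takes one step along the gradient. Everything specific in it is an assumption made for illustration and is not taken from the result itself: the tanh activation (one example of a locally Lipschitz activation), the weight scaling, the dimensions `d` and `m`, the helper names `init_params`, `f`, `grad_f` and `gradient_step`, and in particular the step size s(x) = c·f(x;θ)/‖∇_x f(x;θ)‖² with c = 2, which is the step that would exactly negate the output if f were linear.

```python
import numpy as np

def init_params(rng, d, m):
    # Random Gaussian weights (hypothetical scaling, chosen so that the
    # pre-activations W @ x are O(1) when ||x|| is of order sqrt(d)).
    W = rng.standard_normal((m, d)) / np.sqrt(d)
    a = rng.standard_normal(m) / np.sqrt(m)
    return W, a

def f(x, W, a):
    # Two-layer network with tanh activation (assumed for illustration).
    return a @ np.tanh(W @ x)

def grad_f(x, W, a):
    # Closed-form gradient of f with respect to the input x.
    h = W @ x
    return W.T @ (a * (1.0 - np.tanh(h) ** 2))

def gradient_step(x, W, a, c=2.0):
    # x' = x - s(x) * grad, with the assumed step s(x) = c * f(x) / ||grad||^2.
    # For c = 2 the first-order approximation of f(x') equals -f(x).
    g = grad_f(x, W, a)
    s = c * f(x, W, a) / (g @ g)
    return x - s * g

rng = np.random.default_rng(0)
d, m = 200, 1000                      # input dimension and width (arbitrary)
W, a = init_params(rng, d, m)
x = rng.standard_normal(d)
x_adv = gradient_step(x, W, a)

print("f(x)            =", f(x, W, a))
print("f(x')           =", f(x_adv, W, a))
print("||x' - x||/||x|| =", np.linalg.norm(x_adv - x) / np.linalg.norm(x))
```

Comparing the signs of the two printed outputs on several random draws shows whether the step changes the prediction while the relative perturbation stays small. The sketch only probes this empirically; it does not reproduce the Gaussian conditioning argument.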