# Notes on “Solve Many-body Physics Problem with Artificial Neural Networks”

G. Carleo and M. Troyer, Solving the quantum many-body problem with
artificial neural networks, Science, Vol. 355, Issue 6325, pp. 602-606 (2017).

[First appeared on arXiv as arXiv:1606.02318]

The challenge posed by the many-body problem in quantum physics
originates from the
difficulty of describing the nontrivial correlations encoded in the
exponential complexity
of the many-body wave function. Here we demonstrate that systematic
machine learning of
the wave function can reduce this complexity to a tractable
computational form for some
notable cases of physical interest. We introduce a variational
representation of quantum
states based on artificial neural networks with a variable number of
hidden neurons.
A reinforcement-learning scheme we demonstrate is capable of both
finding the ground
state and describing the unitary time evolution of complex interacting
quantum systems.
Our approach achieves high accuracy in describing prototypical
interacting spin models in one and two dimensions.

## Introduction

M. Hush, Machine learning for quantum physics, Science 355 (6325),
580-580, 2017.

Due to the vast number of complex numbers required to store a complete
wave function, the simulation of many-body systems has been an immense
challenge in quantum science. For example, consider a many-body system
composed of N qubits, the simplest quantum bodies. We must store a
complex number for every configuration of this system. Each qubit can
be in a state of 0 or 1, so we need to store $2^{N}$ complex numbers.
Even a small number of qubits requires an extreme amount of memory:
26 qubits require around a gigabyte, 46 qubits require a petabyte, and
300 qubits would require more bytes than the number of atoms in the
universe. Richard Feynman recognized this problem, which led to his
suggestion that quantum computers may have an advantage over
conventional computers [although we now know that the reason behind the
quantum speedup is more nuanced].

An artificial neural network approximates the wave function of a
system composed of 4 qubits. The neural network takes a configuration
of the system as an input, which is multiplied by a matrix of weights
$W_{i,j}$, added to a set of hidden biases $h_{j}$, and passed through
a nonlinear activation function to produce a complex number $C$ as an
output. The neural network learns the ground state (or dynamics) of
the system. Increasing the number of hidden biases can improve the
accuracy.

This literature report is by no means a complete reflection of the
paper studied.

The wave function $\Psi$ is a fundamental object in quantum physics and
possibly the hardest to grasp. $\Psi$ is a monolithic mathematical
property that contains all of the information on a given quantum state.
In principle, an exponential amount of information is needed to fully
encode a generic many-body quantum state (see quotations above). A
limited amount of quantum entanglement (a physical phenomenon that
occurs when pairs or groups of particles are generated or interact in
ways such that the quantum state of each particle cannot be described
independently of the others, even when the particles are separated by a
large distance—instead, a quantum state must be described for the system
as a whole
) and a small number of relevant physical states in such systems
enable modern approaches to solve the many-body Schrödinger equation
with a limited amount of classical resources.

Despite notable successes in exploring ways to solve this fundamental
many-body problem, there are still many unexplored regimes, and it
remains difficult to find a general strategy to reduce the exponential
complexity of the full many-body wave function down to its most
essential features (in this paper, spin configurations).

Artificial neural networks have already played a prominent role: they
can adapt themselves to describe and analyze such quantum systems.
(The challenging goal of solving the many-body problem without prior
knowledge of exact samples is nonetheless still unexplored; indeed,
the wave function of a quantum system is hard to express fully.) The
self-learning feature of machine learning could open ways to solve the
quantum many-body problem in regimes that have been inaccessible to
existing numerical approaches.

Schematic of a kagome QSL. Illustration: Zili Feng.

## Limited Intro to Machine Learning and Neural Networks

Definition: “a machine that can think.” Machine learning is a
computational technique dedicated to achieving self-learning and
self-improvement with the help of known experience $E$ (or limited,
incomplete knowledge).

A computer program is said to learn from experience E with respect to
some class of tasks T and performance measure P if its performance at
tasks in T, as measured by P, improves with experience E. (Formal
definition by Tom M. Mitchell)

Computer programs with the ability to learn without being explicitly
programmed (A. Samuel, 1959)

Very simple neural network: Machine learning is a very wide field with
many branches and many different algorithms. This report focuses on
neural networks.

Artificial neural networks (ANNs) or connectionist systems are a
computational model used in machine learning, computer science and
other research disciplines, which is based on a large collection of
connected simple units called artificial neurons, loosely analogous to
axons in a biological brain.

A simple neural network can solve the “exclusive or” (XOR) problem
(XOR: a logical operation that outputs true only when its inputs
differ, i.e., one is true and the other is false).

Figure 0 Three-layer neural network to solve (learn) XOR problem.

Normally in a neural network, we have an activation function $f(M)$
(typically a sigmoid or step function). $M$ is the function input,
defined as $M=\sum_{i}w_{i}x_{i}-\theta$, where $w_{i}$ and $\theta$
are called the weights and the threshold, respectively.
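As a concrete illustration (not from the paper), the XOR network of Figure 0 can be realized with hand-picked weights and thresholds using the step activation and $M=\sum_{i}w_{i}x_{i}-\theta$ above; the OR/NAND/AND gate decomposition used here is one conventional choice, and the specific weight values are our own:

```python
import numpy as np

def step(m):
    # Step activation: the neuron fires when the net input M is positive
    return int(m > 0)

def neuron(x, w, theta):
    # M = sum_i w_i x_i - theta, passed through the activation f(M)
    return step(np.dot(w, x) - theta)

def xor(x1, x2):
    x = np.array([x1, x2])
    h1 = neuron(x, np.array([1.0, 1.0]), 0.5)     # hidden neuron acting as OR
    h2 = neuron(x, np.array([-1.0, -1.0]), -1.5)  # hidden neuron acting as NAND
    return neuron(np.array([h1, h2]), np.array([1.0, 1.0]), 1.5)  # output: AND
```

A single-layer network cannot represent XOR; the hidden layer is what makes this non-linearly-separable function learnable.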

Boltzmann machine: an energy-based model. The visible layer is the
input, and the hidden layer represents the intrinsic behavior of the
individual visible nodes. (The objective of a Boltzmann machine is to
reduce the energy function.)

It gets its name because the probability of a certain state appearing
in a Boltzmann machine is expressed in a way analogous to the
Boltzmann distribution in statistical mechanics:
$P(s)=e^{-E(s)}/\sum_{t}e^{-E(t)}$.


## Literature Mapping

### Machine Learning and Many-body Physics and More …

Use machine learning to solve physical problems:

• Lei Wang (IOP, CAS), Discovering phase transitions with
unsupervised learning, Phys. Rev. B 94, 195105 (2016)

namely, detecting the phase transition of the Ising model.

• Dong-Ling Deng, Xiaopeng Li, and S. Das Sarma (University of
Maryland), Machine Learning Topological States, arXiv:1609.09060v1

representing quantum topological states with long-range quantum
entanglement using artificial neural networks.

a direct derivative of the paper reported in Science.

• Dong-Ling Deng, Xiaopeng Li, and S. Das Sarma (University of
Maryland), Quantum Entanglement in Neural Network States,
arXiv:1701.04844v2

• Li Huang and Lei Wang (IOP, CAS), Accelerated Monte Carlo simulations
with restricted Boltzmann machines, Phys. Rev. B 95, 035105 (2017).

## Methods Summary

### Establish the Network

Finding the ground state of a quantum system is never an easy task. A
quantum system involves many degrees of freedom
$S=(S_{1},S_{2},S_{3},\dots,S_{N})$, which may be spins, bosonic
occupation numbers, or similar. (Strictly, spatial coordinates and
time should also be considered degrees of freedom.) The many-body
wave function is a mapping of the N-dimensional set $S$ to complex
numbers that fully specify the amplitude and the phase of the quantum
state.

The basic idea of this neural-network implementation is simple. Think
of the wave function as a computational black box which, given an
input many-body configuration $S$, returns a phase and an amplitude
according to $\Psi(S)$. The goal is to approximate this computational
black box with a neural network, trained to best represent $\Psi(S)$.

In this paper, attention is concentrated on the description of
spin-1/2 quantum systems. In this case, restricted Boltzmann machine
(RBM) architectures are constituted by one visible layer of N nodes,
corresponding to the physical spin variables in a chosen basis (e.g.,
$S=(\sigma^{z}_{1},\dots,\sigma^{z}_{N})$; for a spin-1/2 system, each
spin variable takes only two possible values, “spin up” and “spin
down”), and a single layer of M auxiliary spin variables
($h_{1},\dots,h_{M}$). This description corresponds to a variational
expression for the quantum states:
$\Psi_{M}(S;\mathcal{W})=\sum_{\{h_{i}\}}e^{\sum_{j}a_{j}\sigma^{z}_{j}+\sum_{i}b_{i}h_{i}+\sum_{ij}W_{ij}h_{i}\sigma^{z}_{j}}$,
where $h_{i}\in\{-1,1\}$ is a set of M hidden spin variables and the
network parameters $\mathcal{W}=\{a,b,W\}$ fully specify the response
of the network to a given input $S$ (here $a$ and $b$ are called
biases and $W$ the connection weights).

Since intralayer connections are missing, the hidden variables can be
traced out and the wave function rewritten in terms of the visible
variables only. This is very helpful, since we usually need to deal
with a large number of hidden neurons; it also simplifies the
description of a complex RBM into a conventional neural network in
which the visible layer and the hidden layer are clearly divided.
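A minimal sketch (our own, not the authors' code) of evaluating the traced-out wave function: since each $h_{i}\in\{-1,1\}$ appears in no intralayer term, the sum over hidden variables factorizes into $\Psi(S)=e^{\sum_{j}a_{j}S_{j}}\prod_{i}2\cosh(b_{i}+\sum_{j}W_{ij}S_{j})$, which involves only the visible spins:

```python
import numpy as np

def rbm_psi(s, a, b, W):
    """Wave-function amplitude Psi(S) with the hidden spins h_i = +/-1
    summed out analytically (possible because intralayer connections
    are absent): Psi(S) = exp(a.s) * prod_i 2 cosh(b_i + (W s)_i).
    Parameters a, b, W may be complex to encode amplitude and phase."""
    theta = b + W @ s                      # effective angle of each hidden unit
    return np.exp(np.dot(a, s)) * np.prod(2.0 * np.cosh(theta))
```

With all parameters zero the amplitude is $2^{M}$, one factor of 2 per hidden unit, which is a quick sanity check of the factorization.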

### Background: RBM and Its Modified Form Regarding Applications in Statistical Physics

Fig.1 (a) The restricted Boltzmann machine is an energy-based model
for binary stochastic visible and hidden variables. Their probability
distribution follows the Boltzmann distribution with the energy
function defined. (Conventional definition of RBM) (b) Viewing the
RBM as a feed-forward neural network which maps the visible variables
to the free energy equation. Fitting this free energy function to the
log probability of the physical models determines the weights and
biases of the RBM.

An RBM is conventionally defined as a classical statistical-mechanics
system governed by the following energy function:
$E(x,h)=-\sum^{N}_{i=1}a_{i}x_{i}-\sum^{M}_{j=1}b_{j}h_{j}-\sum^{N}_{i=1}\sum^{M}_{j=1}x_{i}W_{ij}h_{j}$,
where the network consists of N visible nodes and M hidden nodes.
The joint probability distribution of the visible and hidden variables
follows the Boltzmann distribution: $p(x,h)=e^{-E(x,h)}/Z$, where the
partition function $Z$ is a normalization constant.

Feed-forward approach:

Similarly, we have the trainable biases and connection weights $a$,
$b$, and $W$ of the RBM. The hidden-layer neurons activate via the
softplus function $f(z)=\ln(1+e^{z})$. The yellow output neuron (the
output of the free energy function, i.e., the free energy of the
visible variables) sums up the outputs of the hidden neurons together
with the results coming directly from the input neurons to give the
final result. Training is done by optimizing the free energy function
(viewed as a revised energy function adapted to the problem
investigated). This supervised training approach is significantly
simpler and more efficient than the conventional unsupervised learning
approach, and it should be an efficient way to initialize all network
parameters.
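The feed-forward free-energy evaluation described above can be sketched as follows (a minimal illustration with our own function names and array shapes; for binary visible units the standard traced-out form is $F(x)=-\sum_{i}a_{i}x_{i}-\sum_{j}\ln(1+e^{b_{j}+\sum_{i}x_{i}W_{ij}})$):

```python
import numpy as np

def softplus(z):
    # f(z) = ln(1 + e^z): the hidden-neuron activation
    return np.log1p(np.exp(z))

def free_energy(x, a, b, W):
    """Free energy of the visible variables with hidden units traced out:
    F(x) = -sum_i a_i x_i - sum_j softplus(b_j + sum_i x_i W_ij).
    The output neuron adds the softplus outputs of the hidden layer to
    the direct a.x contribution from the input neurons."""
    return -np.dot(a, x) - softplus(b + x @ W).sum()
```

Fitting $-F(x)$ to the log-probability of a known physical model then determines $a$, $b$, and $W$ by ordinary supervised regression.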

### Background: Training an RBM with Contrastive Divergence (Blocked Gibbs Sampling)

Since intralayer interactions are missing, the following equations
hold if there are d visible nodes and q hidden nodes:

$P(v|h)=\prod_{i=1}^{d}P(v_{i}|h)$

$P(h|v)=\prod_{j=1}^{q}P(h_{j}|v)$

As an example, the conditional probabilities of the hidden/visible
variables given the visible/hidden variables are

$p(h_{j}=1|x)=\sigma(b_{j}+\sum^{N}_{i=1}x_{i}W_{ij})$

$p(x_{i}=1|h)=\sigma(a_{i}+\sum^{M}_{j=1}W_{ij}h_{j})$,

where $\sigma$ is the sigmoid function.

Normally in an RBM, for each input set we use these probability
functions to calculate the probability distribution of the hidden
neurons, then use sampling methods to obtain actual hidden-neuron
values. Next, these hidden variables are used as a starting point to
refresh the input set.
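The sample-hidden-then-refresh-visible cycle just described is one blocked Gibbs step; a sketch under the conditional probabilities above (binary 0/1 units; the RNG seed is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gibbs_step(x, a, b, W):
    """One blocked Gibbs update for binary units: sample all hidden
    units from p(h_j = 1 | x), then refresh all visible units from
    p(x_i = 1 | h). W has shape (N visible, M hidden)."""
    p_h = sigmoid(b + x @ W)                          # p(h_j = 1 | x)
    h = (rng.random(p_h.shape) < p_h).astype(float)   # sample hidden layer
    p_x = sigmoid(a + W @ h)                          # p(x_i = 1 | h)
    return (rng.random(p_x.shape) < p_x).astype(float)
```

Because the two layers are conditionally independent given each other, each half-step updates a whole layer at once, which is what makes the sampling "blocked".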

Figure 2 Two strategies for proposing updates using an RBM. (a)
Blocked Gibbs sampling. (b) Blocked Gibbs sampling with an additional
Metropolis scheme (judging by the free energy of the hidden variables,
similar to the approach used in reinforcement learning).

### The Algorithm

From the paper it is known that the learning does not have good
training data to begin with (i.e., we do not know the exact form of
the wave function). Hence supervised learning of $\Psi$ is not a
viable option; instead, a reinforcement-learning approach is developed
in which either the ground-state wave function or the time-dependent
one is learned on the basis of feedback from variational principles.

Algorithm 1: Reinforcement learning (training of the network parameters)

(Not included in the supplementary code)

Suppose we have a known Hamiltonian expression $H$; then we have the
expectation value of $H$ written in terms of the network parameters:
$E(\mathcal{W})=\langle\Psi_{M}|H|\Psi_{M}\rangle/\langle\Psi_{M}|\Psi_{M}\rangle$.
The task now is to minimize this expectation value through learning.

There are many approaches to reach the optimal network parameters.
Here is the algorithm used in the paper, stochastic reconfiguration
(SR):

<div style="background-color:rgba(0, 0, 0, 0.0470588);
text-align:left; vertical-align: left;">
Objective: minimize the energy expectation value
$E(\mathcal{W})=\langle\Psi_{M}|H|\Psi_{M}\rangle/\langle\Psi_{M}|\Psi_{M}\rangle$

1 Initialize the network parameters (W, a, b).

2 Sample $|\Psi(S,\mathcal{W}_{k})|^{2}$ for the current network parameters.

3 Obtain stochastic estimates of the average energy and the energy
gradient.

4a Apply improved gradient-descent optimization to obtain updated
network parameters.

4b Adapt the trial wave function to the new network parameters; go
back to Step 2.

5 Conclude when energy convergence is reached.
</div>
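Step 3 can be sketched with a generic variational Monte Carlo estimator (a simplified plain-gradient illustration of our own, not the full SR update, which additionally preconditions the gradient with a covariance matrix of the log-derivatives):

```python
import numpy as np

def energy_and_grad(e_loc, o_log):
    """Stochastic estimates from configurations sampled from |Psi(S)|^2.
    e_loc[n]    : local energy E_loc(S_n) = sum_S' H_{S_n,S'} Psi(S')/Psi(S_n)
    o_log[n, k] : log-derivative O_k(S_n) = d ln Psi(S_n) / d W_k
    The energy gradient is 2 Re( <E_loc O*> - <E_loc><O*> )."""
    e_mean = e_loc.mean()
    grad = 2.0 * np.real(
        (e_loc[:, None] * np.conj(o_log)).mean(axis=0)
        - e_mean * np.conj(o_log).mean(axis=0)
    )
    return e_mean.real, grad
```

When the trial state is an eigenstate, every local energy is identical and the estimated gradient vanishes, which is the zero-variance property that makes this estimator attractive.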

Algorithm 2: Check the validity of the model through learning

(Included in the supplementary code)

<div style="background-color:rgba(0, 0, 0, 0.0470588);
text-align:left; vertical-align: left;">
Input: Trained network (constant network parameters W, a, b and the
Hamiltonian of a specific model, obtained in Algorithm 1); trial spin
input $S$.

(The hidden-layer values are now completely a “black box”; they have
been traced out thanks to the lack of intralayer interactions.)

Objective: Find the ground-state spin configuration $S'$ and output
the ground-state energy (as a summation over all individual spins).

1 Initialize all input parameters.

2 Flip a random spin.

3 Accept/reject the change through a simple Metropolis-Hastings
algorithm: Acc = min(1, $|\Psi(S')/\Psi(S)|^{2}$).

4 Go back to Step 2 until the step limit (the number of sweeps, as
defined in this paper) is reached.

5 Generate output (spin configuration, energy, …).

</div>
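A minimal sketch of the sampling loop in Algorithm 2 (the `psi` callable, sweep convention, and RNG seed are our own assumptions; in the paper's setting `psi` would be the traced-out RBM amplitude):

```python
import numpy as np

rng = np.random.default_rng(1)

def metropolis_sweep(s, psi):
    """One sweep (N single-spin-flip attempts) of Metropolis-Hastings
    sampling of |Psi(S)|^2. s holds spins +/-1; psi(s) returns the
    (possibly complex) amplitude of configuration s."""
    s = s.copy()
    for _ in range(len(s)):
        i = rng.integers(len(s))
        s_new = s.copy()
        s_new[i] *= -1                                  # Step 2: flip a random spin
        acc = min(1.0, abs(psi(s_new) / psi(s)) ** 2)   # Step 3: acceptance ratio
        if rng.random() < acc:                          # accept or reject
            s = s_new
    return s
```

Only the amplitude ratio between configurations is needed, so the (intractable) normalization of $\Psi$ never has to be computed.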

## Results and Discussions


### Time-invariant Models: Transverse Field Ising and Heisenberg (1D and 2D)

(Included in the supplementary code)

Transverse-field Ising (TFI): spin configuration under the influence
of an external transverse field.

Hamiltonian:
$H_{TFI}=-h\sum_{i}\sigma^{x}_{i}-\sum_{ij}\sigma^{z}_{i}\sigma^{z}_{j}$

Antiferromagnetic Heisenberg (AFH): used to study critical points and
phase transitions of magnetic systems.

Hamiltonian:
$H_{AFH}=\sum_{ij}\sigma^{x}_{i}\sigma^{x}_{j}+\sigma^{y}_{i}\sigma^{y}_{j}+\sigma^{z}_{i}\sigma^{z}_{j}$

Feature filters (dealing with translational symmetry; visualizing the
ground state):

The most time-consuming parts of the study are the SR optimization and
the t-VMC (time-dependent variational Monte Carlo) (NOT in the
supplementary code). (Note again: the main calculation will still be
time-consuming if the number of hidden nodes is large.) Imposing
translational symmetry greatly reduces the number of parameters that
need to be tuned.

NQS can be formulated in a way that conserves specific symmetries. In
the spirit of shift-invariant RBMs, for integer hidden-variable
density $\alpha = 1,2,\dots$, the weight matrix takes the form of
feature filters $W_{j}^{(f)}$ for $f \in [1,\alpha]$. These filters
have a total of $\alpha N$ variational elements in lieu of the
$\alpha N^{2}$ elements of the asymmetric case (i.e., when the
symmetry is not imposed).
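A sketch of how the $\alpha$ feature filters could be expanded into the full weight matrix by applying all lattice translations (the array layout, with one hidden unit per filter per translation, is our own assumption):

```python
import numpy as np

def symmetric_weights(filters):
    """Expand alpha feature filters of length N into the full
    (alpha*N x N) weight matrix by applying all N cyclic lattice
    translations, so only alpha*N numbers are trained instead of the
    alpha*N^2 of the unconstrained (asymmetric) case."""
    alpha, N = filters.shape
    W = np.empty((alpha * N, N))
    for f in range(alpha):
        for t in range(N):
            # hidden unit (f, t) sees filter f shifted by t sites
            W[f * N + t] = np.roll(filters[f], t)
    return W
```

Every translated copy shares the same filter values, so the network's response is invariant under lattice shifts of the input configuration.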

Figure 3 (Figure 2 in the original paper) Neural-network
representation of the many-body ground states. The horizontal color
map shows the values that the fth feature map $W_{j}^{(f)}$ takes on
the jth lattice site.

Let us now think about the physical meaning of the connection weights
$W_{ij}$. They should be viewed as the correlation strength between
different spins (the spin in the visible layer and the auxiliary spin
in the hidden layer). The computational trick can be explained as
follows: if no symmetry is imposed (correlation values are assigned
freely), we have $\alpha N^{2}$ network weights to determine.
However, if we designate a certain symmetry (in which case the
correlation pattern is limited to only $\alpha$ distinct filters), we
only need to determine (or, more explicitly, train) $\alpha N$
parameters.

Figure 3 therefore shows the trained patterns corresponding to the
models provided. The rightmost panels show the antiferromagnetic
correlations.

Figure 4 Connection weights of an RBM trained for the Falicov-Kimball
model. This shows the complexity of an RBM and the possible difficulty
in obtaining this weight map.

Accuracy of energy: After the energy is obtained from the RBM, it is
reasonable to compare it with the exact solution to see how large the
energy difference is.
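For small systems, the exact reference energy can be obtained by dense diagonalization; a sketch for the 1D TFI chain (open boundary conditions and function name are our own conventions, feasible only for small $n$):

```python
import numpy as np

def tfi_ground_energy(n, h):
    """Exact ground-state energy of an open 1D TFI chain,
    H = -h sum_i sigma^x_i - sum_i sigma^z_i sigma^z_{i+1},
    via dense diagonalization of the full 2^n x 2^n matrix."""
    sx = np.array([[0.0, 1.0], [1.0, 0.0]])
    sz = np.array([[1.0, 0.0], [0.0, -1.0]])
    I = np.eye(2)

    def kron_chain(ops):
        # Tensor product of one 2x2 operator per lattice site
        out = np.array([[1.0]])
        for op in ops:
            out = np.kron(out, op)
        return out

    H = np.zeros((2 ** n, 2 ** n))
    for i in range(n):                # transverse-field terms
        H -= h * kron_chain([sx if j == i else I for j in range(n)])
    for i in range(n - 1):            # nearest-neighbor zz couplings
        H -= kron_chain([sz if j in (i, i + 1) else I for j in range(n)])
    return np.linalg.eigvalsh(H)[0]
```

Beyond roughly 20 spins this brute-force route hits the exponential memory wall described in the introduction, which is precisely where variational approaches such as NQS take over.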

Figure 5 (Figure 3 in the original paper) Accuracy of NQS vs. the
exact solution and other established models/ansatzes. Higher
hidden-layer density results in better solutions. Figure 5(a) also
highlights the influence of the magnetic-field strength; h=1 (the
critical point) is the hardest-to-learn ground state.

Efficiency Assessment: The overall computational cost is comparable
to that of standard ground-state QMC simulations.


### Time-dependent Model: Unitary Dynamics

The NQS can also be trained to solve time-dependent many-body problems.
The network parameters are therefore trained in terms of time ($W(t)$)
to best reproduce quantum dynamics.

Training is done with the help of time-dependent VMC method.

To demonstrate the effectiveness of the NQS in the dynamical context, we
consider the unitary dynamics induced by quantum quenches in the
coupling constants of our spin models (parameters of the quantum
Hamiltonian and/or action are suddenly changed). In the TFI model, we
induce nontrivial quantum dynamics by means of an instantaneous change
in the transverse field: The system is initially prepared in the ground
state of the TFI model for the transverse field $h_{i}$ and then
evolves under the action of the TFI Hamiltonian with a transverse field
$h_{f}\neq h_{i}$. In the AFH model, we study quantum quenches in the
longitudinal coupling $J_{z}$.

The high accuracy (see Figure 6) also obtained for the unitary dynamics
further confirms that neural-network-based approaches can be
successfully used to solve the quantum many-body problem, not only for
ground-state properties but also for modeling the evolution induced by a
complex set of excited quantum states.

Figure 6 (Figure 4 in the original paper) Many-body unitary time
evolution with NQS. (Time-dependent transverse spin polarization /
nearest-neighbor spin correlations in TFI/AFH compared with exact
results.)

## Conclusion and Outlook

• RBM is relatively easy to set up and achieves comparably high
accuracy when compared with exact solutions. Many paths for research
can be envisaged in the near future (the field of machine learning
advances very fast; e.g., multi-layer architectures such as
convolutional neural networks).

• The straightforward NQS approach can be readily applied to other
systems (used to solve more challenging problems).

• NQS can be further modified to yield more compact representations of
many-body quantum states.

## The Search for the Quantum Spin Liquid in Kagome Antiferromagnets

J.-J. Wen, Y. S. Lee

Stanford Institute for Materials and Energy Sciences, SLAC National
Accelerator Laboratory, 2575 Sand Hill Road, Menlo Park, CA 94025, USA

Department of Applied Physics, Stanford University, Stanford, CA 94305,
USA

A quantum spin liquid is an exotic quantum ground state that does not
break conventional symmetries and where the spins in the system remain
dynamic down to zero temperature. Unlike a trivial paramagnetic state,
it features long-range quantum entanglement and supports fractionalized
excitations. Since Anderson’s seminal proposal in 1973, QSLs have been
vigorously studied both theoretically and experimentally. Frustrated
magnets have been the most fruitful playground for the QSL research.
These are materials with competing exchange interactions, which
typically arise from triangle-based lattices, leading to macroscopic
classical ground state degeneracy. This type of frustration is a key
ingredient in discovering quantum disordered ground states.

The spin-1/2 Heisenberg model on the kagome lattice, a two-dimensional
lattice formed by corner sharing triangles, is an intensively studied
frustrated model. From early on it was recognized that the ground state
of the nearest neighbor kagome lattice antiferromagnet is a non-magnetic
state, although it is not clear whether it is a QSL or a
valence-bond-solid which breaks the translation symmetry. Recent
state-of-the-art numerical studies have converged on the ground state
being a QSL, yet the nature of the QSL remains an open question, with
evidence for both a gapped $Z_{2}$ QSL and a gapless U(1) QSL.

Experiments on kagome lattice antiferromagnets are equally challenging.
One of the difficulties arises from the rarity of magnetic materials
that contain perfect kagome lattices. The situation changed when the
successful synthesis of herbertsmithite was reported in 2005.
Herbertsmithite is the full Zn end member of Zn-paratacamite, with
general chemical formula Zn$_{x}$Cu$_{4-x}$(OH)$_{6}$Cl$_{2}$, where
perfect kagome layers of spin-1/2 Cu$^{2+}$ are separated from each
other by the non-magnetic Zn
layers. Since then, extensive characterization has been carried out on
herbertsmithite, and all signs point to a quantum disordered ground
state consistent with a QSL. In particular, inelastic neutron scattering
measurements on herbertsmithite single crystals revealed a continuum of
magnetic excitations that is characteristic of the fractionalized
spinons. Analogous to the situation in theoretical studies, however, it
has been difficult to resolve whether or not the putative QSL is gapped.
This is due to the complexity that even in the best herbertsmithite
single crystal synthesized so far, the Zn substitution is not perfect:
while the kagome layers remain fully occupied by Cu, ~15% of the Zn
sites are occupied by Cu. These “impurity” spins are expected to be
weakly interacting and contribute mainly to the low energy magnetic
response, and therefore hinder the direct probe of the intrinsic gap
size of the kagome layer spins. Only recently have careful analyses of
NMR and inelastic neutron scattering measurements that take into
account the impurity spin contributions found evidence of a gapped
QSL in
herbertsmithite. The possibility of a gapped QSL is further supported by
recent NMR work on the kagome QSL candidate Zn-barlowite.

The discovery of a new kagome QSL candidate material, Zn-claringbullite
[Cu$_{3}$Zn(OH)$_{6}$FCl], brings an interesting new addition to the
field. Like
herbertsmithite, Zn-claringbullite contains well-separated perfect
kagome layers, which makes it an ideal platform to explore the kagome
QSL. Because of the different coordination environment of the Zn ion,
which is trigonal prismatic compared to octahedral in herbertsmithite,
the kagome layers in Zn-claringbullite are stacked in an AA pattern
instead of ABC stacking, which is similar to Zn-barlowite. In fact, the
physical properties of the claringbullite family appear to be rather
similar to the barlowite family.

The absence of a magnetic transition in Zn-claringbullite is a promising
indication of a QSL. This is the tip of the iceberg, and continued
studies would further illuminate the novel magnetic ground state in
Zn-claringbullite, such as resolving the extent of Zn substitution into
the kagome layer Cu sites, probing the effects of the impurity Cu spins
that sit on the Zn sites, and ultimately determining whether the ground
state is gapped. It would also be interesting to see if sizable single
crystals of Zn-claringbullite can be grown to facilitate more detailed
experimental studies such as inelastic neutron scattering.

With the discovery of new and promising kagome QSL candidate materials,
we can expect more clues will be uncovered in the near future to help
resolve the long-standing kagome antiferromagnet problem. This important
experimental work will also provide new insights regarding topological
order and quantum entanglement as manifested in quantum spin liquids in
real materials.
