cherry.optim

cherry.optim.Distributed


Description

Synchronizes the gradients of a model across replicas.

At every step, Distributed averages the gradients across all replicas before calling the wrapped optimizer. The sync parameter determines how frequently parameters are synchronized between replicas in order to limit numerical divergence; synchronization is performed by calling the sync_parameters() method. If sync is None, parameters are synchronized only once, upon initialization of the class.
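Conceptually, each call to step() performs a gradient all-reduce before the wrapped optimizer's update. A minimal sketch of this behavior using torch.distributed (illustrative only, not the library's exact implementation):

import torch.distributed as dist

def averaged_step(params, opt):
    # Average each parameter's gradient across all replicas,
    # then let the wrapped optimizer apply the update.
    world_size = dist.get_world_size()
    for p in params:
        if p.grad is not None:
            dist.all_reduce(p.grad.data, op=dist.ReduceOp.SUM)
            p.grad.data.div_(world_size)
    opt.step()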

References
  1. Zinkevich et al. 2010. “Parallelized Stochastic Gradient Descent.”

Example

from torch import optim
from cherry.optim import Distributed

opt = optim.Adam(model.parameters())
opt = Distributed(model.parameters(), opt, sync=1)

opt.step()             # averages gradients across replicas, then updates
opt.sync_parameters()  # broadcasts the root replica's parameters
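Note that Distributed relies on torch.distributed collectives, so a process group must be initialized before the example above runs. A typical setup (the backend choice here is illustrative):

import torch.distributed as dist

# Usually launched with a tool such as torchrun, which supplies the
# rank and world size through environment variables.
dist.init_process_group(backend='gloo')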

__init__(self, params, opt, sync=None) special

Arguments
  • params (iterable) - Iterable of parameters.
  • opt (Optimizer) - The optimizer to wrap and synchronize.
  • sync (int, optional, default=None) - Number of steps between parameter synchronizations. If None, parameters are synchronized only upon initialization.

sync_parameters(self, root = 0)

Description

Broadcasts all parameters from the root replica to all other replicas.

Arguments
  • root (int, optional, default=0) - Rank of root replica.
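
In terms of torch.distributed primitives, the broadcast amounts to the following sketch (illustrative; the actual implementation's parameter handling may differ):

import torch.distributed as dist

def sync_parameters(params, root=0):
    # Overwrite every replica's parameters with the root replica's values.
    for p in params:
        dist.broadcast(p.data, src=root)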