ergch24

Reputation: 45

Understanding ‘.backward()’: How to code the PyTorch function ‘.backward()’ from scratch?

I’m a newbie learning deep learning, and I’m stuck trying to understand what ‘.backward()’ from PyTorch does, since it does pretty much all of the work there. Therefore, I want to understand in detail what the backward function does, and I’m going to try to code it step by step. Is there any resource you can recommend (book, video, GitHub repo) to start coding the function? Thank you for your time and hopefully for your help.

Upvotes: 3

Views: 630

Answers (1)

prosti

Reputation: 46291

backward() computes the gradients with respect to (w.r.t.) the graph leaves. The grad() function is more general: it can compute the gradients w.r.t. any inputs (leaves included).
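
For comparison, here is a minimal sketch of the two built-in calls (a toy function of my own, not from the question), assuming a recent PyTorch version:

import torch

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
f = x * y + torch.sin(x)

# backward() fills the .grad attribute of every leaf tensor
f.backward(retain_graph=True)
print(x.grad, y.grad)  # df/dx = y + cos(x), df/dy = x

# torch.autograd.grad() returns the gradients for exactly the inputs you ask for
dfdx, dfdy = torch.autograd.grad(f, (x, y))
print(dfdx, dfdy)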

I implemented the grad() function some time ago; you may check it below. It uses the power of automatic differentiation (AD).

import math
class ADNumber:

    def __init__(self, val, name=""):
        self.name = name
        self._val = val
        self._children = []  # list of (local_gradient, child_node) pairs
        
    def __truediv__(self,other):
        new = ADNumber(self._val / other._val, name=f"{self.name}/{other.name}")
        self._children.append((1.0/other._val,new))
        other._children.append((-self._val/other._val**2,new)) # derivative of a/b w.r.t. b is -a/b^2
        return new

    def __mul__(self,other):
        new = ADNumber(self._val*other._val, name=f"{self.name}*{other.name}")
        self._children.append((other._val,new))
        other._children.append((self._val,new))
        return new

    def __add__(self,other):
        if isinstance(other, (int, float)):
            other = ADNumber(other, str(other))
        new = ADNumber(self._val+other._val, name=f"{self.name}+{other.name}")
        self._children.append((1.0,new))
        other._children.append((1.0,new))
        return new

    def __sub__(self,other):
        new = ADNumber(self._val-other._val, name=f"{self.name}-{other.name}")
        self._children.append((1.0,new))
        other._children.append((-1.0,new))
        return new

    @staticmethod
    def exp(self):
        new = ADNumber(math.exp(self._val), name=f"exp({self.name})")
        self._children.append((math.exp(self._val),new)) # derivative of exp(x) is exp(x)
        return new

    @staticmethod
    def sin(self):
        new = ADNumber(math.sin(self._val), name=f"sin({self.name})")      
        self._children.append((math.cos(self._val),new)) # first derivative is cos
        return new
    
    def grad(self, other):
        # chain rule: d(self)/d(other) = sum over the children c of other
        # of (dc/d(other)) * (d(self)/dc)
        if self == other:
            return 1.0
        result = 0.0
        for local_grad, child in other._children:
            result += local_grad * self.grad(child)
        return result
        
A = ADNumber # shortcuts
sin = A.sin
exp = A.exp

def print_childs(f, wrt): # print every child reachable from f, with its local gradient
    for local_grad, child in f._children:
        print("child:", wrt, "->", child.name, "grad: ", local_grad)
        print_childs(child, child.name)
        
    
x1 = A(1.5, name="x1")
x2 = A(0.5, name="x2")
f=(sin(x2)+1)/(x2+exp(x1))+x1*x2

print_childs(x2,"x2")
print("\ncalculated gradient for the function f with respect to x2:", f.grad(x2))

Out:

child: x2 -> sin(x2) grad:  0.8775825618903728
child: sin(x2) -> sin(x2)+1 grad:  1.0
child: sin(x2)+1 -> sin(x2)+1/x2+exp(x1) grad:  0.20073512936690338
child: sin(x2)+1/x2+exp(x1) -> sin(x2)+1/x2+exp(x1)+x1*x2 grad:  1.0
child: x2 -> x2+exp(x1) grad:  1.0
child: x2+exp(x1) -> sin(x2)+1/x2+exp(x1) grad:  -0.05961284871202578
child: sin(x2)+1/x2+exp(x1) -> sin(x2)+1/x2+exp(x1)+x1*x2 grad:  1.0
child: x2 -> x1*x2 grad:  1.5
child: x1*x2 -> sin(x2)+1/x2+exp(x1)+x1*x2 grad:  1.0

calculated gradient for the function f with respect to x2: 1.6165488003791766
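
As a quick sanity check (a sketch of my own, not part of the implementation above), building the same f with PyTorch tensors and calling the built-in backward() should reproduce this value:

import torch

x1 = torch.tensor(1.5, requires_grad=True)
x2 = torch.tensor(0.5, requires_grad=True)
f = (torch.sin(x2) + 1) / (x2 + torch.exp(x1)) + x1 * x2

f.backward()
print(x2.grad)  # expected to agree with f.grad(x2) from the ADNumber example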

Upvotes: 2
