Binary classification with log loss optimization

Let's take the case of binary classification with the log loss (cross-entropy) objective function:

L(y, p) = -(y * log(p) + (1 - y) * log(1 - p))

where y is the real label and p is the probability score. The output x of the model is the sum across the CART tree learners, and p (the score, or pseudo-probability) is obtained by applying the famous sigmoid function to x. To minimize the log loss objective we need its first and second derivatives (gradient and hessian) with respect to x. As derived in this stats.stackexchange post, gradient = (p - y) and hessian = p * (1 - p).

The above algorithm is called the "Exact Greedy Algorithm" and its complexity is O(n*m), where n is the number of training samples and m is the feature dimension.
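The gradient and hessian above can be sketched in a few lines of NumPy. This is a minimal illustration, not the internals of any particular GBT library; the function name `logloss_grad_hess` is my own:

```python
import numpy as np

def sigmoid(x):
    """Map the raw model output x to a pseudo-probability p."""
    return 1.0 / (1.0 + np.exp(-x))

def logloss_grad_hess(x, y):
    """First and second derivatives of log loss w.r.t. the raw output x."""
    p = sigmoid(x)          # pseudo-probability
    grad = p - y            # gradient: (p - y)
    hess = p * (1.0 - p)    # hessian: p * (1 - p)
    return grad, hess

# Example: raw output 0 gives p = 0.5; with true label y = 1,
# the gradient is -0.5 and the hessian is 0.25.
g, h = logloss_grad_hess(np.array([0.0]), np.array([1.0]))
print(g, h)
```

These per-sample gradients and hessians are exactly the statistics a boosting round aggregates when scoring candidate splits.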