Kaggle is the world’s largest community of data scientists. Its members compete with each other to solve complex data science problems, and the top competitors are invited to work on the most interesting and sensitive business problems from some of the world’s biggest companies through Masters competitions.
Kaggle provides cutting-edge data science results to companies of all sizes. We have a proven track record of solving real-world problems across a diverse array of industries, including life sciences, financial services, energy, information technology, and retail.
The confidence to go forward and compete for some serious $$$$$$.
The historical data has been split into two groups, a ‘training set’ and a ‘test set’. For the training set, we provide the outcome (‘ground truth’) for each passenger. You will use this set to build your model to generate predictions for the test set.
For each passenger in the test set, you must predict whether they survived the sinking (0 for deceased, 1 for survived). Your score is the percentage of passengers you predict correctly.
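As a concrete sketch of that workflow, the snippet below holds out part of the training set to estimate the score locally and writes a submission file in the expected two-column format. The file and column names (train.csv, test.csv, PassengerId, Sex, Survived) are assumptions based on the usual Titanic data release, and the gender-based baseline is purely illustrative, not a recommended model:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Assumed file names, matching the usual Titanic data release.
train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

def predict(df):
    # Illustrative baseline only: guess that female passengers survived.
    return (df["Sex"] == "female").astype(int)

# The test set's ground truth is withheld, so hold out part of the
# training set to estimate the score locally before submitting.
_, val = train_test_split(train, test_size=0.2, random_state=0)
print("estimated score:", accuracy_score(val["Survived"], predict(val)))

# Submission format: one 0/1 prediction per test passenger.
pd.DataFrame({
    "PassengerId": test["PassengerId"],
    "Survived": predict(test),
}).to_csv("submission.csv", index=False)
```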
The Kaggle leaderboard has a public and private component. 50% of your predictions for the test set have been randomly assigned to the public leaderboard (the same 50% for all users). Your score on this public portion is what will appear on the leaderboard. At the end of the contest, we will reveal your score on the private 50% of the data, which will determine the final winner. This method prevents users from ‘overfitting’ to the leaderboard.
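To see why the public score is only a noisy estimate of the private one, here is a toy simulation; the test-set size of 418 and the 78% accuracy are made-up illustrative numbers, not figures from the competition:

```python
import numpy as np

rng = np.random.default_rng(0)

# Suppose the hidden test set has 418 rows and our predictions
# are right 78% of the time overall (both numbers are invented).
n = 418
correct = rng.random(n) < 0.78

# Kaggle fixes one random 50/50 partition for everyone; here we
# draw one to show that the two halves can score differently.
public = np.zeros(n, dtype=bool)
public[rng.permutation(n)[: n // 2]] = True

print("public score :", correct[public].mean())
print("private score:", correct[~public].mean())
```

Because the two halves disagree by chance alone, repeatedly tuning a model to nudge the public number up can leave the private number behind.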
Tutorials
Let \((x_1, x_2, \ldots, x_n)\) be an independent and identically distributed sample drawn from some distribution with an unknown density \(f\). We are interested in estimating the shape of this function \(f\). Its kernel density estimator is
\[\hat{f}_h(x) = \frac{1}{n}\sum_{i=1}^n K_h (x - x_i) = \frac{1}{nh} \sum_{i=1}^n K\Big(\frac{x-x_i}{h}\Big), \]
where \(K(\cdot)\) is the kernel — a non-negative function that integrates to one and has mean zero — and \(h > 0\) is a smoothing parameter called the bandwidth. A kernel with subscript \(h\) is called the scaled kernel and is defined as \(K_h(x) = \frac{1}{h} K\big(\frac{x}{h}\big)\). Intuitively, one wants to choose \(h\) as small as the data allow; however, there is always a trade-off between the bias of the estimator and its variance; more on the choice of bandwidth below.
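Here is a minimal NumPy sketch of the estimator above, using the Gaussian density as one common choice of \(K\) (the names kde and gaussian_kernel are ours):

```python
import numpy as np

def gaussian_kernel(u):
    # Standard normal density: non-negative, integrates to one, mean zero.
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def kde(x, sample, h):
    # f_hat_h(x) = (1 / (n h)) * sum_i K((x - x_i) / h)
    x = np.asarray(x)[:, None]           # evaluation points, shape (m, 1)
    u = (x - np.asarray(sample)) / h     # scaled distances, shape (m, n)
    return gaussian_kernel(u).mean(axis=1) / h

# Example: estimate the density of a standard normal sample on a grid.
rng = np.random.default_rng(0)
sample = rng.normal(size=100)
grid = np.linspace(-3, 3, 7)
print(kde(grid, sample, h=0.5))
```

In practice, libraries such as scipy.stats.gaussian_kde implement the same estimator with a rule-of-thumb bandwidth built in, so you rarely need to hand-roll the sum.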