Random forests work by growing a large number of decision trees (typically hundreds), each in a specific random way so that the trees are decorrelated from one another. Each deep decision tree is a low-bias, high-variance estimator, so when we aggregate the predictions of many relatively uncorrelated trees, we get a final prediction with low bias AND low variance. Magic. The trick is getting trees trained on the same dataset to be uncorrelated. This is accomplished in two ways: each tree is trained on a bootstrap sample of the data points (a random sample drawn with replacement), and at each node in each tree, only a randomly sampled subset of the features is considered for splitting.
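To make the mechanism concrete, here is a minimal sketch in R using rpart trees as the building blocks. One deliberate simplification: it samples the feature subset once per tree rather than at every split, as a true random forest (and the randomForest package) does; the helper names grow_forest and predict_forest are mine, not from any library.

```r
library(rpart)

grow_forest <- function(X, y, n_trees = 100, mtry = floor(sqrt(ncol(X)))) {
  lapply(seq_len(n_trees), function(i) {
    rows  <- sample(nrow(X), replace = TRUE)   # bootstrap sample of the data points
    feats <- sample(names(X), mtry)            # random subset of the features
    d <- cbind(X[rows, feats, drop = FALSE], y = y[rows])
    rpart(y ~ ., data = d, method = "class")
  })
}

predict_forest <- function(forest, newdata) {
  # Each tree votes; take the majority class per row.
  votes <- sapply(forest, function(tree)
    as.character(predict(tree, newdata, type = "class")))
  apply(votes, 1, function(v) names(which.max(table(v))))
}

forest <- grow_forest(iris[, 1:4], iris$Species, n_trees = 100)
head(predict_forest(forest, iris))
```

Because each tree sees different rows and different features, their individual errors are only weakly correlated, and the majority vote averages much of the variance away.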
Put simply, if you have a machine learning problem and you don't know what to use, you should use random forests. Here, in table form (courtesy of Hastie, Tibshirani, and Friedman), is why:

[Table 10.1, "Some characteristics of different learning methods," from The Elements of Statistical Learning: "Trees" score well on almost every criterion except predictive power.]
Random forests inherit most of the good attributes of "Trees" in the above chart, but in addition have state-of-the-art predictive power. Their main drawbacks are interpretability, though most other highly predictive algorithms fare even worse on that count, and computational performance: if you need real-time predictions in production, it can be hard to justify evaluating hundreds or thousands of trees per prediction.
If you are interested in playing around, grab the randomForest R package.
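For instance, fitting a forest on the built-in iris data takes a few lines; ntree sets the number of trees, and mtry (left at its default of sqrt(p) for classification here) sets how many features are sampled at each split:

```r
# install.packages("randomForest")  # Breiman and Cutler's reference implementation
library(randomForest)

fit <- randomForest(Species ~ ., data = iris, ntree = 500, importance = TRUE)

print(fit)                       # out-of-bag (OOB) error estimate and confusion matrix
importance(fit)                  # which features the forest found useful
predict(fit, newdata = head(iris))
```

A nice bonus: because each tree is trained on a bootstrap sample, the left-out ("out-of-bag") rows give you an honest error estimate for free, with no separate validation set.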
I recently heard the president of Kaggle, Jeremy Howard, mention that Random Forests seem to show up in a disproportionate number of winning entries in their data mining competitions. Cross-validation, I call that.