THIS POST IS CONTINUED FROM PART 3, BELOW--
In machine learning, bias is a mathematical property of an algorithm.
The counterpart to bias in this context is variance.
Algorithms with high bias tend to be rigid. As a result they can miss underlying complexities in the data they consume. In science and engineering, bias corresponds to systematic error (a lack of accuracy), while variance corresponds to a lack of precision.
Bias is a term data scientists use to describe a particular mathematical property of the algorithm that influences its prediction performance.
Bias is generally coupled with variance, another algorithm property. Bias and variance interact, and data scientists typically seek a balance between the two.
Bias arises when we generalize relationships using a simplifying function, while variance arises from the way the model changes across different samples of the input data.
Bias is the algorithm’s tendency to consistently learn the wrong thing by not taking into account all the information in the data - this is underfitting.
When a model is built using so many predictors that it captures noise along with the underlying pattern, it fits the training data too closely, leaving little scope for generalizability. This phenomenon is known as overfitting.
Bias of an estimator is the “expected” difference between its estimates and the true values in the data. Intuitively, it is a measure of how close (or far) the estimator’s predictions are to the actual data points it is trying to estimate. A model that generalizes well is a model that is neither underfit nor overfit.
Bias and variance are two components of the total error of a learning algorithm; if you try to reduce one, the other may go up. When the learning algorithm has a high-bias problem, working on reducing the bias can cause the variance to go up, leading to overfitting. And when the learning algorithm suffers from a high-variance problem, working on reducing the variance can cause the bias to go up, leading to underfitting.
Bias leads to a phenomenon called underfitting, caused by the error introduced when the model is oversimplified. Variance, on the contrary, arises from excess complexity in the machine learning algorithm. The interplay between the two is known as the bias-variance tradeoff.
Reducing just the bias will not improve the model, and vice versa. The ‘sweet spot’ is the point of optimum bias and optimum variance: basically, capturing the pattern without going to either extreme and damaging accuracy.
We have to avoid overfitting because it gives too much predictive power to even noise elements in our training data. But in our attempt to reduce overfitting we can also begin to underfit, ignoring important features in our training data.
Bias is an error in the learning algorithm that appears when the algorithm is too weak to learn from the data. In the case of high bias, the learning algorithm is unable to learn the relevant details in the data. Hence, it performs poorly on the training data as well as on the test dataset.
Bias relates to the accuracy of our predictions. A high bias means the predictions will be inaccurate. A model is said to have high bias when its structure cannot describe the structure of the data.
A linear model will always have low performance on a nonlinear data set; no matter how much data is used, the model will still perform poorly. Bias is the error that occurs when trying to approximate the behaviour of a problem’s data with too simple a function.
The goal of any supervised machine learning algorithm is to achieve low bias and low variance. In turn the algorithm should achieve good prediction performance.
Bias is introduced to make the target function of a model easier to learn. This is done by making simplifying assumptions; these simplifying assumptions are what we call bias. Formally, it is the difference between the model’s average prediction and the actual correct value that we are trying to predict.
When there are fewer assumptions made about the target function form, it indicates low bias. When there are more assumptions made about the target function form, it indicates a higher bias.
Algorithm bias is associated with rigidity. High bias can cause an algorithm to adhere so strongly to its rules that it misses complexities in the data.
A model has a low bias if it predicts well on the training data.
When a model has a high bias, it means that it is very simple and that adding more features should improve it. Bias is the contribution to total error from the simplifying assumptions built into the method.
Bias error is the difference between the predicted data points and the actual data points, caused because our model was oversimplified. A model with high bias is too simple and has a low number of predictors.
It is missing some other important predictors due to which it is unable to capture the underlying pattern of data. It misses how the features in the training data set relate to the expected output. It pays very little attention to the training data and oversimplifies the model. This leads to high error on training and test data.
On the other hand, models with high bias are more rigid, less sensitive to variations in data and noise, and prone to missing complexities. Importantly, data scientists are trained to arrive at an appropriate balance between these two properties.
The bias is an error from erroneous assumptions in the learning algorithm.
When a machine is biased, it is unable or less able to adapt to variation in the training data, preferring one route as a primary mechanism. This means the developed AI algorithm is rigid and inflexible, unable to adjust when a variation appears in the data at hand. It is also unable to pick up on the subtle complexities that define a particular data set.
Sources of bias-- There are two key ways bias can be introduced and amplified during the machine learning process: by using non-representative data and while fitting and training models.
Machine learning algorithms themselves may amplify bias if they make predictions that are more skewed than the training data. Such amplification often occurs through two mechanisms: 1) incentives to predict observations as belonging to the majority group and 2) runaway feedback loops.
Bias is used to allow the machine learning model to learn in a simplified manner. Ideally, the simplest model that is able to learn the entire dataset and predict correctly on it is the best model. Hence, bias is introduced into the model with the aim of achieving the simplest model possible.
In model building, it is imperative to be able to detect whether the model is suffering from high bias or high variance. The methods to detect high bias and high variance are given below, followed by a minimal code sketch:
Detection of High Bias:--
The model suffers from a very High Training Error.
The Validation error is similar in magnitude to the training error.
The model is underfitting.
Detection of High Variance:--
The model suffers from a very Low Training Error.
The Validation error is very high when compared to the training error.
The model is overfitting.
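As a hedged illustration of these checks, here is a minimal Python sketch using scikit-learn; the toy sine dataset, the two model choices and the reading of the output in the comments are illustrative assumptions, not a prescription:

```python
# Minimal sketch: diagnose high bias vs high variance by comparing
# training error with validation error on held-out data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=400)   # nonlinear target plus noise

X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for name, model in [("linear model (likely high bias)", LinearRegression()),
                    ("unpruned tree (likely high variance)", DecisionTreeRegressor(random_state=0))]:
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    val_err = mean_squared_error(y_val, model.predict(X_val))
    print(f"{name}: train MSE = {train_err:.3f}, validation MSE = {val_err:.3f}")

# Reading the output:
#   High bias     -> both errors are high and similar in magnitude (underfitting).
#   High variance -> training error near zero, validation error much higher (overfitting).
```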
Parametric algorithms tend to have a high bias, which makes them easy and fast to learn and understand.
But they are less flexible, and they fail to give high predictive performance when complex problems are being considered.
Examples of low-bias machine learning algorithms include: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
Examples of high-bias machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
Linear machine learning algorithms often have a high bias but a low variance.
Nonlinear machine learning algorithms often have a low bias but a high variance.
Bias refers to the simplifying assumptions made by a model to make the target function easier to learn.
Generally, linear algorithms have a high bias, making them fast to learn and easier to understand but generally less flexible. In turn, they have lower predictive performance on complex problems that fail to meet the simplifying assumptions of the algorithm's bias.
Increasing the bias will decrease the variance.
Increasing the variance will decrease the bias.
When we are talking about errors, we can find reducible and irreducible errors.
Irreducible errors are errors that cannot be reduced no matter what algorithm you apply. They are usually known as noise, and they can appear in our models due to multiple factors such as an unknown variable, incomplete features or a wrongly defined problem. It is important to mention that, no matter how good our model is, our data will always have some noise component, or irreducible error, that we can never remove.
The irreducible error cannot be reduced regardless of what algorithm is used. It is the error introduced from the chosen framing of the problem and may be caused by factors like unknown variables that influence the mapping of the input variables to the output variable.
The prediction error for any machine learning algorithm can be broken down into three parts:--
Bias Error
Variance Error
Irreducible Error
Total Error = Bias² + Variance + Irreducible Error
Irreducible error cannot be avoided.
The parameterization of machine learning algorithms is often a battle to balance out bias and variance.
Even for an ideal model, it is impossible to get rid of all the types of errors. The “irreducible” error rate is caused by the presence of noise in the data and hence is not removable. However, the Bias and Variance errors can be reduced to a minimum and hence, the total error can also be reduced significantly.
Reducible errors have two components: bias and variance. These errors derive from the choice of algorithm, and the presence of bias or variance causes underfitting or overfitting of the data.
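As a hedged sketch of how this decomposition can be estimated empirically (the sine-shaped "true" function, the noise level and the shallow tree are assumptions chosen purely for illustration), one can refit the same model on many independently drawn training sets and measure bias and variance directly:

```python
# Minimal simulation: estimate bias^2, variance and irreducible error for an
# estimator by refitting it on many independently drawn training sets.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
true_f = np.sin                       # assumed "true" underlying function
noise_sd = 0.3                        # irreducible noise level (assumed known here)
x_test = np.linspace(-3, 3, 50)

def sample_training_set(n=60):
    x = rng.uniform(-3, 3, n)
    return x.reshape(-1, 1), true_f(x) + rng.normal(scale=noise_sd, size=n)

predictions = []
for _ in range(200):                                  # many resampled training sets
    X, y = sample_training_set()
    model = DecisionTreeRegressor(max_depth=3).fit(X, y)
    predictions.append(model.predict(x_test.reshape(-1, 1)))
predictions = np.array(predictions)                   # shape (200, 50)

bias_sq = np.mean((predictions.mean(axis=0) - true_f(x_test)) ** 2)
variance = np.mean(predictions.var(axis=0))
print(f"bias^2 ~ {bias_sq:.4f}  variance ~ {variance:.4f}  "
      f"irreducible ~ {noise_sd ** 2:.4f}")
# Expected squared error on new data ~ bias^2 + variance + noise_sd^2
```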
Low bias offers more flexibility. As examples, we have decision trees, k-nearest neighbours (KNN) and support vector machines.
High Bias can be identified when we have:--
High training error.
Validation error or test error is the same as training error.
High Variance can be identified when:--
Low training error.
High validation error or high test error.
High bias is due to a simple model, and we also see a high training error. To fix that we can do the following things (a minimal sketch follows this list):--
Add more input features.
Add more complexity by introducing polynomial features.
Decrease Regularization term.
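As a hedged sketch of the first two fixes (the polynomial degree, the ridge penalty alpha and the toy data are illustrative assumptions):

```python
# Minimal sketch: reduce high bias by adding polynomial features and by
# lowering the regularization strength (alpha) of a ridge regression.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=200)

rigid = make_pipeline(PolynomialFeatures(degree=1), Ridge(alpha=10.0))   # simple, heavily regularized
richer = make_pipeline(PolynomialFeatures(degree=5), Ridge(alpha=0.1))   # more features, less regularization

for name, model in [("high-bias model", rigid), ("reduced-bias model", richer)]:
    mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{name}: cross-validated MSE = {mse:.3f}")
```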
One way to reduce the total error is to reduce the bias and the variance terms. However, we cannot reduce both terms simultaneously, since reducing one leads to an increase in the other. This is the idea of the bias-variance trade-off.
Ideally, you must find a model at the sweet spot between overfitting and underfitting; in other words, the model with the complexity at which the curves of variance and bias intersect.
In general, if we want to increase the complexity of a function we need to add more parameters to this function. In the case of neural networks, our parameters are the weights and biases. To add additional weights and biases we just need to increase the number of layers and the number of neurons in the neural network.
In the case of the predictive model, the main focus is not on reducing the bias to the maximum extent. A model with slightly more bias error is acceptable as long as the test set error is minimized considerably.
Boosting is a meta-learning algorithm that reduces both bias and variance. A model based on boosting tries to reduce the error in predictions by, for example, focusing on poor predictions and trying to model them better in the next iteration, and hence reduces bias.
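As a hedged illustration of that idea (the shallow-tree depth, the number of boosting rounds and the toy data are assumptions, not tuned values):

```python
# Minimal sketch: a single shallow (high-bias) tree vs. a boosted ensemble of
# shallow trees, where each round focuses on the errors of the previous rounds.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=400)

weak = DecisionTreeRegressor(max_depth=2)                                  # one weak learner
boosted = GradientBoostingRegressor(max_depth=2, n_estimators=200, learning_rate=0.1)

for name, model in [("single shallow tree", weak), ("boosted shallow trees", boosted)]:
    mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{name}: cross-validated MSE = {mse:.3f}")
```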
A model has a high variance if it predicts very well on the training data but performs poorly on the test data
When the model performs well on the Training Set and fails to perform on the Testing Set, the model is said to have Variance.
Variance is an error in the learning algorithm, when the learning algorithm tries to over-learn from the dataset or tries to fit the training data as closely as possible. In case of high variance, the algorithm performs poor on the test dataset, but performs pretty well on the training dataset.
High variance error of a model implies that it is highly sensitive to small fluctuations. This model flounders outside of its comfort zone (the training data).
As a general rule, the more flexible a model is, the higher its variance and the lower its bias. The less flexible a model is, the lower its variance and the higher its bias.
High variance can cause an algorithm to pay too much attention to data points that might actually be noise.
A model with high variance is likely to produce quite different hypothesis functions given different training sets with the same underlying structure just due to different noise in two datasets.
Models with high variance can easily fit into training data and welcome complexity but are sensitive to noise.
Low Variance: Suggests small changes to the estimate of the target function with changes to the training dataset.
High Variance: Suggests large changes to the estimate of the target function with changes to the training dataset.
Examples of low-variance machine learning algorithms include: Linear Regression, Linear Discriminant Analysis and Logistic Regression.
Examples of high-variance machine learning algorithms include: Decision Trees, k-Nearest Neighbors and Support Vector Machines.
Variance is the amount that the estimate of the target function will change if different training data was used
Variance is the contribution to the total error due to sensitivity to noise in the data.
The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs. A model has a low variance if it generalizes well to the test data.
High variance can cause an algorithm to base estimates on the random noise found in a training data set, as opposed to the true relationship between variables
A model with high variance pays a lot of attention to training data and does not generalize to data it hasn’t seen before. As a result, such models perform very well on training data but have high error rates on test data.
When a model has high variance it describes very well the training data but at the moment of training with a different dataset it produces a very different model from the previous one and therefore a bad result at the moment of predicting. The variance is the amount by which the model will change with a different training set.
High variance means that the algorithms have become too specific.
The variance error can be reduced substantially by training the model on many different samples and averaging the resulting predictions, which is not possible in the case of bias.
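As a hedged sketch of that averaging idea, bootstrap aggregation ("bagging") trains the same estimator on many resampled versions of the data and averages their predictions (the tree settings and toy data are illustrative assumptions):

```python
# Minimal sketch: averaging many trees trained on bootstrap samples (bagging)
# reduces the variance of a single, fully grown decision tree.
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(400, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=400)

single_tree = DecisionTreeRegressor(random_state=0)                        # high variance
bagged = BaggingRegressor(DecisionTreeRegressor(), n_estimators=100, random_state=0)

for name, model in [("single tree", single_tree), ("bagged trees", bagged)]:
    mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
    print(f"{name}: cross-validated MSE = {mse:.3f}")
```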
Variance is high for a more complex model and is low for simpler models
Some variance is expected when training a model with different subsets of data. However, the hope is that the machine learning algorithm will be able to distinguish between noise and the true relationship between variables.
Small training data sets often lead to high variance models. A model with low variance will be relatively stable when the training data is altered (e.g., if you add or remove a point of training data).
High variance is often caused by a lack of training data. The model complexity and the quantity of training data need to be balanced: a model of higher complexity requires a larger quantity of training data. Hence, if the model is suffering from high variance, more data can reduce the variance.
If we want to reduce the amount of variance in a prediction, we must add bias.
Models with low variance tend to be less complex with a simple underlying structure. They also tend to be more robust (stable) to different training data (i.e., consistent, but inaccurate). Models that fall in this category generally include parametric algorithms, such as regression models.
Depending on the data, algorithms with low variance may not be complex or flexible enough to learn the true pattern of a data set, resulting in underfitting.
The opposite of a high bias model is a high variance one, which is fluid and better able to expand, morph and accommodate fluctuations in training data. While this approach is preferred over a biased one, developers should keep in mind that an easily changeable algorithm is also more noise-sensitive and might pose difficulties with data generalization
Variance is neither good nor bad for investors in and of itself. However, high variance in a stock is associated with higher risk, along with a higher return. Low variance is associated with lower risk and a lower return.
The higher the variance of the model, the more complex the model is and the more complex the functions it is able to learn. However, if the model is too complex for the given dataset, where a simpler solution is possible, the high variance causes the model to overfit.
If the dataset consists of too many features for each data-point, the model often starts to suffer from high variance and starts to overfit. Hence, decreasing the number of features is recommended.
Machine learning algorithms that have a high variance are strongly influenced by the specifics of the training data. A high variance machine learning algorithm is extremely perceptive to the data which may lead to the overfitting of training data.
A model with high Variance will have the following characteristics:--
Overfitting: A model with high Variance will have a tendency to be overly complex. This causes the overfitting of the model.
Low Testing Accuracy: A model with high Variance will have very high training accuracy (or very low training loss), but it will have a low testing accuracy (or a high testing loss).
Overcomplicating simpler problems: A model with high variance tends to be overly complex and ends up fitting a much more complex curve to a relatively simpler data. The model is thus capable of solving complex problems but incapable of solving simple problems efficiently
Any model which has a very large number of predictors will end up being a very complex model which delivers very accurate predictions for the training data it has already seen, but this complexity makes generalization to unseen data very difficult, i.e., a high-variance model. Thus, this model will perform very poorly on test data.
Variance helps us to understand the spread of the data.
When the changes in the training dataset suggest a small change in the estimate of the target function, there is a low variance. When the changes in the training dataset suggest a large change in the estimate of the target function, there is high variance.
Variance of an estimator is the “expected” value of the squared difference between a model’s estimate and the “expected” value of that estimate (taken over all the models in the estimator).
Variance can be described as the error caused by sensitivity to small variances in the training data set, or how much an estimate for a given data point will change if a different training data set is used.
Variance is the amount which indicates the variability of a model's prediction. It is the variability of the model prediction for a given data point, or a value which tells us the spread of the data.
The variance is an error from sensitivity to small fluctuations in the training set
For high-variance models, one alternative is feature reduction; including more training data is also a viable option.
High variance stocks tend to be good for aggressive investors who are less risk-averse, while low variance stocks tend to be good for conservative investors who have less risk tolerance. Variance is a measurement of the degree of risk in an investment.
Variance of an estimator does not depend on the parameter being estimated. It is a measure of how far the values the estimate can take stray from its expected value.
Variance in data is the variability of the model in a case where different Training Data is used. This would significantly change the estimation of the target function. Statistically, for a given random variable, Variance is the expectation of squared deviation from its mean.
Regularization is a process to decrease model complexity. Hence, if the model is suffering from high variance (which is caused by a complex model), then an increase in regularization can decrease the complexity and help to generalize the model better.
Machine learning algorithms that have a high variance are strongly influenced by the specifics of the training data. This means that the specifics of the training data influence the number and types of parameters used to characterize the mapping function.
High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs (overfitting).
On the one hand, we want algorithms to model the training data very closely, otherwise we’ll miss relevant features and interesting trends. However, on the other hand we don’t want our model to fit too closely, and risk over-interpreting every outlier and irregularity.
Generally, nonlinear machine learning algorithms that have a lot of flexibility have a high variance. For example, decision trees have a high variance, that is even higher if the trees are not pruned before use.
Again, variance occurs when the model performs well on the trained dataset but does not do well on a dataset that it is not trained on. Ideally, the result should not change too much from one set of data to another.
Underfitting occurs for a model with Low Variance and High Bias. The best way to understand the problem of underfitting and overfitting is to express it in terms of bias and variance.
A model has a high bias if it makes a lot of mistakes on the training data. We also say that the model underfits.
Underfitting means that the model does not fit well even to the data it is trained with.
Underfitting is often a result of an excessively simple model. When a model is unable to capture the essence of the training data properly because of a low number of parameters, this phenomenon is known as underfitting.
When the complexity of the model is too low for it to learn the data that is given as input, the model is said to “Underfit”. As examples, we have linear regression, logistic regression and linear discriminant analysis.
When the model is not complex enough, it misses out on important features of the data. Since underfitting is a result of low model complexity, all we need to do is increase the complexity.
In other words, the excessively simple model fails to “Learn” the intricate patterns and underlying trends of the given dataset. To avoid underfitting, provide better predictor variables (feature engineering) or reduce the constraints applied to the model (less regularization).
Underfitting happens when the statistical model cannot adequately capture the structure of the underlying data. The hypothesis function is too simple.
Underfitting destroys the accuracy of our machine learning model. Its occurrence simply means that our model or algorithm does not fit the data well enough. It usually happens when we have too little data to build an accurate model, or when we try to build a linear model with nonlinear data.
In such cases the rules of the machine learning model are too simple and rigid to describe such data, and the model will probably make a lot of wrong predictions. Underfitting can be addressed by using more relevant features and by increasing the model's complexity.
Underfitting will cause poor predictions because the fundamental relationship generated by the model does not match how the data behaves. No matter how many observations you gather for your data, the algorithm won’t be able to model the true shape of the data (e.g., a linear regression on an exponential data set).
A model with high bias is simpler than it should be and hence tends to underfit the data. In other words, the model fails to learn and acquire the intricate patterns of the dataset.
In case of underfitting, the bias is an error from a faulty assumption in the learning algorithm: when the bias is too large, the algorithm is unable to correctly model the relationship between the features and the target outputs.
When a model is underfit, it does not perform well on the training set, and will not do so on the testing set either, which means it fails to capture the underlying trend or pattern of the data.
Underfitting may occur if we are not using enough data to train the model, just like we will fail the exam if we did not review enough material; it may also happen if we are trying to fit a wrong model to the data, just like we will score low in any exercises or exams if we take the wrong approach and learn it the wrong way.
We call any of these situations high bias in machine learning, although its variance is low as performance in training and test sets are pretty consistent, in a bad way.
Again, bias is the algorithm’s tendency to consistently learn the wrong thing by not taking into account all the information in the data. When a model is unable to capture the essence of the training data properly because of a low number of parameters, the phenomenon is known as underfitting.
Underfitting happens when we have too little data to build an accurate model, or when we try to build a linear model with nonlinear data. Models such as linear and logistic regression are too simple to capture the complex patterns in such data.
Linear regression has one problem: it tends to underfit the data. It gives us the lowest mean-squared error among unbiased estimators, but with underfitting we still aren't getting the best predictions. One way to reduce the mean-squared error is a technique known as locally weighted linear regression (LWLR).
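As a hedged sketch of LWLR (a minimal NumPy version in the spirit of the classic textbook implementation; the Gaussian kernel width k and the toy data are assumptions to be tuned):

```python
# Minimal sketch of locally weighted linear regression (LWLR): for each query
# point, fit weighted least squares in which nearby training points receive
# larger Gaussian weights, so the global underfit of plain linear regression shrinks.
import numpy as np

def lwlr_predict(x_query, X, y, k=0.5):
    weights = np.exp(-np.sum((X - x_query) ** 2, axis=1) / (2.0 * k ** 2))
    W = np.diag(weights)
    theta = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)   # raises LinAlgError if singular; increase k
    return x_query @ theta

# Tiny usage example, with a column of ones added for the intercept:
rng = np.random.RandomState(0)
x = np.sort(rng.uniform(-3, 3, 100))
X = np.column_stack([np.ones_like(x), x])
y = np.sin(x) + rng.normal(scale=0.2, size=100)
print(lwlr_predict(np.array([1.0, 0.5]), X, y))         # should land near sin(0.5)
```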
It’s entirely possible to have state-of-the-art algorithms, the fastest computers, and the most recent GPUs, but if your model overfits or underfits to the training data, its predictive powers are going to be terrible no matter how much money or technology you throw at it.
If the model has a bias problem (underfitting), then both the testing and training error curves will plateau quickly and remain high. This implies that getting more data will not help! We can improve model performance by reducing regularization and/or by using an algorithm capable of learning more complex hypothesis functions.
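A hedged sketch of such a learning-curve check with scikit-learn (the ridge model, its alpha and the toy data are placeholders; the point is the shape of the two curves):

```python
# Minimal sketch: plot training vs. validation error against training-set size.
#   Both curves plateau high and close together -> bias problem (more data won't help).
#   Large persistent gap between the curves     -> variance problem (more data / regularization may help).
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import learning_curve
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=500)

sizes, train_scores, val_scores = learning_curve(
    Ridge(alpha=1.0), X, y, cv=5, scoring="neg_mean_squared_error",
    train_sizes=np.linspace(0.1, 1.0, 8))

plt.plot(sizes, -train_scores.mean(axis=1), label="training error")
plt.plot(sizes, -val_scores.mean(axis=1), label="validation error")
plt.xlabel("training set size"); plt.ylabel("MSE"); plt.legend(); plt.show()
```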
Since underfitting means less model complexity, training longer can help in learning more complex patterns. This is especially true in terms of Deep Learning.
During training and validation, it is important to check the loss that is generated by the model. If the model is underfitting, the loss for both training and validation will be significantly high. In terms of Deep Learning, the loss will not decrease at the rate that it is supposed to if the model has reached saturation or is underfitting.
If a graph is plotted showing the data points and the fitted curve, and the curve is over-simplistic, then the model is suffering from underfitting. A more complex model can be tried out.
High bias can cause an algorithm to miss the relevant relations between features and target outputs (underfitting).
Parameter based learning algorithms usually have high bias and hence are faster to train and easier to understand. However, too much bias causes the model to be oversimplified and hence underfits the data. Hence these models are less flexible and often fail when they are applied on complex problems.
A model with high Bias means the model is Underfitting the given data-- and a model with High Variance means the model is Overfitting the given data.
Overfitting means that the model has memorized the training data and can’t generalize to things it hasn’t seen.
A low bias and high variance problem is overfitting.
Again, when a model is built using so many predictors that it captures noise along with the underlying pattern, it tries to fit the training data too closely, leaving little scope for generalizability. This phenomenon is known as overfitting.
Overfitting will cause poor predictions because the model is overmatching the training data (in some extreme cases, memorizing the training data), and not making any inductive leaps about the true relationship.
Overfitting is associated with high variance, and therefore the models produced in an overfitting scenario will differ wildly depending on what training data is used. Overfit models handle their training data perfectly, but fail to generalize to new data sets.
Overfitting refers to a model that models the training data too well. Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data.
The variance is an error from sensitivity to small fluctuations in the training set. High variance can cause an algorithm to model the random noise in the training data, rather than the intended outputs.
The problem where the model chosen is too complex, and becomes specific to the training data set is called overfitting.
Overfitting means that the neural network performs very well on training data, but fails as soon as it sees some new data from the problem domain. Underfitting, on the other hand, means, that the model performs poorly on both datasets.
Both overfitting and underfitting are not desirable phenomenons. However, by far the most common problem in deep learning and machine learning is overfitting.
Overfitting is a much bigger problem because the evaluation of deep learning/machine learning models on training data is quite different from the evaluation that is actually most important for us, which is the evaluation of the model on unseen data (validation set).
When a model gets trained with so much data, it starts learning from the noise and inaccurate data entries in our data set. Then the model does not categorize the data correctly, because of too many details and too much noise.
The causes of overfitting are the non-parametric and non-linear methods because these types of machine learning algorithms have more freedom in building the model based on the dataset and therefore they can really build unrealistic models.
A solution to avoid overfitting is using a linear algorithm if we have linear data or using the parameters like the maximal depth if we are using decision trees.
The commonly used methodologies to prevent overfitting are listed below (a minimal cross-validation sketch follows the list):--
Cross-Validation: A standard way to find the out-of-sample prediction error is to use 5-fold cross-validation.
Early Stopping: Its rules provide guidance as to how many iterations can be run before the learner begins to overfit.
Pruning: Pruning is extensively used while building tree-based models. It simply removes the nodes which add little predictive power for the problem in hand.
Regularization: It introduces a cost term into the objective function for bringing in more features. Hence it tries to push the coefficients of many variables towards zero and so reduces the complexity of the model.
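As a hedged sketch of the 5-fold cross-validation check named above (the fully grown tree and the toy data are illustrative placeholders):

```python
# Minimal sketch: 5-fold cross-validation estimates out-of-sample error, which
# exposes overfitting that a score computed on the training data alone would hide.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=300)

model = DecisionTreeRegressor(random_state=0)          # fully grown tree, prone to overfit
model.fit(X, y)
train_mse = mean_squared_error(y, model.predict(X))    # optimistic, near zero
cv_mse = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_squared_error").mean()
print(f"training MSE = {train_mse:.3f}   5-fold CV MSE = {cv_mse:.3f}")
```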
Overfitting is the case where the generalization of the model is unreliable. This is due to the model learning “too much” from the training data set.
The phenomenon of memorization can cause overfitting. We extract too much information from the training sets and make our model work well only on them; this is what is called low bias in machine learning.
However, at the same time, it will not help us generalize with data and derive patterns from them. The model as a result will perform poorly on datasets that were not seen before. We call this situation high variance in machine learning.
Overfitting occurs when we try to describe the learning rules based on a relatively small number of observations, instead of the underlying relationship. Overfitting also takes place when we make the model excessively complex so that it fits every training sample, such as memorizing the answers to all the questions.
Decision trees are prone to overfitting, especially when a tree is particularly deep. This is because the deeper the tree, the more specific the splits become, leaving a smaller and smaller sample of events that meet the previous assumptions.
A decision tree is a lot like a flowchart. To utilize a flowchart you start at the starting point, or root, of the chart and then based on how you answer the filtering criteria of that starting node you move to one of the next possible nodes. This process is repeated until an ending is reached.
Pruning can help increase the performance of a decision tree by stripping out branches containing features that have little predictive power or little importance for the model.
Pruning is the process of removing the unnecessary structure from a decision tree, effectively reducing the complexity to combat overfitting with the added bonus of making it even easier to interpret.
Decision trees are the most susceptible out of all the machine learning algorithms to overfitting and effective pruning can reduce this likelihood.
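As a hedged sketch, one concrete way to do this in scikit-learn is cost-complexity (post-)pruning via ccp_alpha; the dataset and the ccp_alpha value are illustrative assumptions that would normally be chosen by cross-validation:

```python
# Minimal sketch: prune a decision tree with cost-complexity pruning (ccp_alpha).
# A larger ccp_alpha removes branches with little predictive power, trading a bit
# of bias for a larger reduction in variance.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

unpruned = DecisionTreeClassifier(random_state=0)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.01)

for name, clf in [("unpruned tree", unpruned), ("pruned tree", pruned)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy = {acc:.3f}")
```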
Models with low bias algorithms tend to be more complex, with a more flexible underlying structure. The higher level of flexibility in the models can allow for more complex relationships between data but can also cause overfitting because the model is free to memorize the training data, instead of generalizing a pattern found in the data.
Models with low bias also tend to be less stable between training data sets. Non-parametric models (e.g., decision trees) typically have low bias and high variability.
Any high-complexity model (e.g., decision trees) will be prone to overfitting due to low bias and high variance. The best we can do is try to settle somewhere in the middle of the spectrum. An ideal model will exist in the happy place between overfitting and underfitting, where the “true” relationship in the data is captured, but the random noise of the data set is not.
The parameters to look out for to determine whether the model is overfitting are similar to those for underfitting. These are listed below:--
Training and Validation Loss: it is important to measure the loss of the model during training and validation. A very low training loss but a high validation loss would signify that the model is overfitting. Additionally, in Deep Learning, if the training loss keeps on decreasing but the validation loss remains stagnant or starts to increase, it also signifies that the model is overfitting.
Too Complex Prediction Graph: If a graph is plotted showing the data points and the fitted curve, and the curve is too complex to be the simplest solution which fits the data points appropriately, then the model is overfitting.
Classification: If every single class is properly classified on the training set by forming a very complex decision boundary, then there is a good chance that the model is overfitting.
Regression: If the final “Best Fit” line crosses over every single data point by forming an unnecessarily complex curve, then the model is likely overfitting.
Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks that operates by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees.
Random forests are an example of an ensemble technique.
These techniques take a divide-and-conquer approach: a collection of weak learners is combined to produce a strong learner. In a random forest, the weak learners are small trees; together, with the power of majority voting, they make a strong learner.
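A hedged sketch of a random forest as a voting ensemble of trees (the dataset and hyperparameters are illustrative, not recommendations):

```python
# Minimal sketch: a random forest grows many decision trees on bootstrap samples
# and random feature subsets, then takes a majority vote for classification.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = load_breast_cancer(return_X_y=True)

single_tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=200, random_state=0)

for name, clf in [("single decision tree", single_tree), ("random forest", forest)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: cross-validated accuracy = {acc:.3f}")
```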
Pre-pruning stops growing the tree early, before it perfectly classifies the training set. Post-pruning allows the tree to perfectly classify the training set and then prunes it back.
Solutions for overfitting include simplifying your model (e.g., selecting a model with fewer parameters or reducing the number of attributes), gathering more training data, or reducing noise in the training data (e.g., finding and correcting errors, removing outliers).
Overfitting can be showcased in two forms of supervised learning: Classification and Regression.
Overfitting happens when a model learns the detail and noise in the training data to the extent that it negatively impacts the performance of the model on new data
Overfitting a model is a condition where a statistical model begins to describe the random error in the data rather than the relationships between variables. This problem occurs when the model is too complex. ... Thus, overfitting a regression model reduces its generalizability outside the original dataset.
Again, Bias denotes the simplicity of the model. A high biased model will have a simpler architecture than that of a model with a lower bias. Similarly, complementing Bias, Variance denotes how complex the model is and how well it can fit the data with a high degree of diversity. An ideal model should have Low Bias and Low Variance.
However, when it comes to practical datasets and models, it is nearly impossible to achieve zero bias and zero variance. The two are complementary to each other: if one decreases beyond a certain limit, the other starts increasing. This is known as the Bias-Variance Tradeoff. Under such circumstances, there is a “sweet spot” where both bias and variance are at their optimal values.
If the model is overfitting, the developer can take the following steps to recover from the overfitting state:--
Early Stopping during Training: This is especially prevalent in Deep Learning. Allowing the model to train for a high number of epochs (iterations) may lead to overfitting. Hence it is necessary to stop the model from training when the model has started to overfit. This is done by monitoring the validation loss and stopping the model when the loss stops decreasing over a given number of epochs (or iterations).
Train with more data: Often, the data available for training is less when compared to the model complexity. Hence, in order to get the model to fit appropriately, it is often advisable to increase the training dataset size.
Train a less complex model: As mentioned earlier, the main reason behind overfitting is excessive model complexity for a relatively less complex dataset. Hence it is advisable to reduce the model complexity in order to avoid overfitting. For Deep Learning, the model complexity can be reduced by reducing the number of layers and neurons.
Regularization: Regularization is the process of simplification of the model artificially, without losing the flexibility that it gains from having a higher complexity. With the increase in regularization, the effective model complexity decreases and hence prevents overfitting.
Handling overfitting (a minimal sketch follows this list)-- Reduce the network's capacity by removing layers or reducing the number of elements in the hidden layers.
Apply regularization, which comes down to adding a cost to the loss function for large weights.
Use Dropout layers, which will randomly remove certain features by setting them to zero.
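A hedged Keras sketch combining these ideas; the layer sizes, L2 strength, dropout rate, patience and the synthetic data are all illustrative assumptions, not recommendations:

```python
# Minimal sketch: a small network with L2 weight regularization, Dropout layers,
# and early stopping on the validation loss to curb overfitting.
import numpy as np
import tensorflow as tf

rng = np.random.RandomState(0)
X = rng.normal(size=(1000, 20)).astype("float32")
y = (X[:, 0] + 0.5 * X[:, 1] ** 2 > 0).astype("float32")       # toy binary labels

model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dropout(0.3),                               # randomly zero 30% of units
    tf.keras.layers.Dense(16, activation="relu",
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

model.fit(X, y, validation_split=0.2, epochs=100,
          callbacks=[early_stop], verbose=0)
```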
In case of overfitting, variance is an error resulting from fluctuations in the training dataset. A high value of variance means the algorithm may capture most of the training data points but will not be generalized enough to capture new data points.
Overfitting means that the neural network models the training data too well. Overfitting suggests that the neural network has a good performance, but in fact the model fails when it faces new and yet unseen data from the problem domain.
Again, Overfitting is happening for two main reasons:--
The data samples in the training data have noise and fluctuations.
The model has very high complexity
Overfitting occurs when the statistical model contains more parameters than are justified by the data. This means that it will tend to fit noise in the data and so may not generalize well to new examples. The hypothesis function is too complex.
If the model has a variance problem (overfitting), the training error curve will remain well below the testing error and may not plateau. If the training curve does not plateau, this suggests that collecting more data will improve model performance. To prevent overfitting and bring the curves closer to one another, one should increase the severity of regularization, reduce the number of features and/or use an algorithm that can only fit simpler hypothesis functions
With the passage of time, the model will keep on learning and thus the error for the model on the training and testing data will keep on decreasing. If it will learn for too long, the model will become more prone to overfitting due to presence of noise and less useful details. Hence the performance of our model will decrease. In order to get a good fit, we will stop at a point just before where the error starts increasing. At this point the model is said to have good skills on training dataset as well our unseen testing dataset.
Again, it’s entirely possible to have state-of-the-art algorithms, the fastest computers, and the most recent GPUs, but if your model overfits or underfits to the training data, its predictive powers are going to be terrible no matter how much money or technology you throw at it.
In statistics and machine learning, bias–variance tradeoff is the property of a set of predictive models whereby models with a lower bias in parameter estimation have a higher variance of the parameter estimates across samples, and vice versa.
The bias–variance dilemma or bias–variance problem is the conflict in trying to simultaneously minimize these two sources of error that prevent supervised learning algorithms from generalizing beyond their training set
In reality, we want something in the middle with some amount of bias and variance called Bias-Variance tradeoff. The optimal machine learning model should have some authority to generalize but at the same time, it should be very open to listening to data.
This tradeoff applies to all forms of supervised learning: classification, regression (function fitting), and structured output learning. It has also been invoked to explain the effectiveness of heuristics in human learning.
High Bias High Variance: Models are inaccurate and also inconsistent on average.
Low Bias Low Variance: This is the unicorn.
It is obvious that it is nearly impossible to have a model with no bias and no variance, since decreasing one increases the other. This phenomenon is known as the Bias-Variance Trade-off.
If hypothetically, infinite data is available, it is possible to tune the model to reduce the bias and variance terms to zero but is not possible to do so practically. Hence, there is always a tradeoff between the minimization of bias and variance.
High bias means that the algorithm has failed to understand the pattern in the input data.
It’s generally not possible to minimize both errors simultaneously, since high bias would always mean low variance, whereas low bias would always mean high variance.
Finding a trade-off between the two extremes is known as Bias/Variance Tradeoff.
Supervised machine learning algorithms can best be understood through the lens of the bias-variance trade-off.
The objective of any machine learning algorithm is to achieve low bias and low variance, achieving at the same time a good performance predicting results.
The Bias-Variance Tradeoff is relevant for supervised machine learning - specifically for predictive modeling. It's a way to diagnose the performance of an algorithm by breaking down its prediction error.
Bias versus variance refers to the accuracy versus the consistency of the trained models.
In the machine learning world, precision is everything. When we try to develop a model, we try to make it as accurate as possible by playing with the different parameters. But the hard truth is that we cannot build a one-hundred-per-cent accurate model, because we cannot build a model that is free of errors.
What we can do is try to understand the possible sources of error; this will help us obtain a more precise model.
High Bias Low Variance: Models are consistent but inaccurate on average. Tend to be less complex with a simple or rigid structure like linear regression or Bayesian linear regression.
Low Bias High Variance: Models are somewhat accurate but inconsistent on average. They tend to be more complex, with a flexible structure, like decision trees or k-nearest neighbour (KNN).
We want to avoid both overfitting and underfitting. Bias is the error stemming from incorrect assumptions in the learning algorithm; high bias results in underfitting, and variance measures how sensitive the model prediction is to variations in the datasets. Hence, we need to avoid cases where any of bias or variance is getting high.
So, does it mean we should always make both bias and variance as low as possible? The answer is yes, if we can. But in practice, there is an explicit trade-off between themselves, where decreasing one increases the other. This is the so-called bias–variance tradeoff.
The name bias-variance dilemma comes from two terms in statistics: bias, which corresponds to underfitting, and variance, which corresponds to overfitting
If the model is too simple and has very few parameters then it may have high bias and low variance. On the other hand if our model has large number of parameters then it’s going to have high variance and low bias. So we need to find the right/good balance without overfitting and underfitting the data.
From the understanding of bias and variance individually , it can be concluded that the two are complementary to each other. In other words, if the bias of a model is decreased, the variance of the model automatically increases. The vice-versa is also true, that is if the variance of a model decreases, bias starts to increase.
Bias and variance are actually side effects of one factor: the complexity of the model.
This tradeoff is somewhat like the electron and the proton balance in an atom, both are equally important and their harmony is important for the universe as a whole.
Similarly, bias and variance are two kinds of errors to be minimised during model building. Any low-complexity model will be prone to underfitting because of high bias and low variance.
The bias-variance tradeoff refers to the fact that models with high bias will have low variance and vice versa, so it is important to choose a model that has an optimal contribution from both such that the total error is minimized.
The bias-variance tradeoff is a serious problem in machine learning. It is a situation where you can’t have both low bias and low variance; you have to make a tradeoff by training a model which captures the regularities in the data well enough to be reasonably accurate and generalizable to a different set of points from the same source, by having optimum bias and optimum variance.
Bias-variance tradeoff forms an essential entity of machine and statistical learning. All the learning algorithms involve a significant measure of error.
These errors can be reducible or irreducible. We cannot do anything about irreducible errors; the reducible errors, bias and variance, are the ones we can work on.
These reducible errors can be effectively minimized and the efficiency of a working system can be maximized. The main goal of a learning algorithm is to reduce these bias and variance errors to the minimum and make the most feasible model.
Achieving such a goal is not very easy that is when a tradeoff is made to reduce the possible sources of errors when a certain model is being selected based on their varied flexibility and complexity.
We have to avoid overfitting because it gives too much predictive power to even noise elements in our training data. But in our attempt to reduce overfitting we can also begin to underfit, ignoring important features in our training data. We need a balance.
The bias–variance decomposition is a way of analyzing a learning algorithm's expected generalization error with respect to a particular problem as a sum of three terms, the bias, variance, and a quantity called the irreducible error, resulting from noise in the problem itself.
This bias-variance tradeoff equation works well with both predictive as well as explanatory models. When we take the explanatory model into consideration, the main goal would be to reduce the bias to the maximum extent to get the underlying theory’s most accurate representation.
On the one hand, we want our algorithm to model the training data very closely, otherwise we’ll miss relevant features and interesting trends.
However, on the other hand we don’t want our model to fit too closely, and risk over-interpreting every outlier and irregularity.
Bias-Variance Dichotomy-- The concept here is that while adding complexity to the machine learning model might improve the fit to the training data, it need not improve the prediction accuracy on new data (i.e., data the model has not been trained on).
In the optimal region of the Bias-Variance tradeoff, the model is neither underfitting nor overfitting. Hence, since there is neither underfitting nor overfitting, it can also be said that the model is most generalized, as under these conditions the model is expected to perform equally well on training and validation data.
Bias is the simplifying assumptions made by the model to make the target function easier to approximate. Variance is the amount that the estimate of the target function will change given different training data. Trade-off is tension between the error introduced by the bias and the variance
The Bias-Variance tradeoff, as the name suggests, is a tradeoff between bias and variance: an algorithm can't be more complex and less complex at the same time. To build a good model, we need to find a good balance between bias and variance such that it minimizes the total error.
A big difference in errors between the Training and Testing Sets clearly signifies that something is wrong with the Polynomial model. This drastic change in error is due to the phenomenon called the Bias-Variance Tradeoff.
Models with higher bias tend to be relatively simple (low-order or even linear regression polynomials) but may produce lower variance predictions when applied beyond the training set
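A hedged sketch of that tradeoff as a sweep over model complexity, here polynomial degree (the degrees and the toy data are illustrative assumptions):

```python
# Minimal sketch: as polynomial degree grows, training error keeps falling while
# validation error typically traces a U-shape; the bottom of the U is the "sweet spot".
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.2, size=120)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for degree in (1, 3, 5, 9, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    tr = mean_squared_error(y_train, model.predict(X_train))
    va = mean_squared_error(y_val, model.predict(X_val))
    print(f"degree {degree:2d}: train MSE = {tr:.3f}   validation MSE = {va:.3f}")
```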
Machine Learning is a scientific field of study which involves the use of algorithms and statistics to perform a given task by relying on inference from data instead of explicit instructions.
In Machine Learning, only a part of the data is available to the user at the time of training (fitting) the model, and then the model has to perform equally well on data that it has never encountered before. Which is, in other words, the generalization of the model over a given data, such that it is able to correctly predict when it is deployed.
Again, models with high bias tend to under-fit, while models with high variance tend to overfit.
A good model should do two things:--
Capture the patterns in the given training data set
Correctly compute the output for a new instance
The model should be complete enough to represent the data, but the more complex the model, the better it represents the training data. However, there is a limit to how complex the model can get.
If the model is too complex, then it will pick up specific random features (noise or example) in the training data set.
If the model is not complex enough, then it might miss out on important dynamics of the data given.
It is generally impossible to minimize the two errors at the same time and this trade-off is what is known as bias/variance trade-off.
Spear phishing accounts for 92% of cyberattacks.
Spear phishing is the fraudulent practice of sending emails ostensibly from a known or trusted sender in order to induce targeted individuals to reveal confidential information.
Phishing is a broader term for any attempt to trick victims into sharing sensitive information such as passwords, usernames, and credit card details for malicious reasons. ...
Unlike spear-phishing attacks, phishing attacks are not personalized to their victims, and are usually sent to masses of people at the same time.
Spear phishing attempts have been used to swindle individuals and companies out of millions of dollars. They can also do damage in other areas, such as stealing secret information from businesses or causing emotional stress to individuals..
Last year, in 2018, Airbnb customers were targeted by a spear phishing attack where cyber attackers used social engineering methods to pick victims. They then sent out a fake email, ostensibly from Airbnb, stating the implications of the General Data Protection Regulation (GDPR). The fake email prompted its recipients not to accept further bookings until they complied with GDPR. The attached link took the customers to a spoof site, which then collected the personal details of the victims.
Spear phishing attackers plan their attacks by first identifying their target. There are a few elements common to spear phishing attacks –
The source looks like a legitimate one – scammers send emails to these targets in such a way that they seem to come from a legitimate source. These sources closely resemble a genuine email ID.
For instance, mnop@gmail.com becomes rnop@gmail.com, where ‘m’ is replaced by ‘rn’, making it look like an ‘m’ (most long-sighted people don’t wear reading glasses).
Spear phishing can lead to the compromise of sensitive data. If the required security measures are not put in place properly, the targeted attack may lead to a destructive security breach.
Filter your inbox – Configure your email application so that it blocks spam-emails efficiently. It must separate emails generated from trusted sources from those outside the system.
Encryption – If an email is sent after cryptographically signing it, then only the person with the private key can access the content of the mail. It makes it difficult for an imposter to pass off as a legit source.
Anti-spam software and devices – It has been noticed that spear phishing messages target systems that are already compromised. In this scenario, anti-spam software and devices can identify a compromised mail server.
Update all software – Updating installed applications and software is a crucial step. If it is not done regularly, cybercriminals can exploit the lag to attack known but unpatched vulnerabilities.
Keep an eye on your online activities – Have you been sharing your personal information on social media accounts? If yes, then a potential scammer can use the same details to frame a personalized message, which might lead you to a spear phishing attack.
Use smart passwords – If you are someone who uses the same password or variations of it on different platforms, then change your password to a random phrase or combination of numbers, letters, and special symbols.
Implement a data protection program – Every mid to large size corporation should have a data protection program. Install a data loss prevention software so that it can efficiently protect sensitive data.
Awareness – Make sure that your employees are well-aware of spear phishing attacks. They should know how to detect such attacks. Train them on good email practices –
Not to reveal personal information on emails unless sent from a trusted source (after cross-verifying the source from a legit database)
Never click on links sent through emails, especially the ones asking for your financial or banking details
Always report suspicious emails
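As referenced above under "Filter your inbox", here is a minimal sketch of one such rule (the domain, employee names and addresses are all hypothetical): tag mail from outside the organization, and flag messages whose display name impersonates an internal employee while the actual sender domain is external.

```python
# Tag inbound mail and flag display-name impersonation of internal staff.
from email.utils import parseaddr

INTERNAL_DOMAIN = "example.com"                   # assumed company domain
EMPLOYEE_NAMES = {"priya sharma", "john doe"}     # assumed directory extract

def classify(from_header):
    display_name, address = parseaddr(from_header)
    domain = address.rsplit("@", 1)[-1].lower()
    if domain == INTERNAL_DOMAIN:
        return "INTERNAL"
    if display_name.strip().lower() in EMPLOYEE_NAMES:
        return "SUSPICIOUS: external address using an employee's name"
    return "EXTERNAL"

print(classify("Priya Sharma <priya.sharma@example.com>"))  # INTERNAL
print(classify("Priya Sharma <p.sharma.ceo@gmail.com>"))    # SUSPICIOUS
print(classify("Newsletter <news@vendor.net>"))             # EXTERNAL
```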
As email is the most common entry point for targeted attacks, it is vital to protect organizations against these anticipated attacks at the email layer.
Spear-phishing is defined as cyberattacks targeting specific users within an organization to distribute malware or extract sensitive data. Hackers typically use two different methods to gain access: emails sent with file attachments and files without attachments (emails containing malicious links).
Employees should be instructed to forward suspicious emails to a dedicated person within the organization to verify their authenticity. Organizations also need clear processes to help employees both identify attacks and report them to a designated point person.
The better users become at detecting spear phishing, the less likely the organization is to be compromised by an attacker.
Machine learning – the heart of what we call artificial intelligence today – gets “smart” by observing patterns in data, and making assumptions about what it means, whether on an individual computer or a large neural network.
So, if a particular pattern of processor activity occurs whenever specific processes are running, and that pattern is seen repeatedly on the neural network and/or the specific computer, the system learns that the pattern means a cyber-attack has occurred and that appropriate action needs to be taken.
But here is where it gets tricky. AI-savvy malware could inject false data that the security system would read – the objective being to disrupt the patterns the machine learning algorithms use to make their decisions. Thus, phony data could be inserted into a database to make it seem as if a process that is copying personal information is just part of the regular routine of the IT system, and can safely be ignored.
Instead of trying to outfox intelligent machine-learning security systems, hackers simply “make friends” with them – using their own capabilities against them, and helping themselves to whatever they want on a server.
There are all sorts of other ways hackers could fool AI-based security systems. It’s already been shown for example, that an AI-based image recognition system could be fooled by changing just a few pixels in an image. In one famous experiment at Kyushu University in Japan, scientists were able to fool AI-based image recognition systems nearly three quarters of the time, “convincing” them that they were looking not at a cat, but a dog or even a stealth fighter.
Another tactic involves “bobbing and weaving,” where hackers insert signals and processes that have no effect on the IT system at all – except to train the AI system to see these as normal. Once it does, hackers can use those routines to carry out an attack that the security system will miss – because it’s been trained to “believe” that the behavior is irrelevant, or even normal.
Yet another way hackers could compromise an AI-based cybersecurity system is by changing or replacing log files – or even just changing their timestamps or other metadata, to further confuse the machine-learning algorithms.
Thus, the great strength of AI has the potential to be its downfall.
Companies that install advanced AI security systems tend to become complacent about cybersecurity, believing that the system will protect them, and that by installing it they’ve assured their safety.
Keeping a human eye on the AI that is ostensibly protecting organizations is the first step in ensuring that they are getting their money’s worth out of their cybersecurity systems.
Hardening the AI: One tactic hackers use to attack is inundating an AI system with low-quality data in order to confuse it. To protect against this, security systems need to account for the possibility of encountering low-quality data.
Stricter controls on how data is evaluated – for example, examining the timestamps on log files more closely to determine whether they have been tampered with (a minimal timestamp-consistency sketch follows this list) – could take from hackers a weapon they are currently using with success.
More attention to basic security: Hackers most often infiltrate organizations using their tried and true tactics – APT or run of the mill malware. By shoring up their defenses against basic tactics, organizations will be able to prevent attacks of any kind – including those using advanced AI – by keeping malware and exploits off their networks altogether.
Educating employees on the dangers of responding to phishing pitches – including rewarding those who avoid them and/or penalizing those who don’t – along with stronger basic defenses like sandboxes and anti-malware systems, and more intelligent AI defense systems can go a long way to protect organizations. AI has the potential to keep our digital future safer; with a little help from us, it will be able to avoid manipulation by hackers, and do its job properly
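Here is the timestamp-consistency sketch referred to above. It assumes a simple "ISO-8601 timestamp, space, message" log format and uses a hypothetical file path; it flags entries whose timestamps run backwards, and a last entry newer than the file's own modification time, both cheap hints that a log may have been tampered with.

```python
# Two cheap log-tampering heuristics: non-monotonic timestamps, and a last
# entry that post-dates the file's modification time.
import os
from datetime import datetime, timezone

def check_log(path):
    issues = []
    previous = None
    with open(path) as handle:
        for line_no, line in enumerate(handle, start=1):
            stamp = datetime.fromisoformat(line.split(" ", 1)[0])
            if previous is not None and stamp < previous:
                issues.append(f"line {line_no}: timestamp goes backwards")
            previous = stamp
    mtime = datetime.fromtimestamp(os.path.getmtime(path), tz=timezone.utc)
    if previous is not None and previous.astimezone(timezone.utc) > mtime:
        issues.append("last entry is newer than the file's modification time")
    return issues

# print(check_log("/var/log/app/auth.log"))   # hypothetical path
```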
Hackers are using AI to build customized programs capable of getting past a company’s defenses. State-of-the-art defenses generally rely on examining what the attack software is doing, rather than the more commonplace technique of analyzing software code for danger signs. But the new generation of AI-driven programs can be trained to stay dormant until they reach a very specific target, making them exceptionally hard to stop
Data is at the heart of any AI implementation and IT security is no different. To be effective, AI algorithms must be driven by the right data systems. The data must not just exist but should be current.
After all, AI seeks to mimic human intelligence and should, therefore (ideally), be designed to continuously improve itself based on new knowledge. So, identifying the required data sets must be the first thing the business does in their quest to operationalize the new AI-driven cybersecurity algorithms.
Security orchestration, automation, and response (SOAR) tools help organizations collect security information from multiple sources. SOAR enables incident triage and analysis by combining human and machine capabilities. This allows incident response to be defined, prioritized, and driven through a standard workflow connecting data sources and data platforms.
SOAR is an essential component in optimizing the output of AI-based cybersecurity tools. It improves alert quality, reduces the time needed for onboarding cyber analysts, and improves security management.
SOAR is a solution stack of compatible software programs that allow an organization to collect data about security threats from multiple sources and respond to low-level security events without human assistance.
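For illustration, the sketch below mimics in a few lines the kind of triage step a SOAR workflow automates; the alert fields, severity scale and thresholds are assumptions, not any vendor's API.

```python
# Merge alerts from several sources, score them, auto-close low-severity
# ones and escalate the rest to an analyst queue.
SEVERITY = {"info": 1, "low": 2, "medium": 3, "high": 4, "critical": 5}

def triage(alerts):
    auto_closed, escalated = [], []
    for alert in alerts:
        score = SEVERITY.get(alert["severity"], 3)
        if alert.get("asset_is_critical"):
            score += 1                      # crown-jewel assets get a bump
        (escalated if score >= 4 else auto_closed).append(alert)
    return auto_closed, escalated

alerts = [
    {"source": "email_gateway", "severity": "low", "asset_is_critical": False},
    {"source": "edr", "severity": "high", "asset_is_critical": True},
    {"source": "ids", "severity": "medium", "asset_is_critical": True},
]
closed, queue = triage(alerts)
print(len(closed), "auto-closed;", len(queue), "escalated to analysts")
```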
Autonomic Intelligent Cyber Sensor (AICS), developed with funding from the Department of Energy, employs artificial intelligence to detect intruders, isolate them and retaliate against them..
AICS uses a proprietary cluster algorithm to learn and map the business and operational systems so it can recognize anomalies.
It constantly monitors network traffic across industrial control systems, and its sensors also keep tabs on voltages and amperages in connected equipment, looking for irregularities that indicate an intruder is present
AICS uses machine learning to add to its knowledge base of threats, making it better able to identify threats as time goes on
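AICS's cluster algorithm is proprietary, so the sketch below (assuming numpy and scikit-learn) is only a generic stand-in for the idea: learn clusters of normal sensor readings such as voltage, current and traffic rate, then flag new readings that fall far from every learned cluster centre.

```python
# Cluster-based anomaly detection on synthetic ICS sensor readings.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
normal = np.column_stack([
    rng.normal(230, 2, 500),      # voltage (V)
    rng.normal(10, 0.5, 500),     # current (A)
    rng.normal(800, 50, 500),     # network packets per second
])

model = KMeans(n_clusters=3, n_init=10, random_state=1).fit(normal)
# Threshold: a bit beyond the worst distance-to-centre seen during training.
train_dist = np.min(model.transform(normal), axis=1)
threshold = train_dist.max() * 1.1

def is_anomalous(reading):
    return np.min(model.transform(np.atleast_2d(reading)), axis=1)[0] > threshold

print(is_anomalous([230, 10, 810]))    # ordinary reading -> False
print(is_anomalous([230, 25, 4000]))   # surge in current and traffic -> True
```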
AICS also employs honeypots – monitored networks that appear to be part of the production system but that isolate and quarantine intruders. AI is used to update these virtual decoys so that they mimic a live network and intruders do not realize they are being observed. Once in the honeypot, an intruder can be tracked, analyzed, diverted from targeted systems and potentially hacked back
Honeypots take the bait-and-trap approach. A honeypot is an isolated computer or network site that is set up to attract hackers. Cyber security analysts use honeypots to research evolving tactics, prevent attacks and catch intruders
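In its simplest form, a honeypot is just a listener on a port nothing legitimate uses, logging whoever knocks. The sketch below is illustrative only (the port, banner and log output are arbitrary choices) and is nowhere near a production-grade decoy.

```python
# A toy honeypot: accept connections on an unused port, log the source,
# and answer with a bland banner so the probe keeps talking.
import socket, datetime

HONEYPOT_PORT = 2222   # a fake SSH-ish port; nothing real listens here

def run_honeypot():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind(("0.0.0.0", HONEYPOT_PORT))
        server.listen()
        while True:
            conn, (ip, port) = server.accept()
            with conn:
                print(f"{datetime.datetime.now().isoformat()} probe from {ip}:{port}")
                conn.sendall(b"SSH-2.0-OpenSSH_7.4\r\n")

# run_honeypot()   # blocking; run in a lab/VM only
```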
Deception technologies are an important part of a defensive strategy. By setting an irresistible honeypot, attackers are fooled into thinking they have gained access to the real system or target. Once tricked, the methods and tactics of the attackers can be safely monitored to gain critical intelligence and identify where defence systems need to be ramped up.
Late in 1988, a man named Robert Morris had an idea: he wanted to gauge the size of the internet. To do this, he wrote a program designed to propagate across networks, infiltrate Unix terminals using a known bug, and then copy itself. This last instruction proved to be a mistake. The Morris worm replicated so aggressively that the early internet slowed to a crawl, causing untold damage.
The worm had effects that lasted beyond an internet slowdown. For one thing, Robert Morris became the first person successfully charged under the Computer Fraud and Abuse Act (although this ended happily for him—he’s currently a tenured professor at MIT). More importantly, this act also led to the formation of the Computer Emergency Response Team (the precursor to US-CERT), which functions as a nonprofit research center for systemic issues that might affect the internet as a whole.
The Morris worm appears to have been the start of something. After it, viruses started getting deadlier and deadlier, affecting more and more systems; the worm presaged the era of massive internet outages in which we live. The same period also saw the rise of antivirus as a commodity – 1987 saw the founding of the first dedicated antivirus companies and the release of the first dedicated antivirus products
https://en.wikipedia.org/wiki/Robert_Tappan_Morris
.
Network breaches and malware did exist and were used for malicious ends during the early history of computers, however. The Russians, for example, quickly began to deploy cyberpower as a weapon. In 1986, the German computer hacker Marcus Hess hacked an internet gateway in Berkeley, and used that connection to piggyback on the Arpanet. He hacked 400 military computers, including mainframes at the Pentagon, with the intent of selling their secrets to the KGB. He was only caught when an astronomer named Clifford Stoll detected the intrusion and deployed a honeypot technique.
EternalBlue was leaked by the Shadow Brokers hacker group on April 14, 2017, and was used as part of the worldwide WannaCry ransomware attack on May 12, 2017. The exploit was also used to help carry out the NotPetya cyberattack on June 27, 2017, and has reportedly been used as part of the Retefe banking trojan since at least September 5, 2017. No anti-virus, or even next-generation EPP, can effectively prevent exploitation using EternalBlue.
To a large extent, cybersecurity relies on file signatures to detect malware, and rules-based systems for detecting network abnormalities. Protection often stems from an actual virus outbreak – as security experts isolate the malicious files and identify unique signatures that help other systems become alert and immune.
The same is true for the rules-based system: Rules are set based on experience of potential malicious activity, or systems are locked down to restrict any access to stay on the safe side. The only problem with these approaches is their reactive nature. Hackers always find innovative ways to bypass the known rules. Before a security expert discovers the breach, it’s often too late.
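For reference, the signature-matching approach described above can be reduced to a few lines: hash a file and compare the digest against a blocklist. The digest and file path below are placeholders, not real malware signatures.

```python
# Hash-based file signature check against a (placeholder) blocklist.
import hashlib

KNOWN_BAD_SHA256 = {
    "0" * 64,   # placeholder digest, not a real signature
}

def sha256_of(path):
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_known_malware(path):
    return sha256_of(path) in KNOWN_BAD_SHA256

# print(is_known_malware("suspicious_download.exe"))   # hypothetical file
```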
Traditional malware is designed to perform its damaging functions on every device it finds its way into. One example is the NotPetya ransomware outbreak, in which hundreds of thousands of computers were infected in a short period of time. This method works when the attacker's goal is to inflict maximum damage. It is not as effective if an attacker has a specific target in mind.
But the advent of disruptive technologies like artificial intelligence means our devices and applications understand us better. For example, an iPhone X uses AI to automatically recognize faces. While it is a great feature, it also raises the chances of sensitive data ending up in the wrong hands. Today, hackers are using the same technology to develop smart malware that can prey on targets by pinpointing them from among millions of users.
What makes AI cybersecurity unique is its adaptability. Intelligent cybersecurity doesn’t need to follow specific rules. Rather, it can watch patterns and learn. Even better, AI can be directly integrated into everyday protection tools – such as spam filters, network intrusion and fraud detection, multi-factor authentication, and incident response.
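As a toy illustration of "watch patterns and learn" applied to spam filtering, the sketch below (assuming scikit-learn) trains a Naive Bayes classifier on a tiny made-up corpus; a real filter would be trained on a large labelled mail set.

```python
# A tiny learned spam filter: no hand-written rules, just labelled examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "verify your account password immediately",   # spam/phish
    "claim your free prize now",                  # spam/phish
    "agenda for tomorrow's project meeting",      # ham
    "quarterly report attached for review",       # ham
]
labels = ["spam", "spam", "ham", "ham"]

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, labels)
print(model.predict(["urgent: confirm your password to claim the prize"]))  # expected: ['spam']
print(model.predict(["minutes from the project review meeting"]))           # expected: ['ham']
```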
Malicious actors on the web closely monitor cyber security trends and react to them by reshaping viruses, exploits and other attack methods to subvert safety nets. Thus, there always arise instances in which attackers seize the advantage and their opponents appear to have brought a knife to a gunfight, figuratively speaking.
1 in every 20 malware attacks around the globe involves the use of ransomware. Moreover, most of its victims pay about $300 to escape the system interference involved in each hack, though some criminals will charge thousands more.
Ransomware is a type of malicious software, or malware, designed to deny access to a computer system or data until a ransom is paid. Ransomware typically spreads through phishing emails or by unknowingly visiting an infected website. Ransomware can be devastating to an individual or an organization
WannaCry is a ransomware worm that spread rapidly across a number of computer networks in May 2017. After infecting a Windows computer, it encrypts files on the PC's hard drive, making them impossible for users to access, and then demands a ransom payment in bitcoin in order to decrypt them.
WannaCry spread like wildfire, encrypting hundreds of thousands of computers in more than 150 countries in a matter of hours. WannaCry caused panic. Systems were down, data was lost and money had to be spent. It was a wake-up call that society needed to do better at basic cybersecurity.
Nearly all ransomware perpetrators demand payment in Bitcoin, whose pseudonymous blockchain transactions are difficult to trace, as Europol noted in its 2016 Internet Organised Crime Threat Assessment report. The more items in a person's household that are remotely controlled using IoT methods – security, vehicles, range stoves, and thermostats, to name a few – the greater the risk.
MAJOR CITIES CAN BE GUTTED BY FIRES
Businesses have even more to lose from IoT-centric malware attacks than individuals, especially if they use these solutions for functions like building security or any number of back-end processes. But the opposite approach – using legacy systems to manage business operations or utilities, for example – can be just as flawed and unsafe.
The Mirai botnet-assisted malware that battered Amazon, PayPal, Reddit and Dyn – the latter a firm providing server backup for massive swaths of the world’s internet – with service outages possessed sophisticated coding that allowed for easy updates as hackers passed it among themselves. This function effectively circumvented many malware countermeasures.
DDoS attackers once focused largely on governments and financial institutions, but Total Retail reported that, in light of the Mirai hack, businesses throughout the private sector should consider the potential for this threat. Once a DDoS attack is in place, black-hat hackers can plunder an organization's databases with aplomb and sell corporate secrets or employee information on the black market, or simply hold the network hostage as in a ransomware attack.
DDoS is short for Distributed Denial of Service. A DDoS attack is a type of DoS attack in which multiple compromised systems, often infected with a Trojan, are used to target a single system, causing a denial of service
A distributed denial-of-service (DDoS) attack occurs when multiple systems flood the bandwidth or resources of a targeted system, usually one or more web servers. Such an attack is often the result of multiple compromised systems (for example, a botnet) flooding the targeted system with traffic
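One simple volumetric heuristic for spotting such floods is sketched below; the window length and request threshold are arbitrary illustrations, not recommended values.

```python
# Count requests per source IP inside a sliding window and flag sources
# whose rate no legitimate client should reach.
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 10
MAX_REQUESTS_PER_WINDOW = 200

recent = defaultdict(deque)   # source IP -> timestamps of its recent requests

def register_request(ip, now=None):
    """Record a request; return True if this source now looks like a flood."""
    now = time.time() if now is None else now
    window = recent[ip]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    return len(window) > MAX_REQUESTS_PER_WINDOW

# Simulate one chatty source: 500 requests in a 5-second burst.
flagged = any(register_request("203.0.113.7", now=1000.0 + i * 0.01) for i in range(500))
print("flooding source detected:", flagged)
```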
THIS POST IS NOW CONTINUED TO PART 4 BELOW--
CAPT AJIT VADAKAYIL
..