
3. GENERAL INFORMATION ABOUT ARTIFICIAL NN IN AI SYSTEMS.


      3.1. Architecture of artificial neural networks in AI systems.

      Figure 6. A mostly complete chart of neural network architectures ( https://www.asimovinstitute.org/neural-network-zoo/ ).

      3.2. The heuristic strategy consists in solving a problem by repeatedly passing information (a signal) through the multilayer structure of an artificial or biological neural network with a given (or flexible) architecture of feedback connections between network elements, for the purpose of technical training.
      Learning consists in finding the coefficients - the statistical "weights" - of the neurons. During the learning process the neural network is able to identify complex dependencies between input and output data and to generalize.
      This method is also called "experimental" or "trial and error": in each iteration cycle the coefficients of the statistical "weights" of the neurons are corrected according to the calculated target function (error minimization - the gradual elimination of the CONTRADICTION between the values of two parameters).
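
      As a hedged illustration of this iterative correction of the "weights" (not the author's method - just a minimal sketch assuming a single linear neuron and a squared-error target function), one "trial and error" cycle can be written as:

import numpy as np

def correction_step(w, x, target, learning_rate=0.1):
    prediction = np.dot(w, x)            # the signal passes through the neuron
    error = prediction - target          # the contradiction between the two parameter values
    gradient = error * x                 # derivative of 0.5 * error**2 with respect to w
    return w - learning_rate * gradient  # corrected statistical "weights"

w = np.array([0.2, -0.4])
x = np.array([1.0, 2.0])
for _ in range(20):                      # each iteration cycle reduces the error a little more
    w = correction_step(w, x, target=1.0)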






      3.3. The first stage of neural network functioning is training; here the INDUCTION METHOD is used.

      The INDUCTIVE method moves from the particular to the general (generalization); it reveals the relationships between different data (parameters).
      Neural network training consists in finding optimal values for all the coefficients - the statistical "weights" w of the neurons. Training can be "with a teacher" (supervised), "without a teacher" (unsupervised), or with reinforcement (reinforcement learning).

      Training can also be organized in three regimes: stochastic, batch, and mini-batch, as sketched below.
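
      A minimal sketch of the three regimes (illustrative only; X and y are assumed to be NumPy arrays of training inputs and targets): batch_size = 1 gives stochastic training, batch_size = len(X) gives batch training, and anything in between gives mini-batch training.

import numpy as np

def iterate_minibatches(X, y, batch_size):
    indices = np.random.permutation(len(X))   # shuffle the examples once per epoch
    for start in range(0, len(X), batch_size):
        idx = indices[start:start + batch_size]
        yield X[idx], y[idx]                  # one weight update per yielded batch

X, y = np.random.rand(100, 3), np.random.rand(100)
for X_batch, y_batch in iterate_minibatches(X, y, batch_size=16):
    pass                                      # a training step would be performed here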

      In effect, a finished intellectual product is created:
a generalized, complex, modified NN activation function (a `template`) is built by the INDUCTIVE empirical method of machine learning, and this modified NN activation function (`template`) is `fitted` to the given training data.

      From the point of view of mathematics, neural network training is a multiparametric problem of nonlinear optimization.

      A neural network being tuned (trained) to solve a specific problem is treated as a multidimensional nonlinear system that, in an iterative mode, purposefully searches for the optimum of some functional quantifying the quality of the solution of the task.


      There are different methods of training a NN; the most interesting are:

      ● The Backpropagation method.

      ● Resilient propagation or Rprop.

      ● Genetic Algorithm.

      The main and most common neural network architecture - the one that revolutionized training of networks "with a teacher" (supervised learning) and enabled the transition from academic interest to commercial use - is the backpropagation network (the method uses a gradient descent algorithm): a powerful tool for finding patterns, forecasting, and qualitative analysis.

      Gradient descent is a way of finding a local minimum of a function by iteratively moving against its gradient (moving along the gradient instead leads to a local maximum).

      They are called backpropagation networks because of the training algorithm used, in which the error propagates from the output layer towards the input layer - that is, in the direction opposite to the signal propagation during normal network operation.

      A backpropagation network consists of several layers of neurons, and every neuron of layer i is connected to every neuron of layer i+1; in other words, it is a fully connected neural network.

      In general, the task of training a neural network comes down to finding some functional dependence Y=F(X), where X is the input vector and Y is the output vector. With a limited set of input data, such a problem has an infinite set of solutions.

      To limit the search space during training, the task is posed of minimizing the objective function of the neural network error, usually defined by the least squares method.
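
      A sketch of such a least-squares objective (hedged: `net` stands for any network function of an input x and weights w; the names are purely illustrative):

import numpy as np

def mse_objective(w, X, Y, net):
    predictions = np.array([net(x, w) for x in X])
    return 0.5 * np.mean((predictions - Y) ** 2)   # the error functional that training minimizes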


      Note:
      Activation function is a nonlinear transformation applied element by element to the input data.
      Thanks to activation functions, neural networks are able to generate more informative feature descriptions by transforming data in a nonlinear way.
      The activation function determines the output signal of a neuron based on its input.
      We simply substitute into it the summed products of the input signals and the weight coefficients, and we obtain the output signal of the neuron.
      The activation function is used to introduce nonlinearity into a neural network.
      It determines the output value of a neuron, which will depend on the sum of the inputs and the threshold.
      It also determines which neurons should be activated, and therefore what information will be passed on to the next layer.
      The activation function allows deep networks to learn.
      To simplify, the activation function acts as a `blank` that must be processed in accordance with the input data.
      The following can be used as activation functions: threshold function, hyperbolic tangent function, logistic function, etc.
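
      The activation functions named above can be written out as follows (a plain NumPy sketch):

import numpy as np

def threshold(x):                       # step function: fires once the summed input reaches 0
    return np.where(x >= 0.0, 1.0, 0.0)

def logistic(x):                        # logistic (sigmoid): squashes the input into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):                            # hyperbolic tangent: squashes the input into (-1, 1)
    return np.tanh(x)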



      Activation layers are one of the main types of layers used in neural networks.
      They are a function that adds nonlinearity to the output of the previous layer. This allows the neural network to better model complex functions and predict results more accurately.
      Activation layers take the results of the previous layer, called the input, and transform them into an output value that is passed to the next layer.
      To do this, they use an activation function, which determines how the data will be transformed.

      Neuron is the basic unit of a neural network.
      Each neuron has a certain number of inputs at which signals are received; these are summed, each weighted by the significance of the statistical coefficient (`weight`) of its input.
      Then the signals are sent to the inputs of other neurons.
      The weight of each such "node" can be either positive or negative.
      For example, if a neuron has four inputs, then it also has four `weight` values, which can be adjusted independently of each other.
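
      A minimal sketch of such a neuron with four inputs and four independently adjustable weights (illustrative, reusing the tanh activation from the note above):

import numpy as np

def neuron(inputs, weights, bias=0.0, activation=np.tanh):
    summed = np.dot(inputs, weights) + bias   # weighted sum of the incoming signals
    return activation(summed)                 # output signal passed on to other neurons

output = neuron(np.array([0.5, -1.0, 2.0, 0.1]),
                np.array([0.8, -0.2, 0.4, 1.5]))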

      Connections connect neurons with each other.
      A weight value is associated with each connection, and the goal of training is to update the weight of each connection so as to reduce errors in the future.

      Input layer is the first layer in a neural network, which receives input signals and passes them on to subsequent layers.

      Hidden (computational) layer applies various transformations to the input data.
      All neurons in the hidden layer are connected to every neuron in the next layer.

      Output layer is the last layer in the network, which receives data from the last hidden layer.
      With its help, we can get the desired number of values in the desired range.

      Weight represents the strength of the connection between neurons.
      For example, if the weight of the connection between nodes 1 and 3 is greater than that between nodes 2 and 3, it means that neuron 1 has a greater influence on neuron 3.
      A zero weight means that changes in the input will not affect the output.
      A negative weight indicates that increasing the input will decrease the output. Weight determines the influence of the input on the output.

      Forward propagation is the process of passing input values to a neural network and getting an output called the predicted value.
      When input values are passed to the first layer of the neural network, they are handed on without any operations being performed.
      The second layer of the network takes the values of the first layer and, after multiplication and activation operations, passes the result on.
      The same process occurs at deeper layers.
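
      Forward propagation through fully connected layers can be sketched as follows (hedged: weights and biases are assumed to be lists of per-layer matrices and vectors):

import numpy as np

def forward(x, weights, biases, activation=np.tanh):
    a = x                                # the input layer passes the values on unchanged
    for W, b in zip(weights, biases):    # each deeper layer multiplies, adds the bias, activates
        a = activation(W @ a + b)
    return a                             # the predicted value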

      Backpropagation.
      After forward propagation, we get a value called the predicted value.
      To calculate the error, we compare the predicted value with the actual value using a loss function.
      We can then calculate the derivative of the error value with respect to each weight in the neural network.
      The backpropagation method uses the rules of differential calculus.
      Gradients (derivatives of the error value with respect to the weights) are calculated starting from the last layer of the neural network (the error signals propagate in the direction opposite to the forward propagation of signals) and are used to calculate the gradients of the preceding layers.
      This process is repeated until the gradient of every weight in the neural network has been calculated.
      The gradient value, scaled by the learning rate, is then subtracted from the weight value to reduce the error.
      In this way the loss is driven toward a minimum.
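
      One backpropagation step for a small two-layer network (a hedged sketch only: tanh hidden layer, linear output, squared-error loss; all names are illustrative):

import numpy as np

def backprop_step(x, y, W1, b1, W2, b2, lr=0.01):
    # forward propagation
    h = np.tanh(W1 @ x + b1)
    y_pred = W2 @ h + b2
    # error signal at the output, propagated backwards
    delta2 = y_pred - y                      # derivative of 0.5 * (y_pred - y)**2
    dW2 = np.outer(delta2, h)
    db2 = delta2
    delta1 = (W2.T @ delta2) * (1.0 - h**2)  # chain rule through the tanh layer
    dW1 = np.outer(delta1, x)
    db1 = delta1
    # subtract the scaled gradients from the weights to reduce the error
    return W1 - lr * dW1, b1 - lr * db1, W2 - lr * dW2, b2 - lr * db2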

      Learning rate is a characteristic that is used during the training of neural networks.
      It determines how quickly the weight value will be updated during the backpropagation process.
      The learning rate should be high enough, but not too high, otherwise the algorithm will diverge.
      If the learning rate is too low, the algorithm converges very slowly and can get stuck in local minima.

      Convergence is a phenomenon when, during the iteration process, the output signal becomes closer and closer to a certain value.
      To avoid overfitting (poor performance on new data because the model has been fitted too closely to the training data), regularization is used - reducing the effective complexity of the model while keeping its parameters.
      This takes into account both the loss and the weight vector (the vector of parameters learned by the algorithm).

      Data normalization is the process of changing the scale of one or more parameters in the range from 0 to 1.
      This method should be used if you do not know how your data is distributed.
      It can also speed up training.
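
      Min-max normalization to the range from 0 to 1 can be sketched as:

import numpy as np

def normalize(X):
    x_min, x_max = X.min(axis=0), X.max(axis=0)
    return (X - x_min) / (x_max - x_min + 1e-12)   # the small constant guards against division by zero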

      Fully connected layers - the activity of every node in one layer is passed to every node in the next layer.
      Such layers are said to be fully connected.

      Loss functions are used to calculate the error on a particular part of the training data.
      The training loss is the average of this function over the examples (a usage sketch follows the optimizer list below):

      - ‘mse’ – for squared error;
      - ‘binary_crossentropy’ – for binary logarithmic;
      - ‘categorical_crossentropy’ – for multiclass logarithmic.

      To update the weights, the model uses optimizers:

      - SGD (Stochastic Gradient Descent), optionally with momentum.
      - RMSprop – adaptive learning rate optimization according to Geoff Hinton’s method.
      - Adam – adaptive moment estimation, which also uses adaptive learning rate.
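
      A hedged Keras-style sketch of how the loss names and optimizers above are selected (this assumes the tensorflow.keras API; the architecture itself is purely illustrative):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(16, activation="tanh", input_shape=(4,)),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001),
              loss="binary_crossentropy",   # binary logarithmic loss from the list above
              metrics=["accuracy"])         # one of the performance metrics mentioned below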

      To measure the performance of the neural network, performance metrics are used.
      Accuracy, loss, validation accuracy, score are just a few metrics.

      Batch size – the number of training examples per iteration.
      The larger the batch size, the more memory will be needed.

      The number of epochs shows how many times the model is exposed to the full training set.
      An epoch is one forward and backward pass over all training examples.
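
      Continuing the same hedged Keras sketch, batch size and number of epochs are set when training is started (X_train and y_train here are invented, illustrative arrays):

import numpy as np

X_train = np.random.rand(256, 4)
y_train = np.random.randint(0, 2, size=(256, 1))

history = model.fit(X_train, y_train,
                    batch_size=32,         # training examples per iteration
                    epochs=10,             # full passes over the training set
                    validation_split=0.2)  # held-out share used for validation accuracy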









      3.4. The second stage of neural network functioning - the working stage - uses the METHOD of DEDUCTION.

      The DEDUCTIVE method is derivation from the general to the particular or, in relation to the NN, the execution of the modified NN activation function (`template`) on new input data.
      The modified NN activation function (`template`) was created earlier by the INDUCTIVE empirical method of machine learning, when it was `fitted` to the given training data.

      A NN can reconstruct the original data set (signal, image) from partial, noisy, or damaged input data ((auto)associative memory).
      This is the direct work of the neural network on input data - searching for patterns, forecasting, qualitative analysis, recognition, and so on; or, put differently, the advancement of signals (information) through the layered set of statistical `weight` coefficients of neurons, already configured during NN training, in the course of which the values of intermediate signal parameters are compared and changed (one might say the information is `filtered`).
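
      A hedged sketch of this working (deductive) stage, reusing the illustrative Keras model trained above: the weights are frozen and the network is simply applied to new input data.

import numpy as np

new_samples = np.random.rand(5, 4)        # data the network has never seen before
predictions = model.predict(new_samples)  # forward pass only; no weight updates occur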

      From the point of view of the development of computing and programming, a neural network is a way to solve the problem of effective parallelism.



      3.5. Areas of application of artificially created neural networks in artificial intelligence systems:

      Recognition (of images, objects, etc.):

      ● visual - images (moving/still), textual, lexical, semantic;
      ● acoustic - speech, music, lexical, semantic;
      ● taste;
      ● olfactory;
      ● tactile - sensations of pain, temperature;
      ● sense of balance and position in space, acceleration, sensation of weight (analogous to the vestibular apparatus).

      ● Classifications — distribution of data by parameters.
      ● Decision-making and management.
      ● Clustering is the splitting of a set of input signals into classes, while neither the number nor the attributes of the classes are known in advance.
      ● Forecasting - the ability to generalize and identify hidden dependencies between input and output data.
      ● Approximation - of any continuous function to some predetermined accuracy.
      ● Data compression and associative memory - identifying relationships between different parameters makes it possible to present data more compactly if the data are closely related.
      ● The reverse process - restoring the original data set from part of the information - is called (auto)associative memory. Associative memory also allows the original signal/image to be restored from noisy/corrupted input data.
      ● Data analysis.
      ● Solutions of optimization problems.
      ● Finding patterns in large amounts of data.
      ● others...


      3.6. Artificially created neural networks do not fall under the philosophical concept of `consciousness`:

"Consciousness".
The greatest mysteries: What is consciousness? (RU)
Quantum processes have an impact on consciousness. (RU)
What is the brain? A soul, a computer, or something more? (RU)
Scientists have discovered a key difference between the human brain and the animal brain. (RU)
People with enhanced intelligence can be more effective. (RU)
The strange connection between the human mind and quantum physics. (RU)
What does quantum theory actually say about reality? (RU)
Neurons responsible for consciousness have been discovered. (RU)
SCIENTISTS CLAIM NEW METHOD CAN MEASURE CONSCIOUSNESS.
Neuropsychoanalysis: what it is and how it can change your life. (RU)
How the brain develops: a new way to shed light on cognition.
Anatomical connectivity influences both intra- and inter-brain synchronizations.
Social Neuro AI: Social Interaction as the "dark matter" of AI.
Luciano Floridi: «If you are not interested in informational concepts, you do not understand the 21st century». (RU)
Bernard Stigler: «Artificial intelligence is artificial stupidity». (RU)

































      3.7. Generative AI language models determine the probability of the next word by analyzing text data. They interpret this data by running it through an algorithm that establishes the rules of context in natural language. The model then applies these rules to language problems to accurately predict or generate new sentences.

      A Large Language Model (LLM) is a type of artificial intelligence (AI) algorithm that uses deep learning techniques and large data sets to understand, generalize, create, and predict new content.
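
      A toy illustration of the same "guess the next word" principle at miniature scale (this is not how a real LLM is built - it merely counts word pairs - but it shows what "probability of the next word" means):

from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ate the fish".split()
pairs = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    pairs[prev_word][next_word] += 1          # count which word follows which

def next_word_probabilities(word):
    counts = pairs[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probabilities("the"))         # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}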

      `The terrible truth` about LLM: in fact, ChatGPT is T9 from your phone, but on `bull steroids`!

      Yes, it is so: scientists call both of these technologies `Language Models`, and all they really do is guess which word should come next after the existing text.

      AI chatbots and LLMs are nothing more than "glorified tape recorders," says Michio Kaku, professor of theoretical physics at the City College of New York and the CUNY Graduate Center.

      These AI models use generative methods and are therefore called generative. They pay too much attention to detail instead of capturing general concepts; this is their weak point, and it often leads them to output false information.




      3.8. Predictive models in AI architectures are based on a complex polynomial that describes the response surface of the model parameters - in other words, a substitute ("black box") for existing data or for a computational model.

      (A polynomial is an algebraic expression representing the sum or difference of several monomials. A monomial is an algebraic expression consisting of the product of a numerical factor (coefficient) and one or more variables, each raised to a natural power.)

      Predictive modeling is based on the construction, management and calculation of models using approximation techniques. They are also called response surfaces, surrogate models, metamodels, reduced order models, etc.
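
      A hedged sketch of a polynomial surrogate ("response surface") standing in for a more expensive model (the "expensive" function here is invented purely for illustration):

import numpy as np

def expensive_model(x):                  # stand-in for an existing simulation or data source
    return np.sin(x) + 0.1 * x**2

x_samples = np.linspace(0.0, 5.0, 20)
y_samples = expensive_model(x_samples)

coeffs = np.polyfit(x_samples, y_samples, deg=4)   # fit a degree-4 polynomial response surface
surrogate = np.poly1d(coeffs)
print(surrogate(2.5), expensive_model(2.5))        # surrogate prediction vs. original model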

      What does the word predictive mean? It means associated with predicting the future behavior of someone or something; prognostic, forecasting, ...

      Predictive analytics is a set of methods for analyzing data and interpreting it, allowing successful future decisions to be made on the basis of past events. To carry out analytical work of this kind, the specialist must identify a set of important, significant parameters, each of which actually leads to a particular outcome.

      Some TRIZ methods and tools also use predictive modeling and analysis, which may allow them to be used in AI systems. These are TRIZ methods that allow cause-and-effect "hypotheses" about the environment to be constructed (through `Production Rules`), taking into account the accumulated knowledge in the area under study.









      3.9. Modern models of AI systems are analogues of `Goldberg machines` (Rube Goldberg machines).

      Employees of the companies that lead AI development know something about their work that no one else in the world knows.
      They hold significant proprietary information about the capabilities and limitations of AI systems.

      They know the "open secret" about NNs:
      all these NNs are nothing more than a set of SELF-ADJUSTING (`self-learning`, in the sense of minimizing the target error function of the NN) `Goldberg machines` working in parallel, with many interlayer and feedback connections between identical nodes (`neurons`) that make up a certain NN configuration and are capable of fixing (`remembering`) a selected parameter of their current state.

      Such "Goldberg machines" can be built on almost any known physical or chemical principle of operation.
      Today this is mainly a microelectronic base; in the future, quantum principles of NN operation.

      The promises of some AI-system developers that the intellectual products of artificial NNs - and these are generalized, complex, modified activation functions (`templates`) - will, as the NNs are scaled up, suddenly move from imitation of `consciousness` to real `self-consciousness` and acquire `free will` are not very plausible.

      These myths about the possibility of electronic (or quantum) `Goldberg machines` acquiring `artificial consciousness` are maintained so that the capitalization of companies involved in AI systems keeps growing by leaps and bounds.
      Investors demand this; otherwise the inflated "financial bubble" will burst.

      As for routine operations in various spheres of human activity - yes, here electronic "Goldberg machines" work quite successfully.
      They speed up production and other processes, gradually replacing human labour.
      In these areas, the practical benefits of AI systems are undeniable.

      Companies working in the automation of various processes have always faced the question of investment: where to find investors willing to allocate huge funds for the automation (robotization) of production and other processes, which is dubious, risky, and unprofitable during its initial implementation?

      What was needed was "bait" in the form of AI systems that would possess "consciousness", and many investors fell for it.
      Investors were given a dream, a hope that AI systems would reach an artificial `mind`, `awareness`, `consciousness` that will completely replace humans in all spheres, especially scientific and creative ones, accelerate scientific and technological progress, and achieve technological superiority over competitors upon passing the so-called point or moment of the `technological singularity`.

      But it seems that the "technological singularity" will not be reached on the current operating principles of AI systems (electronic or quantum "Goldberg machines"), and the "financial bubble" will burst.
      In tests similar to the Turing test, AI systems built as electronic or quantum "Goldberg machines" produce - and will produce ever better - another "intellectual product", an IMITATION OF CONSCIOUSNESS, but they are not AWARE of themselves as an independently existing living organism, an individual.

      The most advanced Large Language Model (LLM) AIs still lack basic reasoning skills, and are therefore not as useful as their creators claim.

      ... `Reasoning makes a conclusion, but does not make the conclusion certain, unless the mind discovers it by experience.` - Roger Bacon (1214-1292) ...

      At the same time, developments in the automation (robotization) of processes have already been largely worked out, tested, and embodied in "metal"; a technological leap (another technological revolution) is being realized, and this is the positive side of the whole boom associated with AI systems (electronic or quantum `Goldberg machines`).
      Most of the `AI dotcoms` (`ai.com`, `ai.ai`, ...) will collapse and go bankrupt, and those best adapted to the practical requirements put forward will `survive`.

______

      Modern AI systems resemble the nervous system of a jellyfish, which physically has no "association centers".
      The main difference between AI systems and the jellyfish is a vastly larger number of neurons, which may make it possible to create not physical but virtual "machines" (areas) of the nervous system that simulate "association centers".
      Jellyfish do not have a brain or spinal cord in the usual sense, like humans or other animals.
      Instead of a brain, they have a ring of ganglia located at the edge of the umbrella (clusters of neurons connected to the creatures' eye-like structures known as rhopalia) that act as visual processing centers.
      This ring gives rise to branches of nerve cells that transmit signals to muscles, allowing the jellyfish to move and respond to stimuli.
      But, it turns out, jellyfish are capable of behaviors you wouldn't expect to see in creatures with no central nervous system (i.e., no "association centers").
      Researchers have found that these creatures are able to learn from previous experience.
      These animals are `aware` (to the extent that they are capable of it, of course) of the conditions in which they find themselves, as well as the presence of possible predators.
      A team of scientists discovered that these cnidarians are capable of operant conditioning.
      The animal has a relatively complex visual system: 24 "eyes" - 16 photoreceptors and 8 true eyes. They are located around the circumference of the body, and the jellyfish uses each of the visual organs.
      This is a type of associative learning, in which the organism remembers the consequences of its own correct or incorrect behavior (in the struggle for survival) in the past and when the situation is repeated, it corrects the behavior (adapts).
      The capacity for operant conditioning is inherent in bilaterians, for example arthropods, mollusks, and vertebrates.
      But for the first time researchers have managed to prove that "lower" animals (without a brain or spinal cord in the usual sense, that is, without "association centers") are also capable of associative learning.



     3.10. Note from Harutyun Avetisyan about current `weak` AI, which is based on experience (in Kant's sense) rather than on reason.

      ... But not so long ago, in 2011–2012, there was, one might say, an `explosion`, when more than 10 million images were made publicly available.
      And on their basis, more serious models were trained that solved image recognition problems.
      They could, for example, distinguish a cat from a dog with such accuracy that was previously unattainable.
      It turned out that if there is a lot of data and supercomputer power, then very serious results can be achieved using far from the most advanced mathematical methods.
      And now we live in the world of generative artificial intelligence: so-called large language models, large foundation models.
      And, of course, in this sense a lot has changed.
      But everything we have is somehow connected with machine learning and neural networks (`weak` AI). ...

      ... Immanuel Kant said that all knowledge begins with experience.
      Moreover, experience will never guarantee true universality.
      Thus, he puts some kind of restriction.
      And if we take his main works, he believed that
one of the main properties of MIND is the operation of a priori knowledge.

      What is a priori knowledge?
      This is knowledge independent of experience. ...
      !!! ... there is no knowledge that is unconditionally independent of experience in modern `weak` artificial intelligence. !!!


      If we assume that weak AI is something based on experience, while strong AI is based on MIND, then in this sense, according to Kant, we are still a very long way from strong artificial intelligence. ...
      ... There is no a priori knowledge in modern artificial intelligence, which means there is a divide between them that cannot be overcome.
      It may yet appear: these tasks are being addressed, the approaches will be incorporated into existing solutions, and in the future the emergence of strong AI is possible. ...
      ... But in the long term of my life this will not happen.
      Therefore, I concentrated on technologies based on experience, that is, weak artificial intelligence.
      It has its own problems, its own issues of security, trust, etc....
___________________

  



