Why does our AI seem fake?

Excerpt from the book “The algorithm of the universe (A new perspective to cognitive AI)”

In the current world of computer science the term “machine learning” is used in varied senses. Learning is a loosely used term applied to many types of algorithms. The learning itself can be “learning the weights and biases to be associated with already form-fitted mathematical functions”, using linear regression, logistic regression or multivariate regression. Or “learning a smoothing function for a bunch of data points”, using Bayesian algorithms. Or “learning how to cluster data points with weights”, using algorithms such as k-means clustering. Or “learning how to classify data points”, using the k-nearest-neighbor algorithm. Or various neural-network-based algorithms, where states are represented as mathematical functions learnt as part of multi-depth neural nodes that allow modified input data to pass through, thus achieving deep learning. Many other such small and big algorithms are termed learning. Further, the types of learning vary from supervised to semi-supervised to unsupervised to reinforcement learning. In supervised learning the input and output data are forced on the algorithm to ensure that it adjusts itself to the expected output, while in unsupervised learning the input data is learnt from and the output is not controlled. In reinforcement learning the output is rewarded or penalized to adjust it appropriately to the required values. But in all of these types of learning, we have moulded the output like a clay toy to give a certain result that is in line with what we, or an expert in a domain, think of as the intelligence required of the algorithm. But is that true intelligence? Can intelligence really be judged to be correct or wrong? Can there be a wrong intelligence?
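To make concrete what “learning weights and bias to be associated with already form-fitted mathematical functions” amounts to, here is a minimal sketch: an ordinary least-squares fit of a line y = wx + b to a few points. The function name and the data are purely illustrative; the point is that the “knowledge learnt” is nothing more than two constants.

```python
# A minimal sketch of what "learning weights and bias" often means:
# an ordinary least-squares fit of y = w*x + b to discrete data points.
def fit_line(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # w and b are just the constants that minimise the squared error
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

# Four illustrative points lying near the line y = 2x
w, b = fit_line([1, 2, 3, 4], [2.1, 3.9, 6.0, 8.1])
# the "knowledge learnt" is nothing more than these two constants
```

Everything this “learner” knows is captured by w and b; there is no sense in which it understands the data beyond fitting them.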

We live in a world where these algorithms have surprised us at every step. There are even a few things these AI algorithms achieve that humans cannot. For example, many studies have determined that an AI algorithm can very easily fool the captcha tests that are used to decide whether a human or a robot is using a system. I find that the more difficult the captcha is made, the easier it seems for an AI to pass it rather than a human. Humans are equipped with ever-degrading sense organs such as eyes and ears, hardly suited to identifying deliberately unclear letters. Moreover, with every variation in the letter or picture rendering, the approximation modelling present in humans returns so many matching scenarios that choosing the best match is a matter of opinion rather than the correct solution. Simple mathematical computations are best done by computers rather than the human mind, which is the whole point of inventing computers. I would even say that a computer-generated picture or sound is best understood by a computer rather than a human. Using it to determine whether someone accessing a system is human is pretty foolish when a computer can do a better job of it. If we did use it, we should possibly validate for the negative scenario: verifying that a human mistake is made, rather than that the answer is correct. The cartoon below from geek-and-poke says it as it is:

Another example where these algorithms have surprised us is the storing of a user’s browsing history and using it to suggest, in the form of advertising, products to buy or actions to take. Sure, a computer is better at this than any human. We humans are definitely not very good at tracking each and every action of another person and acting in that person’s interest; we only do it best when it is in our own interest. While we can say that a computer, devoid of emotions, is better at looking at browsing history objectively to spit out advertisements, it is very doubtful that it does a good job of it. When we look at the various suggestions and advertisements that pop up while browsing, they make us smile or get irritated, depending on what is being advertised and our mood at that point in time. Why would I want to buy a washing machine twice in a row, or a fridge twice in a row, even if they are of different brands? Sadly, this is what is advertised. The best is when Amazon suggests I buy my Redmi phone along with a Karbonn phone and a gorilla glass for the Karbonn phone. Seriously, why would I buy two phones together? While it is uncanny how the system suggests what to buy or not, the algorithm is purely permutations and combinations of words without too much meaning associated with them. So, while the algorithms seem very intelligent in knowing what I have done, it is only because we do not expect it of computer programs that they surprise us.

Yet another example of such algorithms can be seen in the voice-processing systems integrated into Google Maps and directions. Again, it is uncanny to see the number of suggestions auto-suggested by Google as you are travelling. But many a funny episode has also occurred when old people or children interact with the voice system under the assumption that it is as intelligent as a human. The woeful lack of understanding of even the basics of a language in its colloquial form is visible when a voice-based system tries to interpret the many different ways in which the same intent can be conveyed. When we talk, we rarely try to be grammatically or even syntactically correct, pausing and sighing at various words; we only try to convey meaning to the listener. But this is definitely not understood by the systems, because they are coded for formal language usage. These systems only look for keywords, or are coded for a specific sequence, grammar or standard pronunciation of words, so that any variation from it gives rise to funny interpretations and situations.

Yet another example is the various virtual-reality games that try to mimic real life. While they are immersive and addictive in nature, it should be understood that it is the addiction that fools us into thinking them real or immersive.

Humans are highly adept at remembering that which is pertinent to their current situation. From a humungous expanse of disparate data collected over our lifespan, we can recollect the appropriate information that applies to the situation we are facing, to get ourselves out of it or react to it. This is like the cocktail-party effect. We can easily attend a party and ignore all the small talk around us until someone mentions our name. Then our whole focus shifts to the person talking about us, and we are able to easily switch the conversation to the relevant topic. Small irrelevant words that make no sense out of context actually make sense to us based on the person talking about them, and we are able to pick up and continue a chat as if it were the middle of some conversation started much earlier.

But “a focused clear-cut collection” related to a specific search topic is purely a computer algorithm. As humans we can neither be limited to a single topic nor remember all the details of a single event. We always tend to associate it with our impressions, and it is the impressions that trigger the recollection of the details of the event. So while a designer-designed “Turing test” can be applied to a focused area of intelligence, and the computer can be made to randomize appropriately between what is considered right and wrong to pass the test and successfully imitate a human, whether the test is really sufficient to say whether we are interacting with a natural being is highly debatable. It is like the saying, “You can fool some of the people some of the time, you can fool all of the people some of the time, but you cannot fool all of the people all of the time.” It just needs time for a person to adjust to the reality of the intelligence learnt by a computer, to be able to pinpoint that it is a computer and not a human they are talking to.

To even start thinking of creating a truly intelligent system, we need to ask ourselves “Why does our AI seem fake to us?” How does the brain know that we have mimicked an intelligence and it is not the same as the intelligence created by this natural world around us? How do we know that it is a virtual simulation? To understand this we need to understand the in-depth working of reality and start understanding the difference between the way reality works and the way our AI algorithms work.

The limitations of AI learning algorithms

There are dime-a-dozen websites that explain artificial intelligence, machine learning, classifiers, clustering and many standard algorithms that are classified as learning algorithms. So, I will not go into detail explaining how they all work here, but only touch on the limitations of the learning algorithms that we have.

As I have said, in this book I have purely focused on comparing neural networks and the related learning algorithms against the algorithm present in the internal workings of reality. Mostly because, in my view, all other algorithms do not even come close to being learning algorithms, let alone “truly AI algorithms”. They are just mathematical functions and computations. Sure, they do learn the related weights, biases and many such parameters. But that is not knowledge or intelligence; that is purely “a search for the most suitable set of constants that fit the given discrete data points to achieve the goal aimed for”. A smoothing curve-fitting algorithm is hardly an AI. It is purely an algorithm that decides on the mathematical function that fits the maximum number of the discrete data points input, ignoring the others as outliers. Clustering algorithms could have come closer to having knowledge and intelligence, had the algorithm been different. But most clustering algorithms also use weights and biases associated with node values to determine some form of mathematical distance between them. Then, based on these distances, the nodes are clustered or grouped. Hence, these also fall under the category of mathematical computation that groups based on the best possible set of values, and can hardly be considered AI. Yes, a very specific, focused intelligence, but the intelligence is in the logic that has been coded rather than in logic being learnt. The “so-called learning” in any of these algorithms is very limited, and the “knowledge learnt” has some pre-defined meaning, based on which logic is applied.

Neural networks are algorithms designed to try and mimic the neurons of the brain. Multiple nodes called neurons are created in the learning phase, with each neuron associated with a propagation or activation function. The inputs to the neurons are scaled by weights and shifted by some bias. The weights, biases and activation functions can be learnt during the training phase. Layers containing many neurons are then created and connected into different types of graphs; the layers and graphs are also learnt during the learning phase. Based on the weights, bias and activation function of the neurons, the logic traverses to the next layer of neurons and controls the input that passes to it. Thus, traversing the neurons in the different layers of the network, an output is generated. This can be seen as just a glorified, gigantic mass of multi-level if-else clauses, the “if-else” driven by weights, biases and activation functions instead of a binary ‘1’ or ‘0’. The if-else is learnt from the training data given to the neural network.
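The “mass of if-else clauses” reading can be made literal. Below is a toy sketch, with every weight hand-picked to stand in for what a training phase would have learnt, and with hard step activations rather than the smooth functions real networks use. With a step activation, every neuron is exactly an “if”, and this tiny two-layer net is a nest of such ifs that happens to compute XOR.

```python
# A toy two-layer network with step activations. With hard thresholds,
# the forward pass is literally a nest of learnt if-else decisions.
def step(x):
    return 1.0 if x > 0 else 0.0  # the activation is an "if"

def neuron(inputs, weights, bias):
    # weighted sum of inputs, shifted by a bias, then thresholded
    return step(sum(w * i for w, i in zip(weights, inputs)) + bias)

def forward(x1, x2):
    # hidden layer: hand-picked weights standing in for learnt ones
    h1 = neuron([x1, x2], [1.0, 1.0], -1.5)   # fires when both inputs are high
    h2 = neuron([x1, x2], [1.0, 1.0], -0.5)   # fires when either input is high
    # output layer combines them; this particular wiring computes XOR
    return neuron([h1, h2], [-1.0, 1.0], -0.5)

forward(0, 1)  # one finite path through the "if-else" mass
```

Stacking more layers only multiplies the number of such paths; it does not change the nature of what is traversed.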

In my view, neural networks also tend towards a very limited “knowledge learnt” that has some pre-defined meaning coded into the logic written. We can call what is learnt neurons, or give it any other fancy name. They are still weights on input values, biases added to some summation of the input values, or mathematical activation functions. Thus the knowledge learnt is limited. But neural networks at least learn varying levels of “if-else clauses”, which comes closer to one form of intelligence. When the number of nested levels is high, even an if-else clause can achieve seemingly intelligent decisions, because the number of possible paths increases. But, again, it should be understood that it is still just a finite number of paths followed. It seems complex to us as humans because our minds boggle at that level of nesting and the combinations of paths possible. Typically, when it comes to that level of complexity, humans tend to reduce the complexity and use a different algorithm. I find the very fact that the neural network does not optimize itself down at those levels indicative of it not being intelligent, but being just tightly coupled logic coded by a machine rather than by hand.

So, what do I mean by saying the “knowledge learnt” is very limited and has pre-defined meanings? Let us take an example: a dog. I will tell the story of a litter of puppies born to a street dog near my home. My front gate has a hinge that leaves a biggish space between the wall and the hinge, enough to allow a puppy to easily squeeze into the portico area. We typically leave our footwear outside at this location for many useful reasons; let’s not go there. As all of us know, dogs and puppies, especially growing ones, like to chew on slippers. So it happened that one day I found my slipper had been dragged outside the house and chewed on completely, obviously ruining it. Now I had to prevent the puppy from getting in. I did not want to invest in a permanent solution of any sort, so, like a fool, I went to pit my intelligence against the natural instinct of the puppy. I have a few potted plants growing in front of my house, so the easiest solution that occurred to me was to pull these pots in front of the gap so that they blocked access to it. Imagine my surprise when I found that the puppies were still able to get in. So I watched to find out how. I found that, given that the pot had a narrower bottom than top, they just squeezed into the gap between the narrow base and the gate and did the gymnastics needed to get to the gap in the gate. Now, I said, that is bad. So I got a few bricks and blocked the bottom of the gap with them. Yet again they got in: they would squeeze through the gap by the pot, then jump over the brick by using their front paws to hold onto its top. I tried two bricks stacked one over the other. To my surprise, they just pushed down the top brick and got in. It went on in this manner until I found a solution that prevented them from either pushing their way in or squeezing through some open gap I had left. It took me some time to get to that solution.

Let’s leave aside my bruised ego and my grudging admiration and appreciation for life’s intelligence. The purpose of this story is to show that “the same knowledge” can generate “multiple intelligence”. This means that knowledge needs to have the inherent capability to adapt to varying situations; it needs to be elastic. If we look at the story, the dog had only two pieces of knowledge: an understanding of the spaces into which it could squeeze or crawl, and an understanding of its ability to jump over obstacles. It knew exactly when to use which to achieve its goal, and that is intelligence. The admirable way the puppy applied its knowledge shows how knowledge can vary continuously in all directions without any restriction.

In the neural networks that we build, we just vary the weights associated with inputs and the biases of some summation of the inputs to give prominence to one input over another. One neuron can only be used rigidly for some matching input, or a range of matching inputs, based on how it is coded. It cannot be applied to what is called a “relevant situation”; a “relevant situation” needs more neurons in the if-else clause. So the variability achieved by that single node that stores the logic is very rigid, having only a limited set of matches to which it can apply. Hence I say neural networks have very limited learning capability.

What are we missing?

Deep learning algorithms are touted as algorithms based on how the neurons in our brains work, able to learn a whole lot of knowledge. So why don’t these mimic reality, and how can we so easily recognize them to be fake?

The simple fact is that we approach AI as we do any other experiment in physics or chemistry, where a given problem is taken, stripped down to a controlled environment, and the concept tested. Subsequently, just the concept is expanded and applied as a technology to achieve some set of controlled functions. For example, we conduct an experiment on a specific material such as copper to understand that it can conduct electricity, or on a material such as silicon to understand that it can be used as a semiconductor; subsequently, we apply the experimented and proven principle to create a conducting wire for electricity, or a transistor, which are very specific applications of the concept. Silicon as such can act as a semiconductor or emit photoelectrons that can be used to capture light images; but that concept was discovered later, after the semiconductor had been used to create the transistor. We tend to step very carefully, experimenting at every step: first describing a concept, proving to ourselves that it is repeatable, and then applying the specific concept, or a combination of known concepts, to create a technology. The aim of such study is to be able to describe, in the language we have defined, what is happening, and to be able to use the concept for other purposes using some form of intelligence or logic only we can understand.

Knowledge in nature does not work in that manner. In nature, we find, the entire continuity is always present. The whole gamut is present in nature; it is just our observation that limits the knowledge. We can see this in all the ways we apply natural knowledge or intelligence. We should not confuse the application of knowledge with the description of knowledge. What we do in science is describe knowledge, while in normal day-to-day life we actually apply knowledge without describing it, sometimes without even knowing how we are doing it. Not everyone needs to know how and why we can walk, but we walk. No one needs to know how and why we can hear, talk, see or sense touch; yet we can sense and create images, listen to songs and perform many such tasks. Nature operates by reducing from a continuous whole to a reduced applicable whole, while in experiments we create a small reduced part and try to expand it to a continuous whole. While this can give results in some cases, in the field of knowledge and intelligence it introduces many problems.

So why can we not reduce the amount of knowledge and try to create intelligence out of it? Because we have no clue as to what knowledge goes into making up a certain intelligence. For example, we believe that we have decoded the working of our sight by understanding the way our eyes work. Our explanation says that light gets focused by the lens in our eyes and falls on the rods and cones of the retina, where it is converted to electrical signals that are processed by the brain to form the image. The question now is: if a camera is modelled on this concept, why does a picture not capture all the aspects of the image created by our brains? Everyone knows that actually seeing something with the eyes is not the same as seeing a camera-captured picture. This is best realized when we compare a photograph of the Grand Canyon with actually standing at the precipice of the Grand Canyon’s lip, looking down into the depths. Even if we set aside the associated experience, just the image formed in our brains by actually seeing has far more detail than a picture. Why is this so? The only explanation is that we have not understood all the concepts involved in the formation of the image in the brain. Sure, we have created many technologies, such as IMAX and 3D movies with a plane flying close to the Grand Canyon’s depths. Yet the real experience, and the image formed, are much different. We can very easily say that the image in our brains is affected by the experience of all the other sense organs. Isn’t that what IMAX and 3D or 4D movies are trying to provide? While they do give the thrill and the experience, it is still different from the surreal experience of seeing the Grand Canyon. So the question is: is it only that much?
But even without answering that question, the very fact that recreating the base description of the working of the eye in a camera does not suffice supports the theory that we cannot reduce knowledge down into a controlled environment. Obviously, we have found that experience makes a difference to the way the image is created.

In the current AI systems, whether deep learning, machine learning, neural networks or any other such technology we create, we have reduced and limited the scope of the data available to be learnt from to a very small part of even the domain being learnt about, let alone other related domains. For example, if we wanted to create routes for transporting goods from one place to another, we could just look at the distance travelled, the cost and other such apparent parameters to create the route. But typically, when a human creates a route from experience, many more parameters go into it. Does the route go through a forest area? Does the route have good roads? How many twists and turns need to be taken along the route? And many, many such parameters that make sense only to us in the decision-making.

For example, while the shortest route between two locations could be through city roads, we may prefer a tolled road that is more than double the distance and the costlier option. Such a decision can be based on the fact that tolled roads typically have no signals, no turns, higher speed limits and fewer interruptions, which allows us to arrive in half the time it takes to travel the shorter distance via city roads. Just the wait times at signals may exceed the time to travel the longer distance. And many a time we also base the decision on the time and day of travel. Thus, we see that restricting the data available to the AI based on some expert’s opinion ties the algorithm to a single person’s impression or understanding of the problem, and therein starts the whole problem of our AI.

This failure to consider all the information that makes a difference to a decision appears as a lack of common sense: we see obvious decisions not being taken. This is one of the many factors that make us realize that the intelligence is from a coded system which cannot think for itself, and is just picking a single possibility from one set of permutations and combinations among the millions of others available.

Let us take another very simple example. Based on the experience of reading many emails over a span of many years, we can very easily identify manually when an email is spam. Say we wanted to create a program that recognizes spam messages in email. We can hand-code the logic or create a learning system. When we hand-code it, we code it as a rule-based system that runs rules a user sets up to mark messages as spam. Here, we have let the user use their experience to specify how spam is recognized. What will the user do? They will look at a message they think is spam, recognize word occurrences in the subject, word occurrences in the message body, the from-address and so on, and set up rules that match the observations they have made. The system then runs these rules as an if-else clause when messages arrive and classifies them as spam.
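Such a user-configured rule system might be sketched as below. The specific words, addresses and thresholds are of course hypothetical, stand-ins for whatever observations the user has made; the structure is the point: observations written directly as if-else rules.

```python
# A hand-coded rule-based spam check: the user's observations written
# directly as if-else rules over a few fields of the email.
SPAM_SUBJECT_WORDS = {"winner", "free", "lottery"}       # user's word list
BLOCKED_SENDERS = {"promo@deals.example"}                # hypothetical address

def is_spam(subject, body, sender):
    if sender in BLOCKED_SENDERS:
        return True
    if any(word in subject.lower() for word in SPAM_SUBJECT_WORDS):
        return True
    if body.lower().count("click here") >= 2:            # body-pattern rule
        return True
    return False

is_spam("You are a WINNER", "click here now", "someone@example.com")
```

Every rule here encodes one observation the user made; the program only replays those observations.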

Now, let’s say we want to create a spam detector based on a learning algorithm. What do we do? We still create it as an if-else clause, because there is no other way in computers to recognize this. We can ask users to mark a message as spam whenever they see one in their inbox. As many people keep marking spam, we have created an algorithm that takes the common parameters of the emails, such as subject, message body, from-address and so on, and, based on the recognized commonality between the messages, learns a set of rules to run to mark a message as spam. In such a case we can obviously take the input and experience of many users into consideration, which can improve the effectiveness. We could have coded in patterns of occurrences of various words, checked for the presence of certain words in certain formats, added distances between words and much other such logic. But they all translate to the same algorithm: a set of rules, either hand-coded, machine-learnt or assisted machine-learnt. So what is the difference between what we hand-coded and the learning algorithm? It is the same if-else clause, one hand-coded, one machine-coded.
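A minimal sketch of the “learnt” counterpart, under the same simplifications: the commonality extracted from user-marked spam becomes the rule set, but what runs at classification time is the same if-else check as before. The word-counting heuristic here is an assumption for illustration, not any production spam filter.

```python
from collections import Counter

# "Learning" the rules instead of hand-coding them: words that appear
# often in user-marked spam but never in legitimate mail become the
# rule set the filter runs.
def learn_spam_words(marked_spam, marked_ham, min_count=2):
    spam_counts = Counter(w for msg in marked_spam for w in msg.lower().split())
    ham_words = {w for msg in marked_ham for w in msg.lower().split()}
    # a word becomes a "rule" if spam contains it often and ham does not
    return {w for w, c in spam_counts.items()
            if c >= min_count and w not in ham_words}

def classify(message, spam_words):
    # the learnt knowledge is still executed as the same if-else check
    return any(w in spam_words for w in message.lower().split())

spam_words = learn_spam_words(
    ["win a free prize", "free lottery win"],   # user-marked spam
    ["lunch at noon", "project update"],        # user-marked ham
)
classify("claim your free prize", spam_words)
```

The “learning” step merely wrote the rule table that the user would otherwise have typed in by hand.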

Thus, for us, a learning algorithm is just a way to “machine-code those scenarios which we cannot think of upfront”. Rather than create real knowledge from data and use that knowledge to create intelligence that understands what a spam email really means, and hence classifies spam emails, we are just using the “learning algorithm” as a way to machine-code the same algorithm we would have hand-coded. In fact, we do not recognize the “knowledge present in data” as different from intelligence at all. For our AI, knowledge and intelligence are the same. The program created from such assumptions is obviously just another computing program rather than a truly intelligent system. So, is the algorithm truly a learning algorithm at all?

Learning a logic in data rather than learning a skill

When we look at how reality works, we find that living beings can start living without much training; they train on the job, so to say. A very small amount of data gives huge learnings. So how is it that, even with the terabytes and terabytes of data that we have, we cannot train in a logic that is any better than a gigantic if-else clause? An if-else weighed down by weights and biases, rewards and penalties, pre-coded by so-called learning algorithms?

We may call it a learning algorithm. But we need to accept that what is present is just another logic, coded in the form of a learning algorithm that auto-codes rather than creating a skill. This is just another way of “coding the if-else program”. It does not matter whether it is hand-coded in the form of actual “if-else code” or “goto”s or “look-ups in hashmaps”, or learnt in the form of weighted inputs to nodes that run activation functions in multiple layers. They are all pre-coded if-else clauses in multiple forms, which run on some input data to give an output. Why should it yield a different result from a hand-coded algorithm coded specifically for a given data type?

Let us take an example: the market which every auto company is trying to capture, the autonomous or self-driving car. Let us forget about the learning system being added to the car, and think about how we would have coded the system if we were able to fix all the parameters and conditions of driving, such as the contours of the road, the inclines and declines, the road being empty and so on, and had to hand-code it. Looking at this much-reduced problem, we find that if we could access the car controls, accelerate the car to a required speed such as 80 km/h, and leave the steering wheel so that the wheels were oriented parallel to the curb, the car should have followed the road contours, had the road been banked correctly for turns. Subsequently, we would just have to accelerate and decelerate on the inclines and declines to maintain the speed of the car. But, given that roads in the real world cannot be banked so accurately, we would have to adjust the steering wheel to correct the car’s wheels as it veered off course by some recognizable deviation.

To code this, we would have created a representation of the 3D space, with the road contours and all other features on the roadsides coded into the 3D space object. We would have followed the current trajectory of the car in that 3D space with respect to the road contour, projected the trajectory of the car if it travelled at the given wheel orientation and speed, and computed the change required in the wheel orientation to follow the road contour if there was an unacceptable deviation. If a change to the trajectory was required, we would have turned the steering wheel by the required amount so that the trajectory of the car matched the road contour for the next 10 s (we would have made this configurable). If there was a decrease in the speed of the car, we would have computed the acceleration required, and if there was an increase, the deceleration required, and controlled the car’s accelerator to change it as computed. This would just have been repeated till the car reached its destination.
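The loop described above might be sketched roughly as follows. Everything here is an idealized assumption: the road contour and trajectory projection are collapsed into a single measured lateral deviation, the tolerances and step sizes are made up, and only the shape of the repeat-till-destination control loop is the point.

```python
# A much-simplified sketch of the hand-coded driving loop described
# above: correct steering when drifting off the contour, then correct
# speed on inclines and declines. All numbers are assumed tolerances.
MAX_DEVIATION = 0.5   # metres of acceptable drift from the road contour
TARGET_SPEED = 80.0   # km/h

def control_step(state, measured_deviation):
    """One tick of the loop: fix steering if drifting, then fix speed."""
    if abs(measured_deviation) > MAX_DEVIATION:
        # steer back by exactly the measured drift (idealised correction)
        state["wheel_angle"] -= measured_deviation
    if state["speed"] < TARGET_SPEED:        # an incline slowed us down
        state["speed"] += min(5.0, TARGET_SPEED - state["speed"])
    elif state["speed"] > TARGET_SPEED:      # a decline sped us up
        state["speed"] -= min(5.0, state["speed"] - TARGET_SPEED)
    return state

state = {"wheel_angle": 0.0, "speed": 72.0}
state = control_step(state, measured_deviation=0.8)  # drifted 0.8 m right
# in a real system this step would repeat until the destination
```

Every new situation (traffic lights, junctions, turns) adds more branches to exactly this kind of loop, which is the unwieldiness the text goes on to describe.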

Now, if we take away the controlled parameters that we had assumed, such as an empty road, a known road contour and so on, how would the program change? Since we do not know the road contours, we have to detect them. Since we do not know the inclines and declines, we need to detect them. Since the road cannot be empty, we need to detect the presence of obstacles in the path. If we added the sensors for these, the same logic could work, except that in the case of the road contour, rather than comparing against existing contours, we would need to compare against the detected road contours for the next few minutes. Obviously, we need to have detected the road contour at least a few minutes ahead to be able to act in real time. The other change required would be to detect obstacles in the path and take appropriate action: go around, slow down to match speed and so on.

Obviously this is a humungous hand-coded logic that keeps growing with every situation we need to code in. We introduce traffic lights: we add a sensor for detecting a red light vs a green light and code in the logic. We introduce junctions: we add sensors for detecting cross-traffic and code in the logic to compute car trajectories and intersecting trajectories and take decisions. We add left and right turns: we add sensors to detect when a turn is coming and add in the trajectory calculation to change the steering wheel correctly. But, even if it is humungous, the fact remains that the logic could be hand-coded correctly if the sensors were the correct ones and the resources available to hand-code it were equally humungous. The obvious problem is that, with the number of situations and parameters we add in, more and more logic needs to be hand-coded to handle them. And we all agree that the number of situations that can occur when we drive a car is nearer to infinite than finite, and all scenarios cannot be hand-coded easily. It would make the code unwieldy and really complex to even start understanding, let alone maintain, debug and fix bugs in.

Hence comes the situation where we want the car to learn on its own how to drive. But what we seem to have is a learning algorithm that learns what we would be hand-coding and takes decisions. The logic need not be what I have described previously, but some machine-coded logic based on the data. This is typically what is done with current neural networks. A set of scenarios is learnt by the neural network based on training data fed into the system, possibly by a human driving the car and feeding the input to the algorithm, and/or by feeding road details and many such parameters into the system. But the fact remains that this is also just “machine-coding” the complexity into the network and letting it replay itself in actual situations.

This is not truly a learning algorithm for “learning to drive a car”; it is a “learning algorithm for the logic present in the training data, which codes that logic into a network”. All it does is control the car and take the decisions that we took when we created the training data by driving the car. This is where the problem lies. We need a learning algorithm that learns to drive a car, not one that recognizes and learns to code the rules of driving. The rules of driving are secondary knowledge to the primary knowledge of driving.

A truly intelligent system is not such a system, with a learning algorithm pre-trained for decision making and controlling a car in a given set of X situations based on driving rules present in the learning data. That can possibly work in countries where drivers follow the lane system and road ethics to the T, hopefully, and the traffic is streamlined within lanes. But in India, for example, drivers do not follow lane systems or have any streamlined way of driving. Many a time drivers do not even indicate that they are turning, yet we know they intend to turn and we slow down for them. This driving pattern can be driven by many factors, such as lack of infrastructure, lack of space for the volume of cars, or plain apathy of drivers, as in Bangalore. The fact is that a well-coded driving pattern learnt from any training data cannot work in such haphazard traffic conditions. Here no situation is similar to any other; all are unique in themselves, and we need to take decisions based on real-time data. The only rule here is “there are no rules”. This is the situation where skill is absolutely necessary to get an autonomous car to run correctly. What is required here is a real-time learning algorithm, not a pre-learnt set of rules based on which decisions can be taken. So, what is a real-time algorithm?

What I have described in the above examples are offline learning algorithms, where the machine learns and stores a logic from some training data. Real-time learning algorithms need to learn on the fly, from real-time data. In an offline learning algorithm there is training data which pre-programs the entity with a certain “learnt logic”, and any input data given subsequently is only used to generate an output based on that previously learnt logic. In a real-time learning algorithm the most relevant learnt logic needs to be adapted and applied to the real-time data. The reactions and responses to the adapted and applied solution are taken as a learning to modify the logic present, in real time, and reused when the next relevant situation occurs.

The requirements of such an algorithm are different from those of an offline algorithm. In a real-time algorithm world, data is streamed continuously. All data is not available at once to be sliced and diced and re-used and re-programmed according to the requirements. The information required has to be extracted from the data, and computations done, as the data flows through the different states of the algorithm. It further has to be fast enough to work at a throughput comparable to the rate of flow of data. In such algorithms, having complex computations during the flow makes it slow; having computations across a large set of data also reduces throughput.
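The contrast between an offline fit and an on-the-fly update can be sketched with a toy one-parameter model. This is purely illustrative; the model shape, names and learning rate are my own assumptions, not anything from the book:

```python
# A minimal sketch contrasting offline and real-time learning with a
# one-parameter model y ~ w * x.

def offline_fit(pairs):
    """Least-squares fit over the full dataset, available all at once."""
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, y in pairs)
    return num / den

class OnlineFit:
    """Incremental update: each (x, y) adjusts w and is then discarded,
    so the model keeps adapting as the stream flows."""
    def __init__(self, lr=0.1):
        self.w = 0.0
        self.lr = lr

    def observe(self, x, y):
        error = self.w * x - y
        self.w -= self.lr * error * x  # one gradient step per sample
```

The offline version needs the whole dataset in hand before it can answer; the online version can be queried at any point in the stream, which is what a throughput-bound real-time algorithm requires.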

It also needs to be able to search for and locate relevant information rather than an accurate or mathematical match for situations. In such search-and-locate logic, adaptability of the information becomes very important, so that it can be applied to the given real-time parameters. The current learning algorithms have little or no adaptability of the learnt logic. In the car driving example, say the algorithm had learnt the logic to project the trajectory of a second car travelling in the same direction as itself, based on its speed and wheel orientation, then detect the position where its own trajectory will intersect with the second car’s trajectory, to decide whether to change lanes. Now, if it encounters a situation (not learnt) where a car is crossing over from the on-coming traffic lane into its own lane on a head-on collision course, it should be able to pull up the lane-change logic as the most relevant, compute the required trajectories of the second car and itself, and adapt the lane-change decision to prevent an accident. It should then take the output of that decision and feed it back into its own algorithm as a learning. This is real-time learning, and this is learning the skill of driving a car rather than the logic or rules present in data. Note that skill accumulates over a period of time, rather than staying put at the same level until the next training with more data.
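The trajectory-intersection check described above can be sketched under a strong simplifying assumption of my own: both cars move in straight lines at constant velocity, so the question reduces to a closest-approach computation.

```python
# A sketch of the trajectory check, assuming straight-line, constant-
# velocity motion. Positions and velocities are 2D tuples (x, y).

def time_of_closest_approach(p1, v1, p2, v2):
    """Time t >= 0 at which the two cars are nearest each other."""
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]   # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]   # relative velocity
    vv = vx * vx + vy * vy
    if vv == 0:                              # same velocity: gap never changes
        return 0.0
    return max(0.0, -(rx * vx + ry * vy) / vv)

def collision_risk(p1, v1, p2, v2, safe_gap=3.0):
    """True if the predicted minimum separation falls below safe_gap metres."""
    t = time_of_closest_approach(p1, v1, p2, v2)
    dx = (p2[0] + v2[0] * t) - (p1[0] + v1[0] * t)
    dy = (p2[1] + v2[1] * t) - (p1[1] + v1[1] * t)
    return (dx * dx + dy * dy) ** 0.5 < safe_gap
```

The `safe_gap` of 3 metres is an arbitrary placeholder; the point of the text is precisely that such a fixed cut-off is a poor substitute for an adaptable, learnt response.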

Rigid, digital data representation

It is strange that we have terabytes and terabytes of data in each and every domain of expertise, and yet we are not able to create a single intelligent being in even a single domain that can handle all the nuances of that domain. With all the processing power and terabytes of data present, we are only able to train a very narrow intelligence even within a single domain. What is even stranger is that while there are terabytes of data, they encompass orders of magnitude less information. Our representation of data and information is bloated by many orders of magnitude. Even in this paragraph, to convey a single point I find I need to write multiple sentences. Even if we represented each character of a sentence with a single byte, we find that to convey a single piece of information, i.e., to simply say “data representation is complex”, I need 30 bytes of data.
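The arithmetic is easy to verify, assuming a one-byte-per-character encoding:

```python
# One byte per ASCII character: 30 bytes to convey one simple idea.
phrase = "data representation is complex"
size_in_bytes = len(phrase.encode("ascii"))
print(size_in_bytes)  # 30
```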

But every potential particle in nature seems to embody infinite possibilities of data. It is time we started questioning how we represent data. We do not have a concept of what I call true “quantum data” where data can assume “any information that an observer chooses to give it” and until such time, the data remains indeterminate. We need to look at “a quantum particle” in nature with this perspective. While the scientists say that a quantum particle exists in all states unless observed, at which time it assumes a single state, what we need to ask is what does “all states” really mean? A different perspective of this should be that “a quantum particle” does not exist at all or “exists as indeterminate” until an observer observes it. The state where “nothing exists” is the state of shunya and can be seen as “data with infinite possible qualities”. Here, the association of qualities is to the observer rather than the data. Hence, the data itself can appear in different forms to different observers [I have explained more related to this in Chapter 3 in the section on “Effective data representation for the problem at hand”].

In the computer world, the data that we create is tightly coupled to just one property, namely a binary representation of ‘1’ or ‘0’, which inherently has no meaning w.r.t the data we want to store. Hence, to represent any arbitrary quality we need to combine multiple of these binary representations and associate a meaning with that combination. This representation is not a direct translation of what the quality is; it is a direct representation only of a mathematical value. Hence the behaviors of a quality that is not mathematical in nature need to be mimicked to achieve anything in computers.

Moreover, data is represented as discrete rather than as a continuous analog representation. When we talk about analog data, we cannot use individual data representations as we do with discrete data. For example, with a discrete representation of a sine wave, we would have taken samples at regular intervals, say every 15°, so sin(0°), sin(15°), sin(30°) and so on, and stored them. Now, if we wanted to shift this to analog data, we cannot sample it; it becomes continuous data. So, how should the data be stored? We need to store it directly as a pattern; only then is the continuity maintained. Then we can very easily get sin(15°) as well as sin(15.00001°) as well as sin(15.19000°). But when we start representing data in this manner, i.e., as an analog value, we start having problems with data computations and logic, because what we need here is a pattern algorithm rather than a mathematical computation.
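The sampling contrast can be sketched in a few lines. Storing a table of samples loses everything between the sample points; storing the generating rule (the nearest thing a computer offers to a “pattern”) preserves the continuity:

```python
import math

# Discrete representation: samples of sin taken every 15 degrees.
samples = {deg: math.sin(math.radians(deg)) for deg in range(0, 360, 15)}

# A value between the sample points simply does not exist in the table.
assert 15 in samples and 15.00001 not in samples

# Storing the generating pattern instead preserves continuity, so the
# value can be queried at any resolution.
def pattern(deg):
    return math.sin(math.radians(deg))

fine_grained = pattern(15.00001)  # available at arbitrary precision
```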

All the logic we write is oriented towards a digital representation of data. For example, collection functions such as sort and search depend on the fact that data points are available individually and are not patterns. The learning algorithms that we write are based on the fact that discrete data points are streamed, hence we need clustering logic, regression logic and so on. In fact, if we look at mathematics, we model functions to be dependent on variables which we assume are discrete in nature and computed only for specific values. We do not have the concept of analog logic, analog computation or anything related to analog, unless it is technology created with materials found directly in nature. For example, electronics is basically analog until we force it to be discrete for easy computation. Thus, we have sacrificed continuity for a rigid, discrete representation, and that has taken away a lot of functionality.

Missing continuity in the algorithms

We find that whether we are a child, a youth or old, learning is something that does not seem to stop. Even if a person has Alzheimer’s, we find that some adaptation and learning is always present. We may have forgotten some of the information learnt, but learning new things, however insignificant it seems, does not stop. As I have explained before, in our AI systems, given that we work with offline learning, once an algorithm has been formed with training data, the algorithm does not adapt itself to the data it is applied to. Moreover, given that our data representation is discrete, the logic is also discrete: multiple pieces of discrete business logic tied together to create the illusion of continuity.

Let us take an example of an algorithm that we build, implementing a finite state machine. This is a very common problem, implemented in every business application to track the state of a business object such as an order, shipment, part and so on. Sometimes we need to use multiple statuses against the same business object to track various functions.

For example, we have all seen an Amazon order go from the status of “order placed” to “dispatched”, “shipped” and “delivered”. At the “shipped” status we can further track the shipment, whose status changes from “carrier picked up at facility” to “arrived at facility”, “out for delivery” and so on. These are very simple finite state machines. Yet it should be noted that the state of the “order” is contained within the information in the data associated with the order and other related business objects. It should not need to be a secondary field stored on each business object that has to be explicitly updated. In the example of the order, the fact that the order is dispatched is truly a set of actions in the warehouse operation. At its simplest, it consists of the actions: “planning the shipment in which to ship”, “picking the item from the warehouse shelf”, “packing the item”, “placing it in the bin allocated to the shipment”, “loading the truck by the carrier based on the manifest”, “checking the loaded shipments” and “the truck leaving the facility”. These are all coded as discrete logic and tied together by id relations in the data.
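The order status machine above can be sketched as exactly the kind of explicit, hand-updated field the text critiques. The transition names follow the example; the class shape is my own:

```python
# A minimal finite state machine for the order example. The status is a
# secondary field that must be explicitly updated by code at each step --
# the discrete representation discussed in the text.
ORDER_TRANSITIONS = {
    "order placed": {"dispatched"},
    "dispatched": {"shipped"},
    "shipped": {"delivered"},
    "delivered": set(),
}

class Order:
    def __init__(self):
        self.status = "order placed"

    def advance(self, new_status):
        """Move to new_status, rejecting transitions the machine does not allow."""
        if new_status not in ORDER_TRANSITIONS[self.status]:
            raise ValueError(f"illegal transition: {self.status} -> {new_status}")
        self.status = new_status
```

Every warehouse action must remember to call `advance` at the right moment, which is precisely where the bugs mentioned below tend to creep in.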

Looking at this from a different perspective, if we were able to continuously track the movement of the “item” as it moved across the warehouse floor, we would have got the status without a discrete field representing the status of the order, a field which tends to introduce bugs into the system. The problem we all face when we write a software application is that “the data that is stored in the computers is devoid of the actual reality”.

We are currently trying to fix the problem by introducing IoT sensors. But these sensors are also discrete. We can introduce an RFID tag on each item and add RF readers across the warehouse to track where an item is present. But we still have not got away from the fact that we have discrete, disparate data that we need to tie together. So relating the order and the item is a logic-based process rather than a data-based process. Thus, all our logic is also very discrete in nature. We write code that tracks an item, hoping to the good Lord that whoever is supposed to enter the details into the system does a good job without errors. We have introduced many easy-to-use mechanisms such as barcode scanners and RF scanners, yet errors still creep in. We write code to take this item scan and update the shipment in which it is supposed to go, then we pick all of these and try to update the order with the status, and so on: we string together a list of business functions that are discrete in nature and again hope to that good Lord that the programmer has done a good job of handling all error conditions. No wonder, when this is the kind of code we are trying to get our AI to learn, that the AI also spits out discrete pieces strung together as if-else clauses.

The other version of the problem of missing continuity can be seen in the example of autonomous car driving that I described in the previous section. As I have said, our processors are only capable of discrete processing. While we talk about it as streaming, data is not streamed in real time; it is sampled at a very fast rate to simulate real-time streaming. So, if we added a sensor to capture the current scene around the car, we can only sample this data every 1s, every 0.5s or even every 1ms and process the picture captured by the camera. So, in a situation where cars are travelling at very high speeds, the response time required to avert a collision when a sudden change occurs is very difficult to achieve, plainly because the data has not been sampled that fast. To handle this, we need what we create in computer applications as event-based processing. We need to be able to detect disruption in continuity rather than regularly monitor the scene by sampling. But what we miss is continuity in our algorithms, because of the discreteness of our data representation.
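The difference between sampling and event-based detection can be sketched in a few lines. The readings, period and threshold here are hypothetical, chosen only to show how a sampled monitor can miss a sudden change that an event-driven one catches:

```python
# Sampling: look only at every `period`-th reading; anything that happens
# between samples is invisible.
def sampled_monitor(readings, period):
    return [r for i, r in enumerate(readings) if i % period == 0]

# Event-driven: react the moment a reading breaks continuity, i.e.
# changes by more than a threshold since the last reaction.
class EventMonitor:
    def __init__(self, threshold):
        self.last = None
        self.threshold = threshold

    def observe(self, reading):
        if self.last is None or abs(reading - self.last) > self.threshold:
            self.last = reading
            return True   # a disruption in continuity: handle it now
        return False
```

Of course, `observe` is itself called once per discrete reading, so even this only simulates event detection; the underlying discreteness the text points at remains.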

Accurate computations as a necessity in our algorithms

Taking car driving as an example again, we find that when we drive we do not compute an accurate speed, an accurate angle for turning the steering wheel, or an accurate pressure on the accelerator. We only know we want to increase or decrease speed, and we appropriately press or release the accelerator. The same goes for the brakes. We do not compute the accurate distance the car will travel when we press the brake. We start braking gradually and, based on the decrease in the speed of the car, keep pressing the brake continuously till it comes to a complete stop within the distance we want. Very rarely do we know the mathematically accurate distance it has taken to stop the car. There are definitely those situations where we hit the brakes at full force, hoping and praying to the good Lord that the damned car will stop in time, but again we have no clue of the mathematically accurate distances, deceleration or any such numbers. Even here, we just know approximately the rate at which the car is slowing and that we may not be able to stop in time. Yet we can drive the car much, much better than any computer program. Why is this so?

In reality, we rarely find a requirement for mathematical accuracy. Say we are drinking coffee: we just lift the cup, bring it to our mouth and tip it correctly for the right flow of fluid to be able to swallow it. In fact, we can very easily control the flow rate of water or any fluid, correctly swallowing in real time as the fluid flows, without actually knowing the accurate rate of flow. Matching the rate of flow of fluid from the cup to the mouth to the throat is something we can do very easily. How is this, when a computer, even though it can easily compute accurately and match up speeds, cannot use that to correctly do many such simple tasks?

The point is that accuracy is not a necessity for knowledge and intelligence. In fact, knowledge and intelligence are driven more by fuzziness and inaccuracy than by accuracy. Learning algorithms do not need mathematical accuracy; that is only required for computations. What is required is patterns and pattern-related algorithms and operations, which we cannot easily do with computers. Pattern operations introduce, as part and parcel of themselves, a randomness and fuzziness which cannot be got from an accurate computation machine. Thus we find that when we need an alerting program that alerts when the temperature has gone beyond 35°, we need the accurate value of 35 before an alert is raised; 34.999999° will not raise an alert. We do try to mitigate this by adding Gaussian curves and other such solutions, but this is just a band-aid. This is where inherent approximate comparison of patterns and pattern logic makes a lot of difference.
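The 35° alert can be sketched both ways. The crisp version is the exact-comparison behaviour described above; the fuzzy ramp is one common mitigation, with ramp bounds of 33° and 37° that are arbitrary assumptions of mine:

```python
# Crisp threshold: flips at exactly 35.0; 34.999999 raises nothing.
def crisp_alert(temp_c):
    return temp_c >= 35.0

# Fuzzy alternative: alert strength ramps smoothly from 0 to 1 between
# two bounds instead of flipping at a single exact value.
def fuzzy_alert(temp_c, low=33.0, high=37.0):
    if temp_c <= low:
        return 0.0
    if temp_c >= high:
        return 1.0
    return (temp_c - low) / (high - low)
```

Even the fuzzy version still rests on accurate numeric comparison at its bounds, which is the sense in which the text calls such fixes a band-aid.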

Why does this make a difference in AI? When we look at the car driving example, say we computed the trajectories of two cars and determined that a collision will happen at a distance of 500m from the current point: should we change the position of the car or not? At what cut-off distance should we start changing the position of the car for it to respond correctly? Should we spend precious compute cycles computing the exact mathematical distance at which the trajectories will meet? Does it even make sense to compute it at that accuracy when we do not know with exact accuracy the length and width of our own car or the second car? But then, say we did do an approximate computation: where does the approximation end? How much do we approximate?

What do we need to create an AI that does not seem fake?

What we need to understand in all this is that we are trying to mimic what appears to us as the learning algorithm, in order to mimic the intelligence of the universe and pass it off as AI, and hence everything seems fake. To truly create an intelligence, we need to understand the internals of the working of the universe, find a way to implement that, and allow the other related parameters to develop on their own, to get a truly intelligent experience.

We need a redefinition of what artificial intelligence is. A decision-making system is not an intelligent system. An if-else rule-based system is not an intelligent system. A system that can learn a logic for given data and apply it to other similar data sets is not an intelligent system. As I have said in the first chapter, all of these are just appearances, or a result, of the true underlying knowledge that is needed to create an intelligent system.

We need to create a system that can truly start with nothing but some seed information that dictates its own characteristic. It should be possible to put the system into a situation, have it gather knowledge about the situation, apply its characteristic to come up with a solution to handle the situation (problem or no problem does not matter), implement the solution, find the reaction, record it, and keep collecting such reactions, becoming increasingly efficient in its reaction to a situation as it moves forward. Subsequently, it needs to be able to pick up the relevant knowledge for a given situation and adapt the learnt solution to the varied parameters of the current situation. In pursuit of this, I started looking at Sanskrit literature to find out how our reality works. This is what I have described further in the book: the algorithm that is described in this Sanskrit literature.

Read More in the book

The algorithm of the Universe (A new perspective to cognitive AI)
