ChatGPT to replace not (intelligent) jobs but (boring) tasks

By Ethan McGowan, Professor of Data Science @ SIAI, founding member of GIAI & SIAI

ChatGPT will replace not jobs but tedious tasks
For newspapers, the 'rewrite man' will soon be gone
For other jobs, the 'boring' parts will be replaced by AI,
but not the intellectual and challenging parts

There has been over a year of hype around Large Language Models (LLMs). At the initial round of hype, people outside of this field asked me whether their jobs were about to be replaced by robots. Now, after more than a year of trials with ChatGPT, they finally seem to understand that it is nothing more than an advanced chatbot that still cannot stop generating 'bullshit', in the words of Noam Chomsky, the American professor and public intellectual known for his work in linguistics and social criticism.

As my team at GIAI predicted in early 2023, LLMs will be able to replace some jobs, but most of what gets replaced will be simple, mundane tasks. That is because these language models are built to find high correlation between groups of text or images, while still unable to 'intelligently' find logical connections between thoughts. In statistics, high correlation with no causality is called a 'spurious relation'.

LLMs will replace the 'copy boys/girls'

When we were first approached by EduTimes back in early 2022, they thought we could create an AI machine to replace writers and reporters. We told them the best we could create would replace a few boring desk jobs like the 'rewrite man', the job of rewriting what other newspapers have already reported. 'Copy boy' is one well-known disparaging term for it. Most large national magazines have such employees, just to keep their pages up to date with recent news.

Since none of us at GIAI comes from journalism, and EduTimes is far from a large national magazine, we do not know the exact proportion of 'rewrite men' in large magazines, let alone how many articles are rewritten by them. But based on what we see in magazines, we can safely argue that at least 60-80% of articles are probably written by the 'copy boys/girls'. Some of them are at high risk of plagiarism. This is one sad reality of the journalism industry, according to the EduTimes team.

The LLM that we are working on, GLM (GIAI's Language Model), isn't that different from other competitors in the market, in that we also have to rely on correlations between bodies of text, or more precisely 'associations' in the sense of the association rules found in machine learning textbooks. Likewise, we also have plenty of inconsistency problems. To avoid Noam Chomsky's famous accusation that 'LLMs are bullshit generators', the best any data scientist can do is to set a high cut-off in support, confidence, and lift. Beyond that, it is not the job of data models, which include all AI variants for pattern recognition.
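
For readers who have not met support, confidence, and lift before, here is a minimal sketch of that filtering step, with a toy corpus, an invented rule, and arbitrary thresholds. It illustrates the idea only; it is not our production pipeline.

    # A minimal sketch: support, confidence, and lift for one association
    # rule (A -> B) over a toy set of "documents" (all data invented).
    docs = [
        {"chatgpt", "llm", "hallucination"},
        {"chatgpt", "llm"},
        {"llm", "causality"},
        {"chatgpt", "hallucination"},
    ]

    def support(itemset):
        return sum(itemset <= d for d in docs) / len(docs)

    def confidence(a, b):
        return support(a | b) / support(a)

    def lift(a, b):
        return confidence(a, b) / support(b)

    rule_a, rule_b = {"chatgpt"}, {"hallucination"}
    # Keep the rule only if it clears high cut-offs; everything else is discarded.
    if (support(rule_a | rule_b) > 0.3
            and confidence(rule_a, rule_b) > 0.6
            and lift(rule_a, rule_b) > 1.0):
        print("rule kept:", rule_a, "->", rule_b)
    else:
        print("rule discarded")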


But still correlation does not necessarily mean causality

The reason we see infinitely many 'bullshit' cases is that LLM services still belong to statistics, a discipline that finds correlation, not causality.

For high correlation to be translated into causality, one important condition has to be satisfied: the data set must contain only coherent information, so that high correlation naturally tracks the underlying causal structure. This is actually where we need EduTimes. We need clean, high-quality, topic-specific data.

After all, this is why OpenAI is willing to pay for data from Reddit.com, a community with intense, high-quality discussions. LLM service providers are in negotiations with top U.S. newspapers for precisely the same reason. Coherent, high-quality news articles do not give a 100% guarantee that correlation will match causality, but at least we can claim that the disturbing cases will largely disappear without time-consuming technical optimization.

By the same logic, the jobs that can be replaced by LLMs, or any other AI built on pattern-matching algorithms, are the ones with strong, repeating patterns that do not require logical connections.

AI can replace not (intelligent) jobs but (boring) tasks

As we often joke at GIAI, technologies are bounded by mathematical limitations. Unfortunately, we are not John von Neumann, who could solve seemingly impossible mathematical challenges as easily as college problem sets. Thanks to computational breakthroughs, we are already far beyond what we expected 10 years ago. Back then, we did not expect to extract corpora from 10 books in a few minutes; if anything, we thought it would take weeks of supercomputer resources. That is no longer the case. But even with this surprising pace of computational achievement, we are still bound by mathematical limits. As said, correlation without causality is 'bullshit'.

With the current mathematical limitations, we can say

  • AI can replace not (intelligent) jobs but (super mega ultra boring) tasks

And the replaceable tasks are the boring, tedious, repetitive, patterned ones. So please stop worrying about losing your job if yours tortures your brain into thinking. Instead, think about how to use LLMs as automation to lighten the burden of mundane tasks. They will be like your mom's washing machine and dishwasher: younger generations of women are no longer bound to housekeeping. They go out to workplaces and fight for the positions that match their dreams, desires, and wants.

Post hoc, ergo propter hoc - impossible challenges in finding causality in data science

By David O'Neill, Professor of Data Science @ SIAI, founding member of GIAI & SIAI

Data science can find correlation but not causality
In statistics, high correlation without causality is called 'spurious regression'
Hallucinations in LLMs are representative examples of spurious correlation

Imagine twin kids living in the same neighborhood. One prefers to play outside day and night, while the other mostly sticks to his video games. A year later, doctors find that the gamer boy is much healthier, and thus conclude that playing outside is bad for growing children's health.

What do you think of that conclusion? Do you agree with it?

Even without much scientific training, we can almost immediately dismiss the conclusion as based on lopsided logic and possibly driven by insufficient information about the neighborhood. For example, if the neighborhood is as radioactively contaminated as Chernobyl or Fukushima, playing outside can undoubtedly be close to suicide. What about the extra nutrition available to the gamer boy thanks to easier access to food at home? The gamer boy just had to drop the game console for 5 seconds to eat something, but his twin had to walk or run for 5 minutes to get back home for food.

In fact, there are infinitely many potential variables that may have affected the two twins' condition. From the data set above alone, the best we can tell is that, for an unknown reason, the gamer boy is medically healthier than his twin.

In more scientific terms, statistics has long been known to capture correlation but not causality. Even in a controlled environment, it is hard to argue that the control variable was the cause of the effect. Researchers only 'guess' that the correlation means causality.

Post Hoc, Ergo Propter Hoc

There is a famous Latin phrase meaning "after this, therefore because of it". In plain English, it claims that one event is the cause of the event occurring right after it. You do not need rocket science to counter the argument that two random events are interconnected just because one occurred right after the other. This is a very common logical mistake that assigns causality purely by the order of events.

In statistics, this is often stated as 'correlation does not necessarily guarantee causality'. In the same context, such a regression is called a 'spurious regression', a problem widely reported in engineers' adoption of data science.
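
A minimal simulation makes the point: two completely independent random walks routinely show a large sample correlation even though neither causes the other. The seed and series length below are arbitrary.

    # Spurious correlation sketch: two independent random walks often look
    # highly correlated even though neither causes the other.
    import numpy as np

    rng = np.random.default_rng(0)
    x = np.cumsum(rng.normal(size=500))   # random walk #1
    y = np.cumsum(rng.normal(size=500))   # random walk #2, fully independent
    print("sample correlation:", np.corrcoef(x, y)[0, 1])  # often far from zero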

One notable example is the 'hallucination' cases in ChatGPT. The LLM only finds high correlation between two words, two sentences, or two bodies of text (or images these days), but it fails to discern the causal relation embedded in the two data sets.

Statisticians have long been working to differentiate causal cases from mere high correlation, but the best we have so far is 'Granger causality', which only helps us rule out causality, for instance when a third variable is involved. Granger causality offers a philosophical frame for testing whether a third variable could be a potential cause behind the hidden relationship. Part of the reason Professor Granger's research was awarded the Nobel Prize is that it showed it is mechanically (or philosophically) impossible to verify a causal relationship from correlation alone.
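
For readers who want to see the test itself, here is a sketch using the statsmodels implementation on simulated series; the lag order is an arbitrary choice, and a rejection only says that the past of one series helps predict the other, not that a true causal mechanism exists.

    # Granger (non-)causality sketch: does the past of x improve predictions of y?
    import numpy as np
    from statsmodels.tsa.stattools import grangercausalitytests

    rng = np.random.default_rng(1)
    x = rng.normal(size=300)
    y = np.roll(x, 2) + 0.5 * rng.normal(size=300)  # y echoes x with a 2-step delay

    # Column order matters: the test asks whether the 2nd column Granger-causes the 1st.
    grangercausalitytests(np.column_stack([y, x]), maxlag=3)  # prints test stats per lag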

Why AI ultimately needs human approval

The post hoc fallacy, by the nature of current AI models, is an unavoidable hurdle that all data scientists have to live with. Unlike simple regression-based research, LLMs rely on such large chunks of data that it is practically impossible to examine every connection between two bodies of text.

This is where human approval is required, unless the data scientists decide to fine-tune the LLM so that it offers only the most probable (and thus presumably causal) matches. The more probable the matches are, the less likely there will be spurious connections between two sets of information, assuming that the underlying data comes from accurate sources.
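
As a toy illustration of that routing rule, anything under a probability cut-off goes to a human instead of being served. The candidate answers, scores, and threshold below are all made up.

    # Hypothetical routing rule: serve only high-probability matches,
    # send everything else to human review. Answers and scores are invented.
    CUTOFF = 0.9

    candidates = [
        ("The article was published in 2021.", 0.97),
        ("The author probably won a Fields Medal.", 0.41),
    ]

    for answer, score in candidates:
        if score >= CUTOFF:
            print("serve:", answer)
        else:
            print("route to human review:", answer)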

Teaching AI/data science, I surprisingly often come across 'fake experts' whose only understanding of AI is a bunch of terminology from newspapers, or a few lines of media articles at best, without any in-depth training in the basic academic tools, math and statistics. When I raise Granger causality as my counterargument for why it is impossible to get from correlation to causality by statistical methods alone (it is, so far, philosophically impossible), many of them ask, "Then, wouldn't it be possible with AI?"

If the 'fake experts' had some elementary math and stat training from undergrad, I believe they would understand that computational science (the academic name for AI) is just a computer version of statistics. AI is nothing more than performing statistics more quickly and effectively using computer calculations. In other words, AI is a sub-field of statistics. Their question can be reframed as:

  • If it is impossible with statistics, wouldn’t it be possible with statistics calculated by computers?
  • If it is impossible with elementary arithmetic, wouldn't it be possible with addition and subtraction?

The inability of statistics to make causal inferences is the same as saying that it is impossible to mechanically eliminate hallucinations in ChatGPT. Those with academic training in the social sciences, disciplines that collect potentially correlated variables and rely on human judgment as the final step in concluding causal relationships, see that ChatGPT is built to mimic cognitive behavior at a shamefully shallow level. The fact that ChatGPT depends on 'human feedback' in its custom version of reinforcement learning is exactly an example of that basic cognitive behavior. The reason we still cannot call it 'AI' is that there is no automatic rule for this cheap copy to remove the post hoc fallacy, just as Clive Granger showed in his Nobel Prize-winning work.

Causal inference is not a monotonically increasing challenge but a multi-dimensional problem

In natural science and engineering, where all conditions are limited and controlled in the lab (or by a machine), I often see cases where human correction is regarded as unscientific. Is human intervention really unscientific? Heisenberg's uncertainty principle states that when a human applies a stimulus to observe a microscopic phenomenon, the position and state just before the stimulus can be known, but the position after the stimulus can only be guessed. If no stimulus is applied at all, the current location and condition cannot be fully identified. In the end, human intervention is needed to earn at least partial information. Without it, one can never have any scientifically grounded information.

Computational science is not much different. In order to rule out hallucinations, researchers either have to change the data sets or re-parameterize the model. The new model may be closer to perfection for that particular purpose, but the modification may surface hidden or unknown problems. The vector space spanned by the body of data is so large and so multidimensional that there is no guarantee that one modification will monotonically improve the model from every angle.

What is more concerning is that the data set is rarely clean, unless you are dealing with low-noise (or zero-noise) data like grammatically correct texts and quality images. Once researchers step away from natural language and image recognition, data sets are exposed to infinitely many sources of unknown noise. Such high-noise data often suffer from measurement error. Sometimes researchers are unable to collect important variables at all. These problems are called 'endogeneity', and social scientists have spent nearly a century learning to extract at least partial information from such faulty data.

Social scientists have modified statistics in their own way to cope with endogeneity. Econometrics is a representative example, using the concept of instrumental variables to eliminate problems such as measurement error in variables, omission of important variables, and two-way influence between explanatory and dependent variables. This line of research produced the 'Average Treatment Effect' and 'Local Average Treatment Effect' frameworks that were awarded the Nobel Prize in 2021. The results are not completely correct, but they are part of the effort to be a little less wrong.
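
Here is a compact sketch of the instrumental-variable idea on simulated data; it is the textbook two-stage least squares written out by hand, not any particular econometrics package, and every coefficient is invented for illustration.

    # Two-stage least squares sketch: x is endogenous (correlated with the error u),
    # z is an instrument (moves x, unrelated to u). All data are simulated.
    import numpy as np

    rng = np.random.default_rng(42)
    n = 5000
    z = rng.normal(size=n)                        # instrument
    u = rng.normal(size=n)                        # structural error
    x = 0.8 * z + 0.6 * u + rng.normal(size=n)    # endogenous regressor
    y = 2.0 * x + u                               # true effect of x on y is 2.0

    X = np.column_stack([np.ones(n), x])
    Z = np.column_stack([np.ones(n), z])

    ols = np.linalg.lstsq(X, y, rcond=None)[0]
    x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]   # first stage: project x on z
    X_hat = np.column_stack([np.ones(n), x_hat])
    iv = np.linalg.lstsq(X_hat, y, rcond=None)[0]      # second stage

    print("OLS slope (biased):", ols[1])   # drifts above 2.0 because of u
    print("2SLS slope:", iv[1])            # close to the true 2.0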

Some untrained engineers claim magic with AI

Here at GIAI, many of us share our frustration with untrained engineers who confuse AI, a marketing term for limited automation, with real self-evolving 'intelligence'. The silly claim that one can find causality from correlation is not that different. The fact that they make such spurious arguments already proves that they are unaware of Granger causality or of any philosophically robust proposition connecting (or disconnecting) causality and correlation, and thus that they lack the scientific training to handle statistical tools. Given that the current version of AI is no better than pattern matching for higher frequency, there is no doubt that scientifically untrained data scientists are not entitled to be called data scientists.

Let me share one bizarre case that I heard from a colleague here at GIAI about his home country. In case anyone feels that the following example is a little insulting, half of his jokes are about his country's incapable data scientists. In one of the tech companies in his country, a data scientist was asked to differentiate a handful of causal events from a bunch of merely correlated cases. The guy said, "I asked ChatGPT, but it seems there were limitations because my GPT version is 3.5. I should be able to get a better answer if I use 4.0."

The guy not only is unaware of the post hoc fallacy in data science, but most likely does not even understand that ChatGPT is no more than a correlation machine for texts and images, driven by prompts. This is not something you can learn on the job. This is something you should learn at school, which is precisely why many Asian engineers are driven to the misconception that AI is magic. Asian engineering programs are known to generally put less weight on mathematical foundations than renowned Western universities.

In fact, it is not his country alone. The crowding-out effect gets heavier as you go to more engineer-driven conferences and less sophisticated countries and companies. Despite the shocking incompetence, given the market hype for generative AI, I guess those guys are paid well. Whenever I come across mockeries like the untrained engineers and buffoonish conferences, I just laugh and shake it off. But when it comes to business, I cannot help asking myself whether they are worth the money.

Don't be (extra) afraid of math. It is just a language

By Keith Lee, Professor

Math in AI/Data Science is not really math, but a shortened version of an English paragraph.
In science, researchers often ask presenters to 'please speak in plain English', a reminder that math is just a more compact, more precise way of explaining science.

I liked math until high school, but it became an abomination during my college days. I had no choice but to put math courses on my transcript, as they were one of the key factors for PhD admission, but even after years of graduate study and research, I still don't think I like math. I liked it when it was about solving riddles.

The questions in high school textbooks and exams are mostly about finding out who did what. But the very first math course in college forces you to prove a theorem, like 0+0=0. Wait, 0+0=0? Isn't it obvious? Why do you need a proof for this? I didn't eat any apple, and neither did my sister, so nobody ate any apple. Why do you need lines of mathematical proof for such a simple idea?

Then, while teaching AI/Data Science, I often claim that the math equations in the textbook are just a short version of long but plain English. I tell my students, "Don't be afraid of math. It is just a language." Students are usually puzzled, and given the pile of 0+0=0-style proofs in basic math textbooks for first-year college courses, I can see why they do not (initially) buy the statement. So let me lay out my reasoning in detail.


Math is just a language, but only in a certain context

Before I begin arguing that math is a language, I would like to state clearly that math is not a language in the academic definition of the word. The structure of a math theorem and its corollary, for example, is not a drop-in replacement for a paragraph with a leading statement and supporting examples. There may be some similarity, given that both are used to build logical thinking, but I am not comparing math and language in a one-to-one sense.

I still claim that math is a language, but only in a certain context. My field of study, along with many closely related disciplines, usually produces notes and papers full of math jargon. Mathematicians may be baffled by my claim that data science relies on math jargon, but almost all STEM majors have stacks of textbooks mostly covered with math equations. The difference between math and non-math STEM majors is that the same equations carry different meanings. In data science, if you find y=f(a,b,c), it means a, b, and c are the explanatory variables for y through a non-linear regression form f. In math, I guess you just read it as "y is a function of a, b, and c."

My data science lecture notes are usually 10-15 pages for a 3-hour class. That might look too short to many of you, but in fact I need more time than that to cover the 15-page notes. Why? On each page, I condense many key concepts into a few math equations. Just like the statement above, "a, b, and c are the explanatory variables for y through a non-linear regression form f", I read the equations in 'plain English'. In addition, I give lots of real-life examples of each equation so that students can fully understand what it really means. Small variations of the equations also take hours to explain.

Let me bring up one example. Adam, Bailey, and Charlie have worked together on a group assignment, but it is unclear whether they split the job equally. Say you know exactly how the work was divided. How can you shorten that long paragraph?

y=f(a,b,c) has all that is needed. Depending on how they divided the work, the function f is determined. If y is not a 0-100 scale grade but a 0/1 grade, then the function f has to reflect that transformation. In machine learning (or any similar computational statistics), we turn to logistic/probit regressions.
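
As a hypothetical sketch of the same story in code, let a, b, and c be each member's share of the work and fit a logistic regression to past groups' pass/fail grades. Every number below is invented, and scikit-learn is used purely for convenience.

    # y = f(a, b, c): a pass/fail grade modeled as a logistic regression on each
    # member's share of the work. All data below are invented for illustration.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    # columns: Adam's share, Bailey's share, Charlie's share (rows = past groups)
    shares = np.array([
        [0.4, 0.1, 0.5],
        [0.3, 0.3, 0.4],
        [0.1, 0.1, 0.8],
        [0.0, 0.2, 0.8],
        [0.5, 0.4, 0.1],
        [0.2, 0.0, 0.8],
    ])
    passed = np.array([1, 1, 1, 0, 0, 0])  # 0/1 grade for each past group

    f = LogisticRegression().fit(shares, passed)
    print(f.predict_proba([[0.2, 0.1, 0.7]])[0, 1])  # estimated pass probability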

In their assignments, I usually skip the math equation and give a long story about Adam, Bailey, and Charlie. For example, Charlie said he was going to put together Adam's and Bailey's research at night, because he had a date with his girlfriend in the afternoon. At 11pm, while Charlie was combining Adam's and Bailey's work, he found that Bailey had done almost nothing. He had to do it himself until 3am, and then restructured everything until 6am. We all know that Charlie did a lot more work than Bailey. Now let's build it in a formal fashion, like we scientists do. How much weight would you give to b and c, compared to a? How would you change the functional form if Dana, Charlie's girlfriend, helped with his assignment at night? What if she takes the same class with another teacher and has already done the same assignment with her classmates?

If one knows all the possibilities, y=f(a,b,c) is a simple and short replacement for the four paragraphs above, and for even more variations to come. This is why I say math is just a language. I am a lazy guy looking for the most efficient way of delivering my message, so I strictly prefer to type y=f(a,b,c) instead of four paragraphs.

Math is a universal language, again only in a certain context

Teaching data science is fun, because it is like my high school math. Instead of constructing boring proofs of seemingly obvious theorems, I try to see the hidden structures of a data set and re-design the model according to the given problem. The divergence from real math comes from the fact that I use math as a tool, not as an end in itself. To mathematicians, my way of using math might be an insult, but I often tell my students that we do not major in math but in data science.

Think about medieval European countries, when French, German, and Italian were first formed through the processes of pidgin and creole. In case you are not familiar with the two words: a pidgin is the makeshift language spoken by children of parents who share no common tongue, and a creole is the common language that emerges once those children's language is shared more widely. When parents do not share a common tongue, children often learn only part of each language, and the family creates a sort of new language for internal communication. This is called the pidgin process. If it is shared by a town or a group of towns and becomes another language with its own grammar, it is called the creole process.

For data scientists, mathematics is not Latin but French, German, or Italian at best. The form is math (like the Latin alphabet), but the way we use it is quite different from mathematicians. The major European languages are almost identical in some parts. Likewise, for data science, computer science, natural science, and even economics, some math forms mean exactly the same thing. But the way scientists use the equations in their own context often differs, just as French diverges significantly from German (or vice versa).

Well-educated intellectuals in medieval Europe could understand Latin, which must have helped them travel across western Europe without much trouble in communication; at least basic communication would have been possible. STEM students with heavy graduate-level training can understand math jargon, which helps them follow other majors' research, at least partially.

Latin was a universal language in medieval Europe, as math is to many scientific disciplines.

Math in AI/Data Science is just another language spoken only by data scientists

Having said all that, I hope you can now understand that my math is different from a mathematician's math. Their math is like the Latin spoken in ancient Rome. My math is simply the Latin alphabet used to write French, German, Italian, and/or English. I just borrowed the alphabet system for my own study.

When we have trouble understanding presentations with heavy math, we often ask the presenter, "Hey, can you please lay it out in plain English?"

The concepts in AI/Data Science can be, and should be able to be, written in plain English. But then four paragraphs may not be enough to replace y=f(a,b,c). If you need far more than four paragraphs, what is the more efficient way to deliver your message? This is where you create your own language, like the creole process. The same process occurs in many other STEM majors. For one, even economics had decades of battle between sociology-based and math-based research methods. In the 1980s, the sociology line lost the battle, because it was not sharp enough to build scientific logic. In other words, math jargon was a superior means of communication to four paragraphs of plain English in the scientific study of economics. Now one can find sociology-style economics only in a few British universities. At other schools, those researchers find teaching positions in history or sociology departments. And mainstream economists do not see them as economists.

The field of AI/Data Science has evolved in a similar fashion. At one point, people thought software engineers were data scientists, since both jobs require computer programming. I guess nobody would argue that these days. Software engineers are just engineers with programming skills for websites, databases, and hardware monitoring systems. Data scientists do write computer programs, but not for websites or databases. Their work is about finding hidden patterns in data, building a mathematically robust model with explanatory variables, and predicting user behavior through model-based pattern analysis.

What is still funny is that when I speak to other data scientists, I expect them to understand y=f(a,b,c) as "Hey, y is a function of a, b, and c." I do not want to lay it out in four paragraphs. It is not just me: many data scientists are just as lazy as I am, and we want our counterparts to understand the shorter version. It may sound snobbish that we build a wall against non-math speakers (despite the fact that we are not math majors either), but I think this is an evident example that data scientists use math as a form of (creole) language. We just want the same language to be spoken among us, just like Japanese-speaking tourists looking for a Japanese-speaking guide. English-speaking guides have little to no value to them.

Math in AI/Data Science can be, should be, and must be translated to 'plain English'

A few years ago, I created an MBA program in AI/Data Science that shares the same math-based courses with the senior year of the BSc in AI/Data Science, but does not require hard math/stat knowledge. I only ask the students to borrow the concepts from the math-heavy lecture notes and apply them to real-life examples. That is because I wholeheartedly believe the simple equation can still be translated into four paragraphs. Given that we still have to speak to each other in our own tongue, it should be and must be translated into plain language if it is to be used in real life.

For example, in the course I teach the cases of endogeneity, including measurement error, omitted variable bias, and simultaneity. BSc students have to derive the mathematical form of each bias, but MBA students only have to follow the logic of what bias is expected in each endogenous case and what the closely related business examples are.

One MBA student used measurement error to explain random errors in his company's manufacturing line that slow down the automated process. The error results in attenuation bias, which under-estimates the scale of the mismeasured variable's impact. Had the product line manager known the link between measurement error and attenuation bias, the loss of automation due to that error would have attracted a lot more attention.
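
A small simulation shows the attenuation effect he was describing; the 'line speed' variable and all coefficients below are made up for illustration.

    # Attenuation bias sketch: noise in the measured regressor shrinks the
    # estimated slope toward zero. Numbers are simulated for illustration.
    import numpy as np

    rng = np.random.default_rng(7)
    n = 10_000
    true_speed = rng.normal(size=n)                  # true line speed (unobserved)
    output = 3.0 * true_speed + rng.normal(size=n)   # true slope is 3.0

    measured = true_speed + rng.normal(scale=1.0, size=n)  # noisy sensor reading

    slope_true = np.polyfit(true_speed, output, 1)[0]
    slope_meas = np.polyfit(measured, output, 1)[0]
    print("slope with the true variable:    ", slope_true)  # about 3.0
    print("slope with the measured variable:", slope_meas)  # about 1.5 (attenuated)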

As in the example above, some MBA students in fact perform much better than students in the MSc in AI/Data Science, the more heavily mathematical track. The MSc students think the math track is superior, although many of them cannot match the math forms to actual AI/Data Science concepts. They fail not because they lack pre-training in math, but because they simply cannot read f(a,b,c) as a work-allocation model for Adam, Bailey, and Charlie. They are too distracted by the math forms.

During admission, there is a bunch of stubborn students with the die-hard claim of 'MSc or death', and absolutely no MBA. They see the MBA as a sort of blasphemy. But within a few weeks of study, they begin to understand that hard math is not needed unless they want to write cutting-edge scientific dissertations. Most students are looking for industry jobs, and the MBA, with lots of data-scientific intuition, is more than enough.

The teaching medium, again, is 'plain English'.

With the help of AI translator algorithms, I now can say that the teaching medium is 'plain language'.
