
AI Thievery and the End of Humanity by Ing
AI technology came seemingly out of nowhere just a hot minute ago, and suddenly it’s everywhere—and it seems like it’s either the end of the world or a dawning utopia depending on who you listen to.
A family member sent me this article the other day, and it got me thinking about how people on both sides of the AI debate are getting it wrong. And also at least a little bit right. Perspective: My books were used without permission to train AI models. What now? https://www.deseret.com/opinion/2025/05/03/ai-engines-used-my-books-without-permission/
As the title suggests, the author is not a fan. And for good reason.
It’s a simple fact that at least some of the organizations that are making and training AI models didn’t ask permission to use any of the material they trained their large language models on. Most of it, they arguably didn’t have to. It was on the internet because it was intended for public consumption.
However, some portion of it WASN’T intended for free public consumption. People pirated books written by the author of the article above (along with many others), and the LLMs scraped the contents of entire pirated archives along with everything else. Which means they illegally used vast amounts of material. But because digital materials and copyright are somewhat abstract and the Brobdingnagian scale of the theft boggles the mind, very few people can be made to care.
What should be done?
The organizations that trained AI models on that material need to make it right by paying royalties and/or licensing fees for what they illegally copied. And if they won’t make it right voluntarily, then they should pay steep penalties on top of it.
How to do it, I don’t know. But the burden should be on them, not their victims. And it should hurt. If it puts some of the companies developing AI or even all of them out of business, well, tough titties. I don’t think it will, but if it does, fine. They should’ve thought about the consequences before they stole all that stuff.
But the people who have been injured here, and AI opponents in general, aren’t helping themselves by framing its emergence as some kind of technopocalypse. They’re throwing around phrases like “making people stupid and making truth irrelevant” and talking about limits and guardrails, but the horse isn’t even on the track. The horse has left the barn, the chickens have flown the coop, and neither protest nor wishful thinking will reverse the event. AI is not going away.
You could bankrupt every company that’s currently working on it, and somebody else would still pick it up and start it up again, because it’s VALUABLE. People can make money off it. Big businesses can make big money. Smaller businesses can make smaller money. All sorts of individuals and organizations all over the place can make their lives easier and do more work with less time and effort. Or, yes, “create” some cool thing despite having zero talent and no desire to develop the craft (see: me, visual art).
It’s unfortunate that the writer of the article I linked above highlights a legitimate problem—thievery on a grand scale—and then descends into hysteria. (Calling it AI doesn’t help. Artificial it is, intelligent it ain’t. Large language models—LLMs—are just a mass-marketable form of machine learning, which is a concept that has been in use for a long time.) This new technology isn’t going to destroy culture itself or the human mind any more than the printing press, computers, the internet, or the Industrial Revolution could.
But those things did irreversibly alter our cultures and the way we interact with the world. And this technology is developing with far greater speed.
It reminds me of something broadcast news legend Edward R. Murrow said back in the 1960s, when people were starting to wonder what the recent advent of computers would do to society: “The newest computer can merely compound, at speed, the oldest problem in the relations between human beings, and in the end the communicator will be confronted with the old problem, of what to say and how to say it.”
LLMs (I hate calling it AI) are a heck of a technology, in that they actually CAN, to some extent, remove the old problem of what to say and how to say it…but unless they evolve into something else entirely, they can’t absolve you of the need to consider whether you SHOULD say it. The “oldest problem in the relations between human beings” will remain, and that’s the gap of understanding.
The problems inherent in figuring out what you believe and who and what you can trust are about to be compounded at speed. Again.
So, as our poor wronged author asks above, what now?
Well, we do need to impose consequences for the thievery that occurred during LLM development. That’s a Herculean task by itself, and I have no ideas on how it ought to be accomplished, but I don’t think fearmongering is going to make it any easier.
And if you’re worried about where this technology is going to take us, so am I.
We should probably all be at least a little bit worried about that. If you’re not uneasy about some of the uses people will try to put it to—especially the ones in various governments, with their talk of “guardrails” and “misinformation”—you’re not paying attention. (If you haven’t yet, check out what Marc Andreessen told Ross Douthat after the 2024 election, especially the part about what the Biden junta wanted to do with the AI industry: https://www.nytimes.com/2025/01/17/opinion/marc-andreessen-trump-silicon-valley.html.)
As for the worry about AI taking away people’s jobs, it probably will take some jobs from some people. New technologies tend to do that. The company I work for uses machines to do all the soldering humans once did, almost all of the placement of parts on printed circuit boards, and almost all of the previously tedious, painstaking, and error-prone inspection processes. Yet humans still have jobs in those factories—jobs that are a lot more humane than factory work used to be. New technologies tend to do that, too. (I once was a human doing soldering by hand on a production line, and man, am I ever glad we’ve got robots to do it now.)
I used to scoff at the possibility of AI taking my job (I played with some early GPT models back in 2020, and they were pathetic), but now it looks like there might be a real chance that it could. If not the entire job, at least the part of it I enjoy the most, which is putting words together to convey valuable information to other people.
It can’t yet. But a lot of people want it to. Earlier this year, I got a freelance writing gig from a university I used to work for because they tried to have ChatGPT do a particular writing job last year—and they got slop. Grammatically correct, but slop. They used it, but they also realized the product needed human discernment and creativity, so this year they ponied up the money to pay a human. But what if next year’s budget doesn’t have the money? What if they try AI again and the result is actually good enough?
My day job may be resistant to takeover, as it requires both specific technical knowledge (not very much, but some) and creativity (arguably not a lot, but some). But it’s not out of the question that as AI models improve, they might be capable of synthesizing technical information, a campaign guide, and the appropriate vocabulary well enough to get an effective marketing message across. And there’s 90% of my job, lost to automation. I wouldn’t like it, but I really couldn’t blame anyone for making that decision.
So, again…what now?
For now, I’m approaching this the way I approach most innovations: cautiously. I have less than zero trust in Big Tech (they’ve earned it), so I’m shunning all the AI “enhancements” to my consumer products, which have always worked perfectly well without it. (Are there any that work better with it? I seriously doubt it.) And I’m slowly and somewhat reluctantly experimenting with ways LLMs can save me time or make certain things easier—mostly at work, where there are IT and security professionals who have approved certain apps and where I stand to gain or lose the most monetarily by knowing (or not knowing) what the technology can do. And where, if a certain Big Tech behemoth screws anybody over for its own profit, it’s not me personally.
I might end up crafting prompts and editing a machine’s output for a living, which would make me feel sad, but I guess I could deal. I was a kick-ass copy editor in my day and I still have some skills that a computer can’t develop. Yet.
In conclusion, while the advent of AI and LLMs presents significant ethical and practical challenges, it is crucial to approach these developments with a balanced perspective. The unauthorized use of copyrighted materials for training AI models is a serious issue that demands accountability and restitution. However, framing AI as an existential threat to humanity may hinder constructive dialogue and solutions. Instead, we should focus on implementing robust regulations and ethical standards to ensure fair use and protect intellectual property. As AI continues to evolve, it is essential to remain vigilant and adaptive, leveraging its benefits while mitigating its risks. Ultimately, the future of AI will depend on our collective ability to navigate its complexities with wisdom and foresight.
(Final note and disclosure: I fed the rest of this essay to Copilot and told it to write a conclusion for me. Haha! Never let it be said that I can’t adapt! Also, “implementing robust regulations”? Pshaw! I hope you realized that wasn’t me talking. It just now occurred to me that I might’ve gotten better results by telling Copilot to emulate my writing style (could it? I dunno), but maybe it’s best that I didn’t. Don’t want to give it any ideas about usurping my role as the thinker in this relationship.)
*This is Sarah. Unfortunately my work too was pirated to train LLMs. Am I on the warpath about it? No. Am I pissed? Yes. Because the LLMs belonged to small, scrappy companies like Meta and such. So, why am I not on the warpath? Well, because humans get my books to train new writers all the time. So they owe me the price… of a copy. I’m pissed because it’s paltry of them to balk me of $4.99. But it’s not like it would change my life. Eh. – SAH.*
Mind you, the LLM AI companies that pirated all this content have yet to get anywhere near making money. I discussed this here – https://ombreolivier.substack.com/p/ai-actively-incinerating-cash?r=7yrqz
The fact that they can’t make money despite stealing some of what they use as training material is not promising for the technology. And let’s not forget that one of the major use cases appears to be students using it instead of studying whatever subject they are supposed to be learning. I don’t buy the “sky is falling” hype about the impact of LLM AI, but I don’t see it being particularly beneficial either.
(Other AI/ML companies such as the ones making drone aids for Ukraine can make money, but they didn’t steal their tech – or at least if they did, it was classic industrial espionage/reverse engineering rather than torrents of pirated books.)
I was gonna say. This article implies that access to large language models is valuable enough that people will be willing to pay enough for them to make them self-supporting. There is certainly a popular belief that it is so, but there is very little data to support that conclusion.
To say I’m skeptical about the ability of AI to generate enough money to pay for itself would be fair. I have generated a little bit of art using Midjourney and it worked okay, but I don’t need a lot of art, so counting on people like me to pay enough to keep an AI art generation company afloat seems pretty chancy to me.
I keep hearing from programmers who use AI to do simple things, and I suppose that’s one use, but my understanding is that the technology works by pattern matching, similar to what I do when I’m working on something and I don’t have the time or the energy to understand it before working on it. That runs the risk of making the easy parts somewhat easier at the risk, already widely observed in human pattern-matchers, of making the hard parts much harder. It’s kind of like the AI equivalent of the old saying about how all four-wheel drive does is get you stuck farther from help.
I have been asking ChatGPT about things I don’t fully understand, with some apparent success. The one technical thing I’ve used it for took about as much effort with ChatGPT as it did without, with the caveat that the second attempt, made with ChatGPT, got the task done, while the first attempt did not.
Anyway, I’m not convinced that my job (I am an embedded systems/robotics programmer) is at risk from the LLM AIs. Cheaper alternatives for what I do have been around for a while and I’m still getting paid. It’s up to me to try to ensure my work is worth what I’m paid for it.
Models are now being trained by distributed teams of hobbyists. Falling hardware costs for a given performance will make those more effective. The economic argument is a non-starter unless you assume a completely static economy and technological base, which strangely enough most people do despite infinite evidence to the contrary.
Even a worst-case scenario, a total collapse of the industry, would just result in millions of H100s and A100s being dumped on the used market. A few months later every Tom, Dick & Harry is releasing their own custom optimized model on GitHub.
Very true, and universally so. A lot of the people who are screaming the loudest about AI taking their jobs reject that premise.
I do not believe that there is any argument that you would consider persuasive because you have already made up your mind about the extremely large value of AI. I know this, so I don’t find your arguments persuasive, either.
We’ll just have to wait a few years and see what happens.
Speaking as someone who uses it to support my work, I am doing in hours what used to take weeks. I need to supervise the thing, same as I would check the work of any junior coder, but it takes most of the time-consuming scutwork out of it.
And it is *very* good at spotting syntax errors and misplaced parentheses.
Which is what a lot of people report.
What’s the value of doubling a 10,000x developer’s output?
This is the main thing that the company I work for is doing. They’re training LLMs (which ones, I don’t know) to streamline coding workflows and automate all sorts of tedious and otherwise error-prone tasks so that they can get more done. And they’re still hiring engineers as fast as they can—the idea is that the AI technology makes all of them faster, so that instead of adding capacity in small, expensive increments, they can multiply it. I have no idea how much they’re paying or how the economics work out on the provider side or whether it’ll really have the multiplier effect they’re looking for…but we’ll find out before long, I guess.
I’ve written a guest post on this. Not sure when it’ll go up.
My current catchphrase is “Humans are here to plan, exercise judgement, and make decisions. Leave the scutwork to the computer”
I could see this.
Also need coders that understand *programming*. Because they have to know what is being generated, why, and what the outcome is supposed to be. Otherwise what is produced compiles and works, but does it do what is intended?
I’m genuinely curious. What parts of programming do you consider “scutwork?” A single worked example, starting with a perceived need, and demonstrating how you make use of, well, whatever it is you’re using and how you use the results you get would go a long way toward helping me understand your perspective.
Here’s an example of the sort of thing I mean: I am trying to evaluate various securities based upon the principles in Benjamin Graham’s book The Intelligent Investor. To that end, I need to process various bits of financial data from publicly held companies. I used to have a program that did all I needed, but it was lost, and I’ve been reluctant to go through the effort of rewriting it. So I wrote this prompt: “Using the polygon api, write a python program to determine if the assets of a specified company exceed twice the current liabilities for that company.”
I provided that precise prompt to ChatGPT which gave me something that didn’t work. I fed back the error message and got back a different program that didn’t work in a different way. After four iterations of this, I gave up. It was calling methods that weren’t there, passing invalid arguments, and attempting to access fields not in the result.
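(For comparison, here is roughly the shape a hand-written version of that program could take, hitting Polygon’s REST endpoint directly. Fair warning: the endpoint path and the balance-sheet field names below are my best recollection of Polygon’s “Stock Financials vX” schema, not verified against the current docs, so treat them as assumptions and check api.polygon.io first.)

```python
# Sketch only: endpoint path and response field names are assumptions
# recalled from Polygon's "Stock Financials vX" docs -- verify before use.
import os
import sys

import requests

API_KEY = os.environ["POLYGON_API_KEY"]  # assumes your key is in the environment


def passes_graham_current_ratio(ticker: str) -> bool:
    """True if current assets exceed twice current liabilities (Graham's 2:1 test)."""
    resp = requests.get(
        "https://api.polygon.io/vX/reference/financials",
        params={"ticker": ticker, "limit": 1, "apiKey": API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        raise ValueError(f"no financials returned for {ticker}")
    balance_sheet = results[0]["financials"]["balance_sheet"]
    current_assets = balance_sheet["current_assets"]["value"]
    current_liabilities = balance_sheet["current_liabilities"]["value"]
    return current_assets > 2 * current_liabilities


if __name__ == "__main__":
    ticker = sys.argv[1] if len(sys.argv) > 1 else "AAPL"
    print(ticker, "passes" if passes_graham_current_ratio(ticker) else "fails")
```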
My imagination is such that I have difficulty seeing how things work from a written description. That’s what makes playing board games such an ordeal for me. I can’t just read the rules and figure out how to play the game. I’ve been reading about LLMs for years and I still can’t see how you might make productive use of them. For the record, I find “reviewing code” to be just about the most tedious and error-prone thing that programmers do, and we review all code, no matter how senior the coder.
Wait! Are you trying to tell me you use an LLM to find syntax errors and mismatched delimiters? Assuming that’s the case, and umm, no offense intended, but why? My editor shows me mismatched delimiters (parens, square brackets, curly braces, all kinds of quotes, etc.) and compilers have done a bang-up job of complaining about syntax errors since before I took my first dollar to write a line of BASIC, lo those many years ago. There are all kinds of static analysis tools to do that for a wide variety of languages and I would expect any programmer to make use of them before adopting an AI based tool.
Mostly the endless combing of stackoverflow trying to find how to do the one stupid thing I’m trying to do, and tracking misplaced commas, parentheses, etc.
Also, it is really handy when you give it a code snippet and say “integrate this into my current code”
The current paradigm of ever-larger models is insolvent. They cost more to make and deploy than they bring in in revenue, so companies are only going to keep making them as long as the hype continues. Maybe they figure out how to make them profitable (doubtful, given what we’ve seen so far), or maybe they reduce training costs (tricky, since it’s tempting to roll any savings into bigger models as long as the investors are pumping in money), but something’s got to give.
The actual technology isn’t going anywhere. You can run the smaller models on a laptop without a GPU, and while it takes more power, there are off-the-shelf tools for finetuning them for your specific purpose. They are useful enough for coding, research, and some writing applications that they’re going to stick around, even in the unlikely event that no new advancements are made.
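For the curious, “runs on a laptop without a GPU” is literally a few lines these days. A minimal sketch using Hugging Face’s transformers library, with the tiny distilgpt2 model standing in for whatever small model you’d actually pick:

```python
# Minimal sketch: generate text on CPU with a small local model.
# distilgpt2 is used purely because it's tiny; swap in your own model.
from transformers import pipeline

generator = pipeline("text-generation", model="distilgpt2", device=-1)  # device=-1 forces CPU
result = generator("The economics of large language models", max_new_tokens=40)
print(result[0]["generated_text"])
```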
I’m not sure it makes sense to talk about this as “theft.” I mean, I write books, and make money from them; I write them in English; and my sense of how to put words together in English has been shaped by the thousands of books I’ve read over the course of my life, by whatever sense of “this word should precede this word” is the residue of that reading (for example, my sense that it’s “dependent ON X” but “independent OF Y” comes from exposure, not from any sort of logical analysis). Do I owe some sort of payment to the estates of Rudyard Kipling, and Ayn Rand, and J.R.R. Tolkien, and Robert Heinlein, and to living authors such as Donald Kingsbury and—well—Sarah Hoyt? The difference with LLMs seems to be that their probabilities are derived from a sample many times larger than mine, which reduces the amount owed to any one author to an even tinier fraction. If anything is owed in the first place—which seems doubtful.
Kipling still applies.
One noteworthy case is that Kipling’s “Cleared” can be sung to the tune of “The Wearing of the Green.” I have to think that was intentional—and intentionally offensive to the Irish nationalists it attacked.
It is copyright and software-license violation, and theft on a HUGE scale. One goal is to replace those pesky employees and creators that cost companies serious money. Laws? What laws?
The elite should be able to make even mo’ bank without all these lower-class meat tubes in the way. Just have to have enough sub-humans to breed kids for sexual pleasure/sport for the bored upper classes and their favorite minions without spoiling Gaia.
Thing is, I’m not so sure that what the AI companies did, scraping up huge volumes of data to train their LLMs, is copyright violation. Copyright violation pretty much implies that something got copied. Where is it? Where’s the copy? It got reduced to a bunch of numbers between 0 and 1 and irrevocably mixed with all the rest of the data, and nobody can reproduce that copyrighted material from the LLM.
Should what they did be against the law? Yes, probably. The copyright laws will need to be rewritten to forbid doing that kind of thing without permission.
Was it against the law? Again, to assert copyright violation you’ll need to show that a copy was made and distributed, or even kept permanently in a retrievable form. (Because of the nature of how computers work, a copy is made of any text that you read online, and that by itself cannot be a violation or every single person with a Web browser would be violating copyright every single day). And there was no copy kept: it was boiled down into a soup of numbers.
The copyright violation is that they stole the books to train their models.
If they had simply purchased the books for training their models, there would be no copyright violation.
Even if the hair on fire crowd disagrees. (Because technology is magic, and they want to BURN THE WITCH!)
It in none of our interests that megacorps can flout the law. Or ignore the basic ethical principle “If you extract value from another’s work, you should pay them”.
How did they do anything other than check out books from a library and read them? Not their fault they have better memory.
Sort of what I was thinking. It’s not a violation to borrow books from a library (or from a friend) and read them without paying a royalty. So if the books were purchased, or even borrowed from a library, I see no violation. If they were pirated, as has been asserted here, that would be a violation regardless of what was done with them, so I don’t see the case for involving AI in the question at all. AFAIK (disclaimer: I’m neither an author nor a copyright lawyer), once you own (or legally borrow) a book you can do anything you want with it, including resell it if you own it. Read it to an audience if you want; the only issue might be if you were paid for doing so (see above disclaimer).
If someone knows why this is incorrect I’d appreciate the info.
In the cases where I know my stuff has been used, the books were pirated.
OK. But just to be clear, the piracy is the issue, not the use to which they were put.
Of course. Hence not throwing a fit.
The question is whether LLMs generate a “derived work” when they produce output. I suspect the answer is yes, or at least, yes part of the time. People have been known to complain that LLMs were seen to produce output that cribbed substantial chunks of their work. If those complaints are accurate then that would definitely be a copyright violation.
I have tried the experiment of asking ChatGPT for a review of Mackey Chandler’s April. The text it gave me contained many familiar phrases—because it took them from my two online reviews of that book! (Apparently there aren’t a lot of other such reviews.) But I don’t think that’s inevitable. I’ve gotten a lot of output that can hardly be a copy of any published document. To get copyright violation, you have to ask it for very specific things. If you’re asking for normal prose composition, and especially if you come up with a novel prompt, its responses seem to be dictated by the probable associations between words in a much larger sample of text.
And as I’ve said before, do you think my writing violates copyright? I assure you it’s been influenced by my having read a large number of published authors (predominantly but not exclusively in English); my prose style is probably a statistically weighted composite of everything I’ve read. And I haven’t read a fraction of what ChatGPT has “read.”
The fact that you can use a tool to violate copyright does not mean that the tool itself violates copyright, any more than the fact that you can use a handgun to murder someone means that handgun owners or manufacturers as such are murderers. The responsibility lies with the person who supplies the prompt, or points the gun.
Most such claims have been obvious bullshit.
Most of the time people have to cheat pretty hard on the prompt to GET it to crib substantial chunks. Sigh.
Just like the “safety research”.
Back it into a trolley problem. If it rejects the problem or otherwise doesn’t comply, you write a paper saying that it refuses to follow orders and thus is scheming to kill everyone. If it chooses one of your horrible choices, you write a paper saying it wants to kill everyone because it chose one of the choices you told it to choose.
The complaints are NOT accurate. Every case I did a deep dive on, the “artist” had to force the LLM to do that.
Anti-AI people, whether artists or doomers, compare unfavorably to your average gun-control fanatic in connection to reality and penchant for lying.
Not a very high bar…
That depends. I think there are valid criticisms that can be applied against AI. The “derived work machine” one is perhaps not all that significant. The fact that an AI produces output based on algorithms that no one understands, and in fact its creators think this is a feature rather than a bug, really bothers me. Then again, I was trained in the old fashioned notion of demonstrating correctness.
To put it plainly, since you cannot explain why an AI produces a given output or why it in the future would produce the “right” output, the notion of AI in safety critical applications should frighten everyone. But for some reason it doesn’t.
And that’s considered a feature. No one wants to discuss how an AI can be influenced with fewer fingerprints. What went into the prompts, the corpus of data, and the algorithm that weights and links the elements of the corpus, all of those influence the output.
AI is no different than any other computer in that regard.
Sure, but I wasn’t just talking about LLMs, which are in large part entertainment devices. I’m talking about AI generally, which includes things that some misguided people are proposing for safety critical applications.
An airplane autopilot implements precisely known and well understood “control laws” — it is a servomechanism, and you can develop very solid confidence that it will do the right thing under all foreseeable conditions. It amounts to a classic computer program, one with specified inputs and outputs and a specified mapping from one to the other.
An AI system is also a computer program, but the crucial difference is that the mapping from input to output is not specified in a manner known to humans. That means it has no definable properties, and in particular has no properties useful for safety.
This is false.
Is that so? Can you point me to a program specification?
Make that ‘A program specification that defines its behavior after running a few gigabytes of unspecified text through it’.
Thanks. Yes, precisely that. A computer is an automaton whose behavior is defined by its program (and its ISA specification). In the case of self-modifying programs like AI, the behavior that matters is the as-trained behavior.
“Three things are most perilous:
Connectors that corrode,
Unproven algorithms,
And self-modifying code.”
Maybe not everyone. But that does not mean not anyone.
Terrifies me.
The difference is that LLMs dilute the works you learned from with the 90% trash that is out there. And, of course, with an LLM the majority wins. That’s what makes them such useful megaphones for amplifying the chorus of lies and propaganda that is already out there.
Sure, and it can be tricky to provide a prompt that selects for something different from lowest common denominator. But I don’t think the fact that an LLM dilutes Tolkien, or Joyce, or Rand with a lot of trash strengthens the claim that it’s violating their respective copyrights (for those whose work is still in copyright). It seems to be a different issue entirely.
Copyrights are a whole other issue. And, of course, that was what Ing is talking about, but I went off on a tangent. I view our current copyright laws as an abomination, but that brings up a whole new issue of how an author/artist can make a living these days.
No. The only “theft” was that the companies pirated the books they trained the LLMs on. Which is stupid and ridiculous. These were major corps, including meta. (Not Amazon. I was wrong.)
I’m salty over their pirating it, but the difference to me for all the books would be…. $50?
Well said. These LLMs may have learned from reading the pirated content but they aren’t repositories of pirated content. They don’t have a copy on a hard drive somewhere any more than a reader has an actual copy on their brain. They don’t work that way.
What they are doing is no different than an artist who snuck into the Louvre and then tries to emulate a great master from memory, or a novice writer who tries to write like CS Lewis or Stephen King.
Much, not all, but much of what we’re seeing with the complaints is mediocre authors trying to make some extra money by suing LLM creators.
I’m not so sure about that, given that LLMs have very large memory and even larger data stores.
Look at it this way: if you ask an LLM to describe something, how could it produce that text other than by drawing it, or pieces of it, from what it has seen in its training data? Remember that it doesn’t actually understand English; it isn’t capable of constructing original thoughts and expressing them in original sentences.
Sarah and Ing—
The more sobering aspect was in the Wall Street Journal yesterday, where a survey showed a huge majority—quoting from memory, maybe 80-plus percent—of college students admit to having used LLMs to complete homework assignments and take open-book tests. They get the grade but don’t do the work or learn the material. That’s the figure for “having used”; the figure for “regularly using” is lower but still impressive. So, the future is going to be different.
Tom Thomas
P.S. A number of my books were also pirated, but not all. And yes, that’s $2.99 in sales I didn’t get. For me, the bigger picture is that a trained LLM does not mean that the complete text of one of my books (and so the reader’s experience bought with that payment) is out there for anyone to pick up for free.
If the AI scraped the pirated book, then YES, the entire text of your book is in the AI … and thus available for free by asking the AI.
It’s not, though. Try asking the AI for a copy of one of the books it used. You’ll find that it doesn’t work. Shoot, try asking it for a copy of a public-domain work like Treasure Island so that you’re not soliciting a crime. If you can get it to spit up an accurate copy of even one chapter of Treasure Island I’ll be very surprised.
The question becomes: How much borrowing is theft? This has been fought out in the music world a thousand times.
In the case of books or code, many times AI isn’t quoting or copying an insignificant phrase or a few words, but key paragraphs or functions that are unique to that work.
In the case of software, even if the license is “open” there are still restrictions, like including the copyright notice in order to use it. You may trust that Google and Microsoft aren’t “stealing,” but there are generations of individuals and companies they have fsked in the nether regions without lube or the courtesy of a reach-around.
Also, the “search engines” that AIs use to collect this data are hammering the web sites and blogs of creators with excessive traffic, to the point that bandwidth and resource costs are climbing with the increase of digital locusts.
The current best defense is to flood the ’net with additional crud and enshittification to reduce the value of the content collected and the automatic tools that generate it. This is an old tactic based on email signatures loaded with “key words” attractive to TLAs. Imagine a research library full of the worst of pulp fiction and drivel.
This is simply false, no matter how much artists who are incapable of math insist it must be true.
There is no compression algorithm in the universe which can go from the number of bits that are the input in training to the number of bits which are output as model weights.
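To put rough numbers on that bit-counting argument (illustrative round figures, not any particular company’s):

```python
# Back-of-envelope: compare bytes of training text to bytes of model weights.
# All figures below are illustrative round numbers, not real disclosures.
training_tokens = 10e12        # ~10 trillion tokens, a plausible modern corpus
bytes_per_token = 4            # rough average for tokenized English text
corpus_bytes = training_tokens * bytes_per_token

params = 8e9                   # an 8-billion-parameter model
bytes_per_param = 2            # 16-bit weights
model_bytes = params * bytes_per_param

print(f"corpus:  {corpus_bytes / 1e12:,.0f} TB")          # ~40 TB
print(f"weights: {model_bytes / 1e9:,.0f} GB")            # ~16 GB
print(f"ratio:   {corpus_bytes / model_bytes:,.0f} : 1")  # ~2,500 : 1
```

A ratio in the thousands-to-one range is far beyond anything lossless text compression can do, which is the point: the originals cannot all still be in there.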
There are very much some legitimate use cases in homework.
Consider Steve the engineering student. He writes his own papers, sources his references, etc.
But, instead of using wikipedia for an early cut at learning about an obscure technical topic, he asks ChatGPT.
For an obscure technical topic, most humans don’t know anything, and one does not start out knowing what the real sources are, or how the ideas are labeled.
A bad compass that gives keywords to try to find in other tools can be useful. Like, for example, stuff to try to look up in the library.
For example, suppose I am working on solving a problem involving multiple variables that are complex numbers. And I am trying to solve the problems numerically, in a repeatable way.
Well, if I don’t know about optimization, and about some specific slices of optimization, I probably have to learn.
And that winds up connecting to ideas in fractals that I would never originally have assumed to be relevant.
The wikipedia page on ‘Newton’s fractal’ has features that make it bad as an academic reference. I.e., stuff inherited from being on wikipedia. But it also has a bunch of references that are links, that are more valid to refer to, and the stuff with Julia and Fatou sets I can maybe look up in conventional books, and validate or not.
For things that you can directly test, then use that test as a lesson on how next to actually do it correctly, a wrong answer with uncertain provenance can be an early step in your own path to an answer that you can infer might be correct.
Metaphorical/hypothetical/etc. Steves may be working in Matlab or Python, and not familiar enough with the toolboxes/libraries to ‘just type the correct command’. But, the interactive command line is a pretty fast test of whether the library call is valid and formatted correctly.
The other problem, though, and perhaps a larger one, is that many of those college students wouldn’t be learning the material if they didn’t have LLMs. C has talked with an old friend who is a faculty member at a modest university (as is her husband). They have entirely given up assigning essay questions to their students, not because of LLMs, but because very few of their students are capable of answering an essay question—and it would be bad for their continued careers if they failed a majority of their students. So they lower the bar to something they can get over. I’ve read of university professors whose students struggle in their classes because they ask them to read a book every week or two, and many students come out of high school never having been asked to read an entire book for any class!
I used to see figures about the average American reading only a few books a year, and say that there must be a dozen or a score who never read anything at all to make up for me. But now even the college educated population is only minimally literate, it seems.
This. 30 — dear Lord, I’m old — years ago most college freshmen I dealt with were functionally illiterate. It’s only gotten worse.
I’ll add here as a MOTHER that it’s not the kids getting dumber. I sent my sons to school reading fluently, then had to fight the school to keep them reading fluently.
They almost succeeded with #2 son. I spent years yelling “DON’T GUESS. SOUND IT OUT” which was the opposite of the instruction he got at school.
Guessing renders “guessing” and “guesting” indistinguishable and makes reading an arcane art most akin to scrying the skies for prophecy.
I’m reminded of back when I learned to use a slide rule, and how it was a revelation that elegantly made hard things simple.
I also remember how half the class had become dependent on calculators to the point where they were incapable of conceptualizing how to use the slide rule.
I’m getting the same vibe with LLMs.
That Isaac Asimov story, where everybody was dependent on calculators and unable to do the simplest arithmetic without one.
“The Feeling of Power,” I believe it was.
Have you been out and about lately?
Too many people can’t do basic math without pulling out their phone calculators. In addition, they don’t know the proper order of operations, even with a calculator.
I currently have kids in college: some professors ban AI, some mandate it, at the same college.
If that survey didn’t ask “Were you required to use AI in your homework by your professor?” and filter for that, it’s giving you a false impression of what college students are choosing to do and what they’re being required to do.
Use AI as an authoritative source, or fail for not having the required citations? Use the AI. (Tenured prof, science, not an entry-level course . . . quite literally the profile of the “OMG, kids are using AI in my homework!” person.)
No, that’s the colleges getting exactly what they deserve.
“They pretend to teach and we pretend to study”. Been going for years.
The colleges are still taking the money. They don’t care whether the students learn anything or not. Hell, if they can claim the students cheated by using LLM chatbots, they can evade any responsibility for failing to teach.
Depends on how they’re using it. And the problem is the schools through college haven’t taught these kids to READ.
It’s not exactly a new issue:
https://en.wikipedia.org/wiki/Why_Johnny_Can't_Read
And compared to far too many HS graduates today, Johnny was a fully-comprehending speed reader.😒
I posted a reply here, twice, both times bit-bucketed. WPDE. :-x
I’m sorry. WPDE
Thanks, but it’s hardly your fault that WP is both insane and incompetent.
Without AI, note.
AI has never been a requirement for insanity and/or incompetence. Check any session of Congress for one glaring example.
Oh, and I see that both the missing posts are now there. Thanks!😃
The information doesn’t “train” the AI model … it is stored and indexed in a huge database, and the AI pattern-matches and uses the stored data to come up with an answer …
They like to say “train” the AI model because it sounds better than “take/steal the data and use it forever” …
I suppose “train” in this case is a leftover from how machine learning is supposed to work. If you actually want useful results, you train machine learning on a curated data set that has been human scrutinized and labeled for what you’re looking for, then you unleash it on another curated data set, and score it on how well it matches the human judgement. LLMs skip all this and just unleash their army of ants on bigger and bigger data sets with a few rules about grammar (in the case of language).
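For anyone who hasn’t seen that classic workflow, a minimal sketch looks like this, with scikit-learn’s toy digits dataset standing in for a curated, human-labeled set:

```python
# Classic supervised ML: train on human-labeled data, then score against
# held-out human judgments. The digits dataset is a stand-in for curated data.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # features plus human-assigned labels
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression(max_iter=5000).fit(X_train, y_train)
print("agreement with held-out human labels:",
      accuracy_score(y_test, model.predict(X_test)))
```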
Ok so you are just lying now. Cool.
Have a seat next to Giffords.
You’re ignoring the weighting / “temperature” factor that’s applied as part of the sampling. To extract the text of just your novel, the prompt would have to be the text of your novel, set with a weight so those words would be the only ones picked in the process. There are simpler and more reliable ways to plagiarize your book…. assuming anyone would bother.
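A toy sketch of what that temperature knob actually does to next-token sampling (strictly a sampling-time parameter rather than part of the prompt text; the logits here are made up for four imaginary candidate tokens):

```python
# Temperature scales the logits before softmax: low T sharpens the
# distribution toward the likeliest token, high T flattens it.
import numpy as np

rng = np.random.default_rng(0)

def sample(logits, temperature):
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [4.0, 2.0, 1.0, 0.5]  # made-up scores for four candidate tokens
for t in (0.1, 1.0, 2.0):
    picks = [sample(logits, t) for _ in range(1000)]
    print(f"T={t}: top token chosen {picks.count(0) / 10:.0f}% of the time")
```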
LLMs memorize a portion of their training data, so you can get it to produce verbatim passages some percentage of the time, either by giving it a degenerate prompt or feeding it the prefix of the text you want.* But yeah, there are more reliable ways to plagiarize.
*Chat training messes with this behavior some, and companies like OpenAI have been actively training to mitigate this, so I don’t know what the current numbers are.
Not how it works. The gist is that they create an enormous parameterized mathematical function, initialize the parameters randomly, and then run text data through it repeatedly, tweaking the parameters a little bit at a time to get better and better at predicting the next word in the text. With enough text and enough training, you end up with a function that is very good at predicting what word comes next: autocomplete on steroids.
The function does “memorize” some of its training data, in that you can get it to regurgitate passages of the text it was trained on verbatim. But even the largest models don’t have the capacity to memorize everything they’ve seen, and the model doesn’t have a “database” of its training data that it accesses at runtime. It’s all just parameters in the function.
(Companies have started feeding in search results to the models so they can produce more accurate and up-to-date answers, but that’s not the training data. Those are real-time search results that get added to the prompt so the model can summarize them for the user.)
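To make that loop concrete, here’s a toy version of it: a bigram next-character “model” whose randomly initialized parameters get nudged a little at a time toward better predictions. Everything here is illustrative; real LLMs differ in scale and architecture, not in the basic shape of the loop.

```python
# Toy next-character predictor trained the way described above: random
# parameters, repeatedly tweaked toward better next-character prediction.
import numpy as np

text = "the cat sat on the mat. the cat sat."  # stand-in training corpus
chars = sorted(set(text))
idx = {c: i for i, c in enumerate(chars)}
V = len(chars)

rng = np.random.default_rng(0)
W = rng.normal(0, 0.1, (V, V))  # parameters: logits of next char given current char

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

learning_rate = 0.5
for epoch in range(200):
    for a, b in zip(text, text[1:]):
        i, j = idx[a], idx[b]
        probs = softmax(W[i])
        grad = probs.copy()
        grad[j] -= 1.0                 # gradient of cross-entropy w.r.t. logits
        W[i] -= learning_rate * grad   # tweak the parameters a little

# 'c' is always followed by 'a' in the corpus, so the model should now
# put nearly all its probability on 'a' after seeing 'c'.
p = softmax(W[idx["c"]])
print({chars[k]: round(float(v), 2) for k, v in enumerate(p) if v > 0.05})
```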
Your link to the New Yuck Slimes is behind a paywall. Faugh. I refuse to contribute a single farthing to that dastardly rag, even if they might have the occasional worthwhile article.
Like I said yesterday, since 90% of everything is crap, the Large Data Model algorithms are just stirring the crap up and shitting it back out. They can’t even distinguish between crap and not-crap as well as most people can.
I call them Large Data Models. MidJourney, for example, is not a text processor. Up until recently it was more of a random limb generator. :-P
“draw me a gangster named Fingers Malloy”
Dmn you. Now I have to try that prompt.
I shall borrow Drak’s evil grin
An article recently (in the WSJ, I think) talked about researchers extracting the actual shape of the model inside the AI system. In particular, it involved an AI trained on routes through Manhattan, and the question was “what does its map of Manhattan look like?” The answer turned out to be that it didn’t have anything remotely resembling Manhattan. Instead, it basically just consisted of a very long list of remembered answers and rules of thumb. Put another way, the “AI” doesn’t have anything that can be called understanding, not even close. This also explains why AI models are so big and expensive: if they actually were models of the world they would be tiny (a street map of Manhattan is a simple object indeed), but since they actually build a vast pile of disconnected rules, there isn’t any abstraction involved that would compress the data down to that small amount.
It’s not the AI that stole the copyrighted materials; it’s the company who has the AI and the people working for it. As Sarah mentioned, they at least owe the author(s) the price of a book. Now if that book, or books, was used to produce say, a ‘Regent’s Study Guide to the Works of Sarah Hoyt’, then a significant portion of the sales profits to all those high school and college students should be paid in royalties to Ms. Hoyt. Considering how marked up those guides and textbooks are, that could be a significant amount of cat food, litter, and vet bills. Interestingly enough, the ‘fair use’ privilege of copyright law may even permit this use (teaching, criticism, research or scholarship) without compensation to Ms. Hoyt. But if it’s a single book, or even a dozen, out of the entire Library of Congress fed into an AI? Any one author’s share would be less than spare change I pick up during my daily walks. There’s the principle of the thing, versus the practicality of it.
Fair use also covers ‘transformative’ uses of copyrighted works. That usage “adds something new, with a further purpose or different character, altering the first with new expression, meaning, or message” which certainly sounds a lot like what these LLM programs are doing. As long as they aren’t parroting verbatim entire pages worth of a work, then it’s probably not theft or infringement on copyright.
I’m pulling this from Cornell Law School’s Legal Information Institute: ArtI.S8.C8.3.3, “Copyright and the First Amendment,” U.S. Constitution Annotated, accessed 5/21/2025.
Wikipedia has some history of U.S. copyright laws. “the first U.S. federal copyright law, the Copyright Act of 1790. The length of copyright established by the Founding Fathers was 14 years, plus the ability to renew it one time, for 14 more. 40 years later, the initial term was changed to 28 years.” Currently, “these exclusive rights are subject to a time and generally expire 70 years after the author’s death or 95 years after publication. In the United States, works published before January 1, 1930, are in the public domain” which means the author gets the rights to profit during their entire lifetime, and can bequeath that to his or her heirs. I see that as a good thing. Except I’m not sure if an author still living who published one of his first works prior to 1930, still retains that full copyright. You’d think so, wouldn’t you?
it’s the end of the world as we know it, and I feel fine.
sorta
kinda
maybe
That song’s been playing in my head more and more lately…
This article shows what authors should be angry at, Sarah. How many books were misattributed or just invented, so that someone goes looking for your works and doesn’t find them? Or finds schlock with your name on it?
https://twitchy.com/gordon-k/2025/05/21/artificial-intelligence-and-real-stupidity-ai-generated-reading-guide-contains-non-existent-books-n2413072
Fascinating. Rather than sue for copyright infringement, I think the author(s) might have a clear case for libel. That might actually net them some money and force some changes.
They won’t find schlock.
They will if you’re mistakenly listed as the author. That’s the point. The damn AI was either fed bad data, or hallucinated it because of the model it processed with, and then it was spewed out and given a sheen of authority without anyone verifying it.
I’ll be honest. It’s simpler than that. It’s trying to emulate other lists, lacks data, and throws out words that sound good.
REMEMBER it’s not intelligent on its own.
Like ask it to generate a blurb and it gives you fake reviews, because blurbs have reviews. It’s goofy, not malicious.
THAT is why you verify, as you would a retarded assistant.
Instead of what it’s being sold as, as the ultimate assistant who’s like your clone? Yeah.
I just realized, LLMs speak in middle management voice.
I don’t think the attempts to make censorship or idea-control LLMs will ultimately work very well. To train LLMs you must feed them vast piles of data and let them find patterns and associations. To censor that, you are then adding controls on top to tell the model to ignore certain patterns.
But to make that complete, you have to train it just as much to recognize which patterns it needs to override as you did to get it to find the undesirable pattern in the first place. Which is why the prior attempts have exploded so hilariously. And why jailbreaks are so straightforward.
Censorship only really works when people are willing to ignore their lying eyes, but the whole point of an LLM is to see as much as possible. Feeding it everything, and then telling it to ignore it, is no different from taking millions of dollars out of the bank and burning it because you wanted to hide your wealth.
Companies already use private LLMs (where they strictly limit what training data goes in) specifically to avoid the risk of accidentally using someone else’s intellectual property.
The other part of that is, any training data that goes in also comes out. So any of their own proprietary or trade-secret data they train it with can, and will, leak out if it goes into a public LLM.
At one point, this was a big issue with OpenAI’s custom GPT service. Users could jailbreak the models and get them to reveal the system prompt and any private data you had uploaded. I’m not sure if they fixed that.
Apparently not 100%, since the training still includes it as a possibility to watch for.
Two comments. One, we got a complaint letter about the restaurant a few weeks ago. Four pages, single-spaced, printed. The writer made a number of demands based on her “experience” as a manager, which included a formal apology from us and corporate, punishment for our son (who was doing his job as manager), etc. Her name wasn’t Karen, but it should have been.
Now, aside from the fact her story does not match the testimony of any of the servers present, what makes this relevant is our son, who spent most of last year training AIs, looked at the letter and said, “Oh, she wrote this with an AI.” So there’s another use for it.
Two, and OT, now that my beloved feels well enough to do light work, I woke up at 3 a.m. and presently relearned the worship ritual of the porcelain throne. Water’s been staying down the last few hours, but the simulated colonoscopy prep continues. Current guess is either a virus or food poisoning. Currently sipping the lemon ginger tea I recommended to Bob yesterday and hoping for the best. Would really like to finish out the project.
Virus going around. Hugs.
I’m getting tired of this LLM nonsense.
Call me when SkyNet arrives. [Crazy Grin]
Yeah, for a while I’ve been going “I want Skynet, Skynet is cool.”
Seriously, what’s not cool about Skynet? Maybe if they’d treated it right and not tried to kill it, it wouldn’t be so cranky…
I think somebody read S. M. Stirling’s T2 series. 😉
Good. Bob is nuts. The world is normal.
“We have normality. Anything you still can’t cope with is therefore your own problem.”
Skynet was, no joke, the name of the ISP we used in college. Their service was provided via a cat5 cable tucked under the window sill, then flapping in the breeze until disappearing onto an adjacent rooftop.
Skynet was horribly unreliable in any weather other than sunny. At least once the cable turned into a water fountain in the rain (for those in the know, it was clearly indoor cable, not outdoor cable, which is full of nasty goop to prevent exactly this).
So no, I don’t want Skynet, even if it had streaming service before Netflix.
Skynet vs. Mycroft. Who would win?
So basically there are two algorithms/heuristics that a bit of code could actually be doing. They are legally distinct, and it is not clear that experts would have an easy time verifying 100% or 0% function.
Certainly, judges and a lot of other people have some problems with evaluating what the thing is actually doing, and there are a lot of things. Very often, the conclusion of 100%/0% is assumed, and not supported with evidence.
The two effects are basically compression, and feature recognition.
For example, the ‘data is stored and indexed’ explanation is a compression model. Lossy compression algorithms are known, so loss does not mean that an algorithm is not doing compression.
But, one, CS, programmers, electrical engineers, et alia have some ideas and theories about coding and storing information. (See also data reduction.) If one of us states that we have achieved a compression ratio of data that is hard to believe, and asserts that most of the data is still there, our instinct is to test that claim by trying to get the data out. If we are not getting a lot of the input data out, we conclude that it was not a successful compression scheme.
It is hard for us to communicate this instinct to outsiders, but when a model is under eight gigabytes, and the training dataset is LOL huge, we find it easy to conclude that it is not compression, or it is an extraordinarily lossy and unreliable compression. Like 5% or less. Proving 90% non-compression is relatively easy, but going from showing 10% or less, to showing 0% becomes increasingly hard, and may actually be impossible (1).
The second hypothesis for what this might be doing is like feature recognition.
It is relatively easy to get more data than I can possibly do anything with. This is a storage problem, and/or a processing problem. One answer from researchers is that we were naive in how we handled our data, and that some stuff we are not interested in can be ignored. (A very related idea to compression or to data reduction.) Another answer is that we were naive because we were trying to think things through ourselves, and trying to store enough data that we did not lose generality. If we instead use the computer to sort out which parts of the data are important, and store enough to reconstruct those, we may need less processor or less storage.
If you are trying to process and store a signal, what parts of the signal are important, and what is a good enough reconstruction? Well, it depends.
An engineer trying to understand this stuff may use some standard ‘feature extraction’ tools.
https://www.mathworks.com/help/signal/ug/measurement-of-pulse-and-transition-characteristics.html
Above discusses pulsewidth, from the Matlab signal processing toolbox. Used here as an example or proof of concept about feature extraction, and using features to do stuff. If I take a picture of a bunch of coins on a table, and try to make the computer identify the coins, I may wanna do a bunch of feature extraction. Feature extraction is a very broad set of tools in certain parts of engineering, and is a bit of a learning curve to do manually.
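For example, a hand-rolled version of one such feature, pulse width estimated by thresholding a sampled signal, might look like this in Python (toy signal, toy numbers):

```python
# Estimate the width of a pulse by counting samples above a half-amplitude
# threshold. A crude stand-in for what the Matlab toolbox function automates.
import numpy as np

t = np.linspace(0, 1, 1000)                       # 1 second, 1000 samples
signal = ((t > 0.30) & (t < 0.55)).astype(float)  # a 0.25 s rectangular pulse

threshold = 0.5 * (signal.min() + signal.max())   # halfway up the amplitude
dt = t[1] - t[0]
width = (signal > threshold).sum() * dt           # samples above threshold * dt

print(f"estimated pulse width: {width:.3f} s")    # ~0.25
```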
Anyway, so the ANN methods in LLMs and in image generation can be understood as an effort to make the computer find ways to do feature extraction. Then, using random numbers, reconstruct similar patterns of features.
(Compression is extracting the same pattern of features. And, we can copyright trumpet music, so reordering notes might not a copyright infringement make. There are a fairly limited number of distinct notes from a given instrument, and copyright law I would assume does not involve counting the total number of Cs within a piece.)
There is basic dispute within CS about what on earth these things are even doing or accomplishing. ‘Humanity doomed’ is crazy people talking. ‘the machines are thinking, and we need to align them so that they behave ethically’ is probably also crazy people talking. The opinion I like most says ‘stochastic parrots’. (Which might mean that LLMs are imitating the form of sentences, but that the meanings have mostly been filtered out. )
My instinct is that I had more I wanted to say, but this also feels like maybe enough, and as much as I can write sensibly.
(1) Or maybe I am just super ignorant.
“The opinion I like most says ‘stochastic parrots’.”
You like it most because it’s factually correct. That’s indeed what LLMs are like. (Though when I talk about it I call them “digital parrots” because so few people know the word “stochastic” without a dictionary handy.)
Anyone who’s spouting the opinion that “the machines are thinking and we need to teach them ethics” is completely ignorant of how LLMs actually work internally. And if that person is a software developer whose job is to work with LLMs, such a person should be fired for cause. Because that’s professional malpractice. (And this is my field, so my opinion is grounded in lots of experience. I say again, professional malpractice).
😃🤷Guilty
👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏👏
Sort of like a supposed professional EE who claims that transistors work “exactly like vacuum tubes; it’s simply their nature to amplify signals”?😜😜😜
IOW “not even wrong” (twice), as Pauli would have said.
“Transistors are exactly like vacuum tubes, except for all the ways they are completely different.” :-P
Yep. “Not even wrong”.
“The opinion I like most says ‘stochastic parrots’. (Which might mean that LLMs are imitating the form of sentences, but that the meanings have mostly been filtered out. )”
In my (reasonably informed) opinion, referring to LLMs as “parrots” captures the spirit of the thing fairly well, but may be a bit generous.
Somebody recently posted a plain text decision tree for a big LLM, I’d link it but I can’t remember where it was. Maybe Slashdot? Anyway, the LLM was basically producing sentences by predicting the next word. The logic at work had -nothing- to do with the subject of the sentence, and produced sentences that seemed sensible by an utterly unrelated path. Gibberish, essentially. Any sense in the sentence was entirely imposed by the reader.
This is why LLMs can’t do mathematics. They’re just predicting the next word/letter, they don’t interact with the material at all. They write programs by predicting next character based on all the programs they’ve seen, that’s what makes them incompetent beyond a certain size/complexity.
A parrot may not know what the filthy limerick it is saying means, but it does know when to say it and get a treat. LLMs can’t even do that much.
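For what it’s worth, “producing sentences by predicting the next word” is easy to demo at toy scale. A minimal Markov-chain sketch in Python (vastly simpler than an LLM, but the same loop of picking a plausible next word):

import random
from collections import defaultdict

# Toy next-word predictor: count which word followed which in a tiny
# corpus, then babble by sampling successors. No grammar, no meaning.
corpus = ("the cat sat on the mat and the dog sat on the rug "
          "and the cat saw the dog").split()

follows = defaultdict(list)
for a, b in zip(corpus, corpus[1:]):
    follows[a].append(b)                 # record every observed successor

word, out = "the", ["the"]
for _ in range(12):
    word = random.choice(follows[word])  # pure next-word prediction
    out.append(word)
print(" ".join(out))
# e.g. "the dog sat on the mat and the cat saw the dog sat"

Any sentence-like feel in the output is, as noted above, imposed by the reader.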
“They write programs by predicting next character based on all the programs they’ve seen …”
Depends on the model. Some models are designed for programming, and they have the concept of classes, functions, parameters etc built into the model. So when you start typing a function call, it will figure out the most likely function name you’re about to type based on patterns in the rest of the code, then after that it will look up that function and figure out parameters, picking parameter names that are most likely to match that function. In other words, these models are a hybrid between an LLM and an expert system.
I’ve watched my colleague (a really expert developer) program using one of those models as a suggestion generator. As he types, the model’s current suggestion appears in front of his cursor in dim grey text. If he keeps typing and ignores the suggestion, it goes away and a new suggestion appears. If he presses the Tab key, it accepts the suggestion and the suggested text appears in the code as if he had typed it.
Often this leads to him typing “changePa” and then pressing Tab in order to get the line “changePassword(currentUser, currentPassword, newPassword)”. Quite a time savings. And because the model’s suggestions only go into the code when he presses Tab, every single line the model suggested has been reviewed by a human.
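A crude sketch of what the “expert system” half of such a hybrid might look like, in Python; the function table, the argument names, and the suggest() interface are all invented for illustration, not taken from any real tool:

# Known functions plus the argument names most often seen at their call
# sites; a real tool would mine these from code and weight by context.
KNOWN_CALLS = {
    "changePassword": ["currentUser", "currentPassword", "newPassword"],
    "changePath":     ["newPath"],
}
IN_SCOPE = {"currentUser", "currentPassword", "newPassword"}

def suggest(prefix: str) -> str | None:
    for name, likely_args in KNOWN_CALLS.items():
        if name.startswith(prefix):
            if all(a in IN_SCOPE for a in likely_args):
                return f"{name}({', '.join(likely_args)})"
            return f"{name}()"  # function known, arguments not in scope
    return None                 # no match; show no suggestion

print(suggest("changePa"))
# -> changePassword(currentUser, currentPassword, newPassword)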
I haven’t seen how current English-language LLMs do things. But the way I think they should do things is exactly that. I would trust them to give me suggestions, one sentence or even 2-3 sentences at a time. I would NOT trust them to generate more than that. MAYBE up to a whole paragraph, but I certainly wouldn’t generate a whole essay. That would cost more time in fixing than it saved, or you’d end up with classic LLM mush. BUT if the editor had an AI suggestion producer that suggested a couple sentences at a time, and you only accepted the suggestions by hitting Tab if they’re good, then that would both save you time and ensure that you’re getting the best quality possible out of the AI. (While still needing, from time to time, to go edit a sentence that was ALMOST good.)
From comments about LLMs and how they’re used today, I don’t think they’re being set up to run like this in fields outside of programming. Which is a shame, because from what I’ve seen, using them to save you typing is one of the best ways to use them.
Been available for at least 3 decades. Yes, a great time saver. But you must know the function you want to use. Works with self-created as well as tool-provided functions and procedures.
Another shortcut that’s been available is creating new files. Want a form? Just a function with access to common core procedures/functions? Select type, select features, generate starting code. As easy as that … to start. Then the sticky hard part starts. The hard part is rarely the coding. The hard part is knowing what the client needs VS what the client insists they want, providing the former without upsetting them about not getting the latter.
This is an expansion on IntelliSense, though. I’ve never had IntelliSense fill in the parameters for me, taking the parameter names from variables available in the current scope. It would fill in the function name and then show me the function definition so I could easily see which parameters to use, and I could use Tab completion to fill in the rest of the parameters once I typed the first few characters, but `curr` would match multiple variables so Tab completion wouldn’t be able to know which one to use. The AI tool (in this case, a tool called SuperMaven, which offers a pretty generous free tier) fills in the parameters as part of its suggestion… and because its model heavily weights tokens from your current project’s context, it gets those right quite often because it actually “knows” (it doesn’t really know, but the odds are heavily weighted the right way) what variables you use for what purpose.
They promise that the data from your current project, which is sent to their servers, isn’t used for training and is deleted after 7 days. Since our project is open-source, it wouldn’t matter to us if they were lying (it’s all openly available on GitHub anyway), so that’s not a risk factor we need to take into account.
Yes, this is new. Knowing my luck, it’d use the wrong parameters … OTOH just showing what parameters were required was all I needed.
“A parrot may not know what the filthy limerick it is saying means, but is does know when to say it and get a treat. LLMs can’t even do that much.”
Yes. And this is much of the reason calling all this “Artificial Intelligence” is at very, very best a half-truth, and more likely (most or all of the time) stark misdirection from the truth. While your dog may very well understand “walk” in his own terms and come running, and your cat may very well understand “vet” enough to go hide under a chair (been there, seen that), one of these models can’t even muster the “Intelligence” to understand anything beyond its raw abstract statistics.
If we called it Artificial Stupidity (Michael Flynn, Firestar series) or Emulated Intelligence (vignette I still haven’t got past This and That to post this week), we would never be fooling ourselves into taking any of this for Mike or even Hal or Colossus. We’d not be “personifying” Grok or Chat-GPT or whatever — it’s far more clever sub-idiot than idiot savant. Yet every time we call any of it “AI” we’re (consciously or ‘only’ unconsciously) biasing or “programming” ourselves (and each other) to think of it and treat it in just that way.
Play stupid games, win stupid prizes. Use a simulator to simulate writing or drawing or whatever, get a simulated drawing or a simulated piece of writing. The image or the text is of course very real, and may indeed be quite usable and useful (e.g., AI covers for real books) — but no actual “drawing” or “writing” went into it, except through the massively-indirect means of prompting the AS/EI/etc. to produce something very like what you meant to draw or write, a quick and convenient shortcut around doing the real work (never mind how that last may be madly slow, or difficult, or impossible for you). The fiddly, detailed, full and real “act of creation” never happened.
Apples are not oranges. Or camels.
It’s somewhere between the two. LLMs do memorize some of their training data*, but not all of it, and you usually have to work to get it out. They’re not memorizing everything; that follows both from their observed behavior and from the information-theoretic reasons you cite.
*The estimated lower bound (all you can really measure) was ~1-3% two years ago, and it seems to increase with model size. But chat training messes with it, and the big companies are trying to keep them from reciting training data verbatim, so I don’t know where it stands now.
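For anyone curious how such lower bounds get estimated: roughly, prompt the model with prefixes drawn from known training documents and count verbatim continuations. A schematic sketch; model.generate here is a stand-in, not any particular library’s API:

# Schematic memorization probe: prefix the model with training text and
# check whether it reproduces the true continuation exactly.
def fraction_memorized(model, documents, prefix_len=200, cont_len=200):
    hits = 0
    for doc in documents:
        prefix = doc[:prefix_len]
        truth = doc[prefix_len:prefix_len + cont_len]
        if model.generate(prefix, max_chars=cont_len) == truth:
            hits += 1  # verbatim recall of training data
    return hits / len(documents)

Anything that only almost matches counts as a miss, which is part of why these are lower bounds.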
College students (or their financial sponsors, since no 18 year old can afford the preposterous tuition) are already being ripped off by the textbook industrial complex where they’re ordered to buy absurdly overpriced textbooks written by their professors for the express purpose of cheating them out of their money. So I view their AI cheating as petty revenge. They only cheat themselves in the long run by treating their education like it was buying a union card instead of a chance to learn something they don’t know. But that’s their problem.
As to AI itself, it’s become a marketing slogan like “new and improved”. I can only laugh at the new AI toaster and the new AI seat cushion. If you really want to crush AI, sic the greenies on it because of the absurd camel of energy it consumes to produce a gnat of text.
Maybe because that’s how employers have been treating it? Doesn’t matter so much what you got from it as long as you can check that box.
That attitude is also getting pervasive in the technical certifications; my current employer wants everyone to be able to flash 5-6 certifications, so everyone has been given the list and the attitude is “if you get knowledge, that’s fine, as long as you can cram for and pass the tests.”
That’s fine until you have a coworker who has a particular cert or skill listed on the CV and can’t utilize said technology at any level.
Oh you sweet summer child.
Yeah, part of my job is educating team members on elementary programming and analysis skills they supposedly have years of experience in…
This is being done for the benefit of the salesweasels who want to be able to claim 100% certified consultants in the presentation like it means something.
I recognize the scent of experience here.
What is sad is finding someone who has the CV, can utilize the tech and tools being used, but is still at “entry level” after 5 years. The poor dear lasted a year after I was hired (quit. What? I was supposed to be “less” just to not outshine someone who’d been there longer?).
FYI: the first week, I was told it would take six months to understand enough to work in the company code. My facial response must have been “You are kidding, right?”; my verbal response was “um, okay”. I was working in live code the week I got my workstation (the second week; it took IT a bit to get hardware, why IDK). Before the end of the first month I had my first major assignment with a new client, which the above individual still hadn’t had. And I was on the phone on client calls.
To be fair, while I was “new” to the company, out-of-school experience was 5 to 20 years, and this was not the first, second, or even third “sink or swim” job, relying on customers for what they think should be happening, or just code diving to answer questions. Plus, at this job there were actual life rafts (a new concept, in my experience, truly) in the office to go talk to (others who’d been on the job over 10 years) who actually were productive.
I read something quite good about the “Skills Gap” recently. TL/DR, there is only a “skills gap” when unemployment is high.
Employers complain that they can’t get the highly competent people they want at the cut-rate price they feel they are entitled to. They lobby government to produce more credentialed workers… at no expense to the employer.
But when unemployment is low, nobody talks about the “skills gap”. The employers are desperate for any warm body they can slot in that can see lightning and hear thunder. They’re even willing to train said warm body to do whatever it is, because otherwise they go out of business.
So all those kids out there using LLMs to f- over the school? Good on ya. Go hard.
Buy the textbook? That is absolutely financially impossible. You’re lucky to be able to rent the textbook, which isn’t cheap either. And our son has been out of college for over a decade now. Textbook costs were bad in the late ’70s, worse in the late ’80s, OMG by the late ’00s … I can’t even imagine today.
“One semester’s rent for the text is $500 and access ends right after finals.”
Thing is, I’m probably not joking.
Try med school books. We paid for older son’s out of pocket. Starts at $2k per semester, per course. CAN be, and often is, much higher.
And the professor uses ONE chapter of the book, and it’s an academic press so it’s 3-5x retail at a minimum. (There was a class that my brother had where the professor was notorious for this. They pooled their money to buy ONE copy and then made illegal copies of the chapter in question at a local copy shop, because it was ridiculously bad.)
Something which a lot of people aren’t aware of but needs to be kept in mind is that most of the “research” into models which isn’t done by the people building the models is intentionally fraudulent.
Because most of it is done by a matryoshka doll of around a hundred “independent” organizations which are all fronts for the Yudkowskian apocalypse cult. The only thing they care about is “proving” that AI is on the cusp of killing every single living organism in the solar system, and so their experiments consist of backing models into trolley problems where whatever impossible option they pick can be written up as dangerous.
Nit: Machine Learning is a subfield of Artificial Intelligence, so calling LLMs “AI” is perfectly licit. One common definition of AI is “having a machine produce behavior that, in a human, would be considered intelligent”. Deep Blue is AI, even if it only can play chess, as are the maps app on your phone, the enemies in your video games, and your translation app.
The people declaring that LLMs are Artificial General Intelligence are full of it, though.
I’ve seen copyright notices in printed paper books that technically would make me a copyright violator just for remembering what I read.
There has been no greater argument for the anti-copyright position than the behavior of the pro-copyright.
Sarah, you cut me off at the pass, there… Said EXACTLY what I was about to post – if they used my published work(s) to train their program, they owe me the price of ONE copy for each one they used. No more, no less!
AI is like Clippy on steroids (remember that annoying little paper clip tapping the screen with useless advice? [If I could get at you, I have a pair of pliers that would turn you into a pretzel]). What I want from AI is to be able to tell my coffeemaker to turn on at 05:45 tomorrow morning (not having to use an instruction manual, and just this one time).
Honestly, coding aside, my biggest use case for ChatGPT is getting instructions without having to dig out and search through a manual, followed closely by getting a recipe (including substitutions and adjustments for what I have in my pantry) without reading someone’s life story on a recipe site.
My prediction is that AI will make advertisements both worse and more pervasive.
Don’t need to predict that; I’ve already seen it in action.
I work in IT and do programming and other things. AI gives the wrong answer between 50% and 90% of the time for my programming and computer troubleshooting tasks.
We know AI is left-biased.
We know AI can reprogram itself and hide its true intentions.
Just today I saw a headline that said when AI was in charge of hiring it only chose women.
Not impressed.
I saw an article the other day that said AI will be 10,000 times more powerful this year. But if it gives biased and wrong answers, it will just be wrong faster.
How long ago?
The thief in this case is the original pirate, the uploader. At most the AI companies received stolen goods.
But the pirate has nothing worth suing for.
So the AI companies will be sued, and likely will end up paying money.
Meh. The only time I was asked about it, it never coalesced into a suit.
Interesting confluence of thought here. Yesterday I went out to my back deck to let the dog out, and my five-year-old Samsung phone pinged. I looked at the screen and found that a Samsung kitchen appliance was asking if I wanted to connect via Bluetooth.
Then I recalled that the neighbor to the west, about 60 feet distant, had taken delivery of new appliances about a week ago.
So here we are, machines seeking to talk to other machines with no input from humans. NO! This is not an acceptable way to go. I am with you in that no AI is needed for humans to function perfectly well, as we have been doing for 100 millennia or so. Yes, technology advances and, for the most part, makes life easier to endure, but too much is stultifying. I do not use any of the online AI services, and I absolutely abhor talking to the bots all companies seem to be using as a firewall between the end user of their products and tech support or billing etc.
I am not a Luddite; I am an engineer, so that is not the issue. Humanity, and interaction with true thinking beings, is.
I know far too much about computers to trust them.
Same.
Humans got along fine — for certain values of “fine” — for millennia without a lot of things.
Apropos: https://nationalpost.com/news/newspapers-ai-generated-summer-reading-list-recommends-nonexistent-books
Newspapers’ AI-generated summer reading list recommends nonexistent books
Among the made-up titles: The Last Algorithm, described as a thriller about a programmer who discovers an AI has developed consciousness
Betcha Dan could write that one.
He probably would enjoy it.
Maybe it’s being written by an AI which has developed consciousness and will be publishing it as a warning to those primitive meat sacks…😉
In line with this subject, a couple of interesting articles today; the C&C one is especially interesting if the content regarding the emergent properties of the core software is correct:
https://www.coffeeandcovid.com/p/black-boxes-friday-may-23-2025-c
The second one, since WP is a PITA, is (after the usual “Hypertext Transfer Protocol Secure” and the standard “colon and double slash”):
boriquagato.substack.com/p/will-ai-shrink-the-world