I guess I'm asking is AI just a big fancy (semi-random) generator but with added data scraping?
It is a good question.
No one knows what an AI does, seriously… we know how to create one, we know how to train it (reward it for good results, so it can adjust its internal logic to more reliably produce results we like), but how it actually determines what output to show us for a given input is a black box, no one knows how it gets to its results.
This is not a good answer. It is a false assertion, completely and entirely
incorrect. I'm sorry, but you are just wrong.
So, let me try to give a layman's description of the tech currently under discussion...
Random Generator:
We probably all know this, but I will lay it out for contrast.
In a random generator, I create some list of outcomes, assign some probability to each, and then generate random numbers to pick from among those possibilities. Treasure tables in our DMG are a fine example.
Sometimes we can attach some logic to the process as well - like in a random dungeon map generator, room exits and entrances have to match up and connect, and such.
But in all of these things, some person has directly created the list of outcomes and logic.
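To make that concrete, here's a minimal sketch of a weighted random table in Python. The entries and weights are made up for illustration - the point is that a person typed in every outcome and every probability, and the machine only rolls the dice:

```python
import random

# A miniature treasure table in the style of the DMG: each outcome
# has a weight a person assigned. (These entries are invented.)
TREASURE_TABLE = [
    ("copper pieces", 50),
    ("silver pieces", 30),
    ("gemstone", 15),
    ("magic item", 5),
]

def roll_treasure(rng=random):
    """Pick one outcome according to its assigned probability."""
    outcomes = [name for name, _ in TREASURE_TABLE]
    weights = [weight for _, weight in TREASURE_TABLE]
    return rng.choices(outcomes, weights=weights, k=1)[0]
```

Everything the generator can ever produce is sitting right there in that list. That's the contrast to keep in mind for what follows.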
Generative AI:
The first thing to note is that "AI" isn't one clearly defined thing. It is a broad, non-technical description. What folks are up in arms about is
generative AI, which is a specific technology. Other things we call "AI" may not work in the way I'm going to describe here.
Also, please note that I'm generalizing and simplifying here. They teach college level courses in how to do this, and I'm writing a few paragraphs on the internet.
First, you need a lot of examples of the sorts of things you want your system to produce, and you have to tag those examples with what you want the system to recognize in them. If you want it to produce images, you have to hand it a whole buttload of images, and the image of a penguin standing on a glacier by the ocean has to be tagged with "penguin" "glacier" and "ocean". The picture of the Dragonlance Companions of the Lance will be tagged with "Dwarf" "kender" "half-elf" "knight" "armor" and so on.
This is where the scraping comes in. The internet is loaded with images (and text) that are already tagged, telling you what is in them.
So, you get a huge number of examples - the more the better. Millions or even billions of examples, if you can get them. This is the "training set". One of the primary issues many of us have with Generative AI is that people who build these things take examples for their training sets and don't pay for using them - they pirate all the examples and aim to make a profit off the results. That's unethical - if you are making a profit based in part on someone else's work, they deserve a cut.
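As a sketch of what a scraped training set looks like on the inside: each record is basically a pointer to the example plus the human-written tags found alongside it. The filenames and tags below are invented for illustration:

```python
# Hypothetical records of the kind a scraper collects: the example
# itself plus the tags people already attached to it. (All invented.)
training_set = [
    {"image": "penguin_glacier_01.jpg",
     "tags": ["penguin", "glacier", "ocean"]},
    {"image": "companions_of_the_lance.jpg",
     "tags": ["dwarf", "kender", "half-elf", "knight", "armor"]},
]

def examples_tagged(tag, dataset):
    """Training code asks: give me every example tagged with this word."""
    return [record for record in dataset if tag in record["tags"]]
```

Real training sets have millions or billions of these records, but the shape is the same: example plus tags, taken from somewhere.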
Anyway, so you pick up an example (let us say an image) from your training set, and tell the machine to give you an example of something in that image - say, a penguin. The machine doesn't know what the heck a penguin is, so it spits out random drek. You calculate the difference between what it put out and the example (in the easiest-to-understand approach, you can calculate the differences pixel-by-pixel, but there are more sophisticated ways too). The machine takes that difference and adjusts its internal numbers on the back end so that when you feed it "penguin" the result is closer to the example you showed it.
You do this a bunch of times, with different examples, and eventually it "learns" to produce a credible image of a penguin. Note that it doesn't know what a penguin
is. It just takes the input string "penguin" and crunches numbers and spits out the result. The big thing here is that despite Mamba's assertion, there are people who know what that number crunching looks like. I have a textbook on early forms of that number crunching on my shelf behind me. You can crack open the black box, and see why it responds the way it does.
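Here is a toy version of that loop, just to show there's no mystery in the mechanism. The "image" is four gray pixels, the "prompt" is one label, and the model is a single weight vector per label - all sizes and the target image are invented, and real systems are enormously bigger, but the shape of the process is the same:

```python
import random

# The training example: the "image" of a penguin, as 4 pixel values.
# (Invented for illustration.)
TARGET = {"penguin": [0.1, 0.9, 0.9, 0.1]}

# The model's internal numbers start as random drek.
weights = {"penguin": [random.random() for _ in range(4)]}

def train_step(label, lr=0.5):
    """Show the machine the example, measure the difference, adjust."""
    out = weights[label]          # what the machine spits out for "penguin"
    target = TARGET[label]        # the example we showed it
    # pixel-by-pixel difference between output and the example
    errors = [o - t for o, t in zip(out, target)]
    # nudge the weights so the next output lands closer to the example
    weights[label] = [o - lr * e for o, e in zip(out, errors)]
    return sum(e * e for e in errors) / len(errors)  # mean squared error

for _ in range(20):
    loss = train_step("penguin")
```

After a few dozen steps the output matches the example closely - and at every point you can print the weights and see exactly why it responds the way it does. That's the sense in which the box can be cracked open.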
There's a lot of subtlety, because you aren't just teaching it to respond with something that looks like a penguin. You are, at the same time, teaching it how to respond with something that looks like a poodle. And a puffin. And Paraguay. And a pumpkin. And Marcel Marceau. And everything else in your training set.
But, anyway, eventually the machine responds with credible versions of the things you want. In a sophisticated system, there may be some randomization of elements, but the overall effect is more "stimulus & response" than "random generator".