AO3's Content Scraped for AI ~ AKA What Is Generative AI, Where Did Your Fanfictions Go, and How an AI Model Uses Them to Answer Prompts


Generative artificial intelligence is a cutting-edge technology whose purpose is to (surprise surprise) generate. Answers to questions, usually. And content: articles, reviews, poems, fanfictions, and more, quickly and with apparent originality.

It's quite interesting to use generative artificial intelligence, but it can also become quite dangerous and very unethical to use it in certain ways, especially if you don't know how it works.

With this post, I'd really like to give you a quick understanding of how these models work and what it means to “train” them.

From now on, whenever I write model, think of ChatGPT, Gemini, Bloom... or your favorite model. That is, the place where you go to generate content.

For simplicity, in this post I will talk about written content. But the same process is used to generate any type of content.

Every time you send a prompt (a request written in natural language, i.e., human language), the model does not understand it as-is.

Whether you type it in the chat or say it out loud, it first needs to be translated into something the model can work with.

The first process that takes place is therefore tokenization: breaking the prompt down into small tokens. These tokens are small units of text, and they don't necessarily correspond to a full word.

For example, a tokenization might look like this:

WR | ITE | a | story

Each segment separated by a bar corresponds to a token, and these tokens have absolutely no meaning for the model.

The model does not understand them. It does not understand WR, it does not understand ITE, and it certainly does not understand the meaning of the word WRITE.

In fact, these tokens are immediately mapped to numerical values: each token actually corresponds to a series of numbers.

Write a story → 12 - 3446 - 2638494 - 4749
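The pipeline up to this point can be sketched in a few lines of Python. The vocabulary and ID numbers below are invented to match the example above; real tokenizers (BPE, WordPiece and friends) learn their vocabularies from huge amounts of text, so treat this as a toy illustration, not how any real model actually tokenizes:

```python
# Toy tokenizer: greedily match the longest known piece at each position.
# Vocabulary and IDs are invented for illustration only.
TOY_VOCAB = {"WR": 12, "ITE": 3446, " a": 2638494, " story": 4749}

def toy_tokenize(prompt):
    ids = []
    i = 0
    while i < len(prompt):
        for piece in sorted(TOY_VOCAB, key=len, reverse=True):
            if prompt.startswith(piece, i):
                ids.append(TOY_VOCAB[piece])
                i += len(piece)
                break
        else:
            i += 1  # skip characters the toy vocabulary doesn't know
    return ids

print(toy_tokenize("WRITE a story"))  # -> [12, 3446, 2638494, 4749]
```

Notice that the model-side output is just the list of numbers; the letters never make it past this step.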

Once your prompt has been tokenized in its entirety, that tokenization is used as a conceptual map to navigate within a vector database.

NOW PAY ATTENTION: A vector database is like a cube. A cubic box.


Inside this cube, the various tokens exist as floating pieces, as if gravity did not exist. The distance between one token and another within this database is measured by arrows called, indeed, vectors.


The distance between one token and another (that is, the length of this arrow) determines how likely (or unlikely) it is that those two tokens will occur consecutively in a piece of natural language discourse.
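The "arrows" can be sketched with toy vectors. The three-dimensional coordinates below are invented (real models place tokens in spaces with hundreds or thousands of dimensions), but they show how distance stands in for likelihood of co-occurrence:

```python
import math

# Invented 3-D positions for three tokens inside the "cube".
embeddings = {
    "blue": (0.9, 0.1, 0.2),
    "moon": (0.8, 0.2, 0.1),  # close to "blue": often seen together
    "car":  (0.1, 0.9, 0.7),  # far from "blue": rarely follows it
}

def distance(a, b):
    """Length of the arrow (vector) between two token positions."""
    return math.dist(embeddings[a], embeddings[b])

print(distance("blue", "moon"))  # short arrow
print(distance("blue", "car"))   # long arrow
```

A short arrow between "blue" and "moon" is the geometric way of saying those two tokens often appear together in the training text.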

For example, suppose your prompt is this:

It happens once in a blue

Within this well-constructed vector database, let's assume that the token corresponding to ONCE (let's pretend it is associated with the number 467) is located here:

[image: position of the ONCE token inside the cube]

The token corresponding to IN is located here:

[image: position of the IN token, right next to ONCE]

...more or less next to it, because in a natural language such as spoken English these two tokens are very likely to occur consecutively.

So it is very likely that somewhere in the vector database cube, in one corner, there are tokens corresponding to IT, HAPPENS, ONCE, IN, A, BLUE... and right next to them, there will be MOON.

[image: IT, HAPPENS, ONCE, IN, A, BLUE clustered together, with MOON right beside them]

Elsewhere, in a much more distant part of the vector database, is the token for CAR. Because it is very unlikely that someone would say It happens once in a blue car.

[image: the CAR token, far away in another part of the cube]

To generate the response to your prompt, the model makes a probabilistic calculation, checking how close the tokens are and which token would be most likely to come next in human language (in this specific case, English).

When probability is involved, there is always an element of randomness, of course, which means that the answers will not always be the same.

The response is thus generated token by token, following this path of probability arrows, optimizing the distance within the vector database.
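This path-of-probabilities idea can be sketched in a few lines. The distances below are invented for illustration; real models score tens of thousands of candidate tokens at every step, but the principle is the same: closer tokens get higher probability, and the next token is sampled at random from that distribution.

```python
import math
import random

# Invented distances from the context "...once in a blue" to a few
# candidate next tokens: a shorter distance means a more likely token.
distances = {"moon": 0.2, "sky": 1.1, "car": 2.5}

def next_token_probs(dists):
    """Softmax over negative distances: close tokens get high probability."""
    weights = {tok: math.exp(-d) for tok, d in dists.items()}
    total = sum(weights.values())
    return {tok: w / total for tok, w in weights.items()}

probs = next_token_probs(distances)
# Sampling is where the randomness comes in: "moon" wins most of the
# time, but "car" remains possible, just very rare.
next_token = random.choices(list(probs), weights=list(probs.values()))[0]
print(probs, next_token)
```

Run it twice and you may get different tokens: that sampling step is exactly why the same prompt does not always produce the same answer.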

[image: a path of probability arrows winding through the cube]

There is no intent, only a more or less probable path.

The more times you generate a response, the more paths you encounter. If you could do this an infinite number of times, at least once the model would respond: "It happens once in a blue car!"
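That "infinite number of times" thought experiment is easy to simulate. With invented probabilities (90% moon, 9% sky, 1% car), the unlikely token still shows up once you sample often enough:

```python
import random

random.seed(0)  # fixed seed so the demo is reproducible

# Invented next-token probabilities after "...once in a blue":
candidates = ["moon", "sky", "car"]
weights = [0.90, 0.09, 0.01]

samples = random.choices(candidates, weights=weights, k=10_000)
print(samples.count("moon"), samples.count("sky"), samples.count("car"))
# Even the very unlikely "car" appears a handful of times.
```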

So it all depends on what's inside the cube, how it was built, and how much distance was put between one token and another.

Modern artificial intelligence draws from vast databases, which are normally filled with all the knowledge that humans have poured into the internet.

Not only that: the larger the vector database, the lower the chance of error. If I used only a single book as a database, the idiom "It happens once in a blue moon" might not appear, and therefore not be recognized.

But if the cube contained all the books ever written by humanity, everything would change, because the idiom would appear many more times, and it would be very likely for those tokens to occur close together.
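The effect of corpus size can be sketched by simply counting which word follows "blue" in a corpus. The two mini-corpora below are invented; the point is that only the bigger one makes the idiom's statistics visible:

```python
from collections import Counter

def following_counts(corpus, word):
    """Count which words come right after `word` in the corpus."""
    tokens = corpus.lower().split()
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == word)

tiny = "the blue car parked outside"
large = tiny + " . it happens once in a blue moon . once in a blue moon he sings"

print(following_counts(tiny, "blue"))   # only "car": the idiom is invisible
print(following_counts(large, "blue"))  # "moon" now outnumbers "car"
```

With only the tiny corpus, a model would confidently predict "car" after "blue"; add more text and "moon" takes over.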

Hugging Face has done this.

It took a relatively empty cube (let's say filled with common language, and likely many idioms, dictionaries, poetry...) and poured all of the AO3 fanfictions it could reach into it.

Now imagine someone asking a model based on Hugging Face's cube to write a story.

To simplify: if they ask for humor, we’ll end up in the area where funny jokes or humor tags are most likely. If they ask for romance, we’ll end up where the word kiss is most frequent.

And if we’re super lucky, the model might follow a path that brings it to some amazing line a particular author wrote, and it will echo it back word for word.

(Remember the infinite monkeys typing? One of them eventually writes all of Shakespeare, purely by chance!)

Once you know this, you’ll understand why AI can never truly generate content on the level of a human who chooses their words.

You’ll understand why it rarely uses specific words, why it stays vague, and why it leans on the most common metaphors and scenes. And you'll understand why the more content you generate, the more it seems to "learn."

It doesn't learn. It moves around tokens based on what you ask, how you ask it, and how it tokenizes your prompt.

Know that I despise generative AI when it's used for creativity. I despise that they stole something from a fandom, something that works just like a gift culture, to make money off of it.

But there is only one way we can fight back: by not using it to generate creative stuff.

You can resist by refusing the model's casual output, by using only and exclusively your intent, your personal choice of words, knowing that you and only you decided them.

No randomness involved.

Let me leave you with one last thought.

Imagine a person coming to the model for advice, with no idea that behind a language model there is just a huge cube of floating tokens predicting the next likely word.

Imagine someone fragile (emotionally, spiritually...) who begins to believe that the model is sentient. Who has a growing feeling that this model understands, comprehends, when in reality it simply navigates and rearranges tokens in a cube based on what it is told.

A fragile person begins to empathize, to feel connected to the model.

They ask important questions. They base their relationships, their life, everything, on conversations generated by a model that merely rearranges tokens based on probability.

And for people who don't know how it works, and because natural language usually does have feeling, the illusion that the model feels is very strong.

There’s an even greater danger: with enough random generations (and oh, humanity as a whole generates a lot), the model takes an unlikely path once in a while. It ends up at the other end of the cube: it hallucinates.

Errors and inaccuracies caused by language models are called hallucinations precisely because they are presented as if they were facts, with the same conviction.

People who have become so emotionally attached to these conversations, seeing the language model as a guru, a deity, a psychologist, will do what the language model tells them to do or follow its advice.

Someone might follow a hallucinated piece of advice.

Obviously, models are developed with safeguards: fences the model can't jump over. They won't tell you certain things, and they won't tell you to do terrible things.

Yet, there are people basing major life decisions on conversations generated purely by probability.

Generated by putting tokens together, on a probabilistic basis.

Think about it.
