What’s the Deal with the Data?

Legal Disclaimer: I’m not an attorney, so nothing in this post should be construed as legal advice. If you believe your intellectual property rights have been infringed upon, please contact your agent, your publisher, and/or your (or their) legal counsel. The Authors Guild is also available for many traditionally published (or agented but not yet published) authors.

Work Disclaimer: These opinions are my own and are not intended to represent the views or opinions of my employer.


Yes, I work in software compliance. Not in the arts, but in healthcare. Still, the ethical constructs remain the same. My job is to guard data with, to be frank, utter ferocity. I take this job extremely seriously. Because at the end of the day, data isn’t simply data.

It’s also my job to remind people of that.

Data is protected health information. Emphasis on protected. It’s that diagnosis you’ve been hiding from your family because you don’t want to worry them. It’s the abortion you had when you were young held between you and your doctor. It’s your birth sex that no one has a right to know except you and your OBGYN. It’s your weight. Your mental health history. Your struggles and triumphs. It’s precious. Sacred.

Data is personal information. It’s the social security number you fought to earn over a course of years as you worked toward citizenship. It’s the driver’s license you won back after you won your sobriety. It’s the credit score you battled to raise on your way out of poverty. It’s the zip code you’re hiding from your abusive ex.

Data is dreams. It’s decades of callused fingers holding a pick to strings. Frustration as you mixed and remixed colors trying to capture the exact shade of pink in that sunset over your grandmother’s funeral. It’s the cool grass beneath your head, a book in your hands, reading about a female knight for the first time, realizing you might be able to write stories like this too if you tried. It’s burning, bloodshot eyes staring at draft after draft after draft. It’s bitten-down pens and pencils and charcoal-stained hands. It’s college tuition money you’ll never see back in your lifetime, and arguments with parents that echo in your ears as you chase a dream so far out of reach but worth chasing all the same. It’s years of jobs you hate, trudging home exhausted, trying to find time for the only thing that quenches the ache in your soul. It’s a first commission. A demo tape slipped into the right hands at the right time. An advance split into four payments that dwindle to nearly nothing but not quite. The not quite is important because it’s something after years of nothing, nothing, nothing.

Image of three monitors behind which are binary code with a city skyline. 
Image sourced via Pixabay.

Yes. I take data seriously. Data ethics, too.

So imagine my surprise and dismay when I signed off my work computer after giving an hour-and-a-half-long presentation on ethical AI and secure coding to my compliance and data security teams only to find ethics had once again been breached in relation to my dream: publishing.

For those who don’t know, yesterday a company incorporated in Oregon and doing business under the name Shaxpir went viral in the Twitter writing community after it was revealed that a project called Prosecraft (operating under the Shaxpir name) had collected thousands of books to be fed into its algorithms without authors’ or publishers’ knowledge or consent. At this time, it’s unknown how many books or authors were affected, but Prosecraft boasted of having a database of more than 25,000 books. Authors like Angie Thomas, Victoria Aveyard, and Kate Elliott addressed the issue head-on, stating consent was not given for their books to be listed there (yet there they were). Dozens of other authors confirmed the same. Some of them friends. Debuts. People I know who have clawed their way through impossible odds to arrive at… this.

Shaxpir founder Benjamin Smith took the Prosecraft website down after public cries of outrage and issued an… apology-looking thing, but made no mention of the data. Not where he got it, or whether he was keeping it, or whether it was indeed fueling Shaxpir, the software-as-a-service product billed at $7.99 a month. According to the Shaxpir site, though, the Prosecraft data is indeed part of the paid model.

Screenshot from https://shaxpir.com/pricing showing Prosecraft: Linguistics for Literature as a paid feature.
Note that Shaxpir also boasts a “Concept Art” feature. Just pointing that out.

Taking the website down but not deleting the data is a big deal. It means there are authors out there who don’t know if their books were part of this because they didn’t get a chance to search the website before it was shut down. I myself was in the middle of searching the website for friends’ books (and actually my own once upon a time self-published books) when it was taken down. I found one final friend’s book before it went dark. I never got to check on my own.

There’s a larger picture here, however. It’s one I talk about frequently in my day job and one that hovers at the front of my mind almost constantly. It goes beyond one man running a two-person startup, and shit apologies, and the fury burning through my veins when I see my friends’ books blatantly stolen as robot food.

It’s a picture about the larger picture.

Technology isn’t inherently evil. We can very easily make it so, though. Because we make it in our image. And when we make technology without considering the global picture, we recreate ourselves, only worse. The decisions come faster, are often unexplainable and undetectable (even to their makers), and in being so, are often indefensible. This is sometimes called the “black box” problem. To avoid it, AI and algorithms (deep learning in particular, which to be clear, Shaxpir does not appear to have been using) have to be created with purpose, transparency, ethics, and a global framework in mind. They cannot be created simply because wow, wouldn’t that be cool?

Wouldn’t that be cool? The question that started a thousand dystopian novels. (That some techbro went and data scraped for their LLM lolz.)

Image of a figure in a suit with a respirator amidst a destroyed city on fire. 
Image via Pixabay
This was titled “AI generated dystopia.” Whether that means it was generated by AI or the dystopia was generated by AI is unclear. Perhaps both. Both seems appropriate. I try to avoid AI-generated art in these posts (where I can; I’m no artist and sometimes can’t tell) because many of these generators are trained on scraped art, which is stealing art the same way LLM text features are stealing books. But the transparency with the title gave me enough pause to be equally transparent with the use case here. Because transparency is what I’m about to talk about.

The fact is AI isn’t going away. It’s been here a while and it will continue to be. It’s doing amazing things in a lot of places. It’s also hurting people. Whole industries, actually. Like the people who act as its gods, it creates and destroys in their image. It can be biased and prejudiced and innovative and beautiful and ethical and transparent and honest. It can learn and develop and change and evolve. It can become worse. Or better. It depends on the guide.

Software developers are creators, too. We just speak a different sort of language. Are they all going to listen? I’d be naive if I thought so. But if enough of them do, we’ll be a hell of a lot better off.

So that’s my goal every day. To help people in tech understand these aren’t just points of data fed to a machine. To encourage them to slow down for five minutes so they might better understand that the LLM at the base of their product learned to speak human by STEALING from a human. From someone like you. Like me. From someone who had a dream.

A dream just like theirs, really.

Am I angry? Yes. Today more than other days. Honestly, I started this blog to be a seething commentary about Shaxpir and AI and all the shit tech keeps stealing from us. But as I wrote, I realized I only feel sad. And tired. Maybe a little scared, too. I’ve fought so hard for my dream and I’m not ready to give it up.

From that springs hope. Hope for ethical, responsible AI. Hope that we can find common ground. Hope that we’ll be able to understand one another if tech can slow down and maybe we can all sit down and work this out together.

Before we destroy all the data. Or all the dreams it holds.

Image of two faces staring at one another behind a binary code of data.
Image via Pixabay.