Content about web

The Data That Powers A.I. Is Disappearing Fast (nytimes.com)

Those restrictions are set up through the Robots Exclusion Protocol, a decades-old method for website owners to prevent automated bots from crawling their pages using a file called robots.txt.

Robots.txt for the win.
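A robots.txt file is a plain-text list of `User-agent` groups, each with its own `Allow`/`Disallow` path rules. The minimal sketch below uses Python's standard-library `urllib.robotparser` to show the two levels of control that exist today: per bot and per path. The file contents, the bot names (`GPTBot`, `Googlebot`), and the `example.com` URLs are illustrative assumptions. Compliance is voluntary, so the file only keeps out bots that choose to honor it.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: the bot names and paths are assumptions for this
# example, not taken from any particular site.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    for agent in ("GPTBot", "Googlebot"):
        for url in ("https://example.com/blog/", "https://example.com/private/notes"):
            # can_fetch() applies the group matching this user agent,
            # falling back to the "*" group when no named group matches.
            print(agent, url, rp.can_fetch(agent, url))
```

Here `GPTBot` is blocked everywhere, while every other bot is only kept out of `/private/`. The file can say *which* bot and *which* paths, but not *why* a bot is crawling.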

"Major tech companies already have all of the data," she said. "Changing the license on the data doesn't retroactively revoke that permission, and the primary impact is on later-arriving actors, who are typically either smaller start-ups or researchers."

"...that permission."

AI and tech companies in general have been gaslighting everyone for years now, skipping right past the question of whether using publicly available information for training is copyright infringement. This is not a settled question legally, and their continued efforts to portray it as settled are almost certainly intentional and orchestrated.

Mr. Longpre said that one of the big takeaways from the study is that we need new tools to give website owners more precise ways to control the use of their data. Some sites might object to A.I. giants using their data to train chatbots for a profit, but might be willing to let a nonprofit or educational institution use the same data, he said. Right now, there's no good way for them to distinguish between those uses, or block one while allowing the other.

Yes, yes, and yes. Let's add more granular control to the Robots Exclusion Protocol, somewhere between specific bots (which it already supports) and specific content (which it also supports). Something like the ability to exclude bots crawling for a certain purpose (training an AI model vs. updating a search index), or bots owned or operated by a certain type of entity (commercial entity vs. nonprofit, or even big tech vs. small shop). Implementing any of these on a technical level would require bot operators to accurately disclose information about their bot, its purpose, and the entity behind it. That seems like the province of Congress, and a bit of a mountain to climb. But figuring all of this out would certainly empower content creators.
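Nothing like this exists in the protocol today, so what follows is purely a thought experiment. It's a minimal Python sketch that assumes a made-up pair of directives (`Purpose-allow` and `Purpose-disallow`, each with a `for=` list of entity types) and assumes the crawler honestly discloses its purpose and entity type. These names don't come from any standard. They exist only to show what a purpose- and entity-aware rule might look like.

```python
from dataclasses import dataclass

# HYPOTHETICAL syntax: "Purpose-allow" / "Purpose-disallow" lines are invented
# for this sketch; no robots.txt specification defines them.
POLICY = """\
Purpose-disallow: ai-training for=commercial
Purpose-allow: ai-training for=nonprofit,educational
Purpose-allow: search-index for=*
"""


@dataclass(frozen=True)
class Rule:
    allow: bool
    purpose: str
    entities: frozenset  # entity types the rule covers; "*" means any


def parse_policy(text: str) -> list[Rule]:
    """Parse the hypothetical purpose-aware directives, ignoring all other lines."""
    rules = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key not in ("purpose-allow", "purpose-disallow"):
            continue
        purpose, _, scope = value.partition(" for=")
        entities = frozenset(e.strip().lower() for e in scope.split(",") if e.strip())
        # A rule with no "for=" list applies to every entity type.
        rules.append(Rule(key == "purpose-allow", purpose.strip().lower(), entities or frozenset({"*"})))
    return rules


def is_permitted(rules: list[Rule], purpose: str, entity: str, default: bool = True) -> bool:
    """Return the first matching rule's verdict, or `default` if none match.

    Only meaningful if the bot discloses its purpose and entity type honestly.
    """
    purpose, entity = purpose.lower(), entity.lower()
    for rule in rules:
        if rule.purpose == purpose and ("*" in rule.entities or entity in rule.entities):
            return rule.allow
    return default


if __name__ == "__main__":
    rules = parse_policy(POLICY)
    for purpose, entity in [("ai-training", "commercial"), ("ai-training", "nonprofit"), ("search-index", "commercial")]:
        print(purpose, entity, is_permitted(rules, purpose, entity))
```

With the sample policy, a commercial crawler gathering AI training data is refused, a nonprofit doing the same is allowed, and anyone updating a search index gets through. "First match wins" is just a simplifying choice for the sketch. Nothing in it can verify what a bot claims about itself, which is exactly the disclosure problem.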

tags: tech web ai

posted by matt in Saturday, July 20, 2024

We had the web and it was glorious. Anyone with some basic technical skills, or the desire to learn them, could buy a domain and start publishing content on that domain in a matter of hours, if not minutes. But they took it from us. Not in the traditional sense, mind you, because technically we still have it. We can still publish content at will. No, their taking is a little more devious. They built their silos and made it even easier for people to publish content...in the silos. And they let people do this for no money (not for free, mind you). And then they junkified all the great content with ads and algorithms and infinite scrolling and piss-poor organization that makes it damn near impossible to find your old content and that of others.

It was glorious and I loved it. I used TypePad, Movable Type, WordPress, and even Drupal. I rolled my own more than a couple of times. I've bought more domains than I care to admit. I've written about everything from breakfast musings to pending patent legislation.

I built Daystream to bring the glory back. It's a publishing system without the crap. No ads or algorithms, no infinite scroll, and logical content organization. It's the web as I fell in love with it, and my goal is to help other people discover the glory of what we used to have. And if we get enough people to realize that we still have it, well, maybe it will actually move the needle a bit.

We still have the web, and it is glorious. We just don't realize that we still have it, or that it's glorious.

tags: web dev daystream

posted by matt in Wednesday, June 19, 2024

Bring back personal blogging (theverge.com)

We are now in an age where people come on the internet to be the worst possible versions of themselves, and it’s an ugly sight to behold. Take the power back by building blogs and putting comment moderation in place.…It’s what the social web was originally about, and we desperately need to get back to that.

This is from December 2022, but it hits just as hard today, in January 2024, as it did then. I doubt personal blogging will ever return to the popularity it enjoyed in the early 2000s, which is what I think the author hopes for. I think it will essentially become the equivalent of ham radio, with a bunch of hobbyists keeping it alive.

And that’s fine with me.

tags: web

posted by matt in Friday, January 12, 2024

The web is yours | James' Coffee Blog (jamesg.blog)

I love the spirit of this post. The web is yours. It’s mine. It’s ours.

We need to reclaim it. That’s a big part of my goal behind Daystream.

tags: web dev

posted by matt in Saturday, January 6, 2024

Scripting News: Wednesday, April 20, 2022 (scripting.com)

"[I]t's time to love RSS again."

I fell in love with RSS when I started my first blog in 2004 (Promote the Progress, one of the three original patent blogs started during January of that year). It fascinated me as a tool for consumption and distribution of content. I've loved it ever since and still use it extensively today. I read this post from Dave Winer (who was central to RSS's development and wrote the RSS 2.0 spec) with my feed aggregator (NetNewsWire). I've also built it into Daystream from the beginning. I'm glad to hear Dave's fixing to shine a light on it again.
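For anyone curious about what's actually inside a feed, here is a minimal sketch of generating an RSS 2.0 document with Python's standard library. It is not how Daystream does it. The function name `build_rss`, the tuple shape of `posts`, and the example URLs are assumptions for illustration. The channel-level `title`, `link`, and `description` elements are the three that RSS 2.0 requires.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime


def build_rss(title: str, link: str, description: str, posts: list) -> str:
    """Build a minimal RSS 2.0 document as a string.

    `posts` is a list of (title, url, published_datetime) tuples; that shape
    is an assumption made for this sketch.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # The three channel elements RSS 2.0 requires.
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = description
    for post_title, url, published in posts:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = post_title
        ET.SubElement(item, "link").text = url
        ET.SubElement(item, "guid").text = url  # reuse the permalink as a stable ID
        # RSS 2.0 uses RFC 822-style dates; format_datetime emits that form.
        ET.SubElement(item, "pubDate").text = format_datetime(published)
    return ET.tostring(rss, encoding="unicode")


if __name__ == "__main__":
    print(build_rss(
        "Example blog",
        "https://example.com/",
        "A hypothetical feed",
        [("Hello, RSS", "https://example.com/hello", datetime(2022, 4, 21, tzinfo=timezone.utc))],
    ))
```

An aggregator like NetNewsWire polls the URL that serves this XML and shows any new `item` entries. The `guid` is what readers typically use to tell which posts they've already seen.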

tags: web dev rss

posted by matt in Thursday, April 21, 2022

New York and Texas are winning the war to attract bitcoin miners (cnbc.com)

"Within the U.S., 19.9% of bitcoin’s hashrate – that is, the collective computing power of miners – is in New York, 18.7% in Kentucky, 17.3% is in Georgia, and 14% in Texas, according to Foundry USA, which is the biggest mining pool in North America and the fifth-largest globally."

What a crappy headline. Texas is in fourth place according to the source cited in the article. Why is the fourth-place state headline-worthy, while the second-place (Kentucky) and third-place (Georgia) states are not? Eyeballs and clicks: Texas apparently brings (or has the potential to bring) more of each.

And this is really sad when you consider that so many people "only read the headlines" these days. They don't walk away with only part of the story, as you would expect of someone only reading the headline. They actually walk away with a completely different story.

That's sad. And dangerous.

And it's not journalism. It's clickbaiting.

tags: journalism web bitcoin

posted by matt in Sunday, October 10, 2021