Full text to avoid paywall
If you’ve left a comment on a YouTube video, a new website claims it might be able to find every comment you’ve ever left on any video you’ve ever watched. Then an AI can build a profile of the commenter and guess where you live, what languages you speak, and what your politics might be.
The service is called YouTube-Tools and is just the latest in a suite of web-based tools that started life as a site to investigate League of Legends usernames. Now it uses a modified large language model created by the company Mistral to generate a background report on YouTube commenters based on their conversations. Its developer claims it’s meant to be used by the cops, but anyone can sign up. It costs about $20 a month to use and all you need to get started is a credit card and an email address.
The tool presents a significant privacy risk, and shows that people may not be as anonymous in the YouTube comments sections as they may think. The site’s report is ready in seconds and provides enough data for an AI to flag identifying details about a commenter. The tool could be a boon for harassers attempting to build profiles of their targets, and 404 Media has seen evidence that harassment-focused communities have used the developers’ other tools.
YouTube-Tools also appears to be a violation of YouTube’s privacy policies, and raises questions about what YouTube is doing to stop the scraping and repurposing of people’s data like this. “Public search engines may scrape data only in accordance with YouTube’s robots.txt file or with YouTube’s prior written permission,” it says.
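For context, a robots.txt file is a plain-text list of paths that a site asks crawlers not to fetch. Below is a minimal sketch of how a well-behaved scraper might check one, using Python's standard `urllib.robotparser`. The rules, the bot name, and the example.com URLs are made up for illustration; they are not YouTube's actual robots.txt, and nothing here describes how YouTube-Tools works.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /comment
Disallow: /results
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/watch", "https://example.com/comment/123"):
        print(url, is_allowed(SAMPLE_ROBOTS_TXT, "ExampleBot", url))
```

Compliance with robots.txt is voluntary on the crawler's side: the file states what the site asks for, and it does nothing to technically block a scraper that ignores it.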
To test the service, I plugged a random YouTube commenter into the system and within seconds the site found dozens of comments on multiple videos and produced an AI-generated paragraph about them. “Possible Location/Region: The presence of Italian language comments and references to ‘X Factor Italia’ and Italian cooking suggest an association with Italy,” the report said.
“Political/Social/Cultural Views: Some comments reflect a level of criticism towards interviewers and societal norms (e.g., comments on masculinity), indicating an engagement with contemporary cultural discussions. However, there is no overtly political stance expressed,” it continued.
According to the site, it has access to “1.4 billion users & 20 billion comments.” The dataset is not complete; YouTube has more than 2.5 billion users.
YouTube-Tools launched about a week ago and is an outgrowth of LoL-Archiver. There’s also nHentai-Archiver, which can give you a comprehensive comment history of a user on the popular adult manga sharing site. Kick-Tools can produce the chat history or ban history of a user on the streaming site Kick. Twitch-Tools can give you the chat history for an account sorted by timestamp and sortable by all the channels they interact on.
Twitch-Tools only monitors a channel that users have specifically requested it to monitor. As of this writing, the website says it is monitoring 39,057 Twitch channels. For example, I was able to pull a username from a popular Twitch stream, plug it into the tool and then track every time that user had made a comment on another one of the tracked channels.
Reached for comment, the developer of these tools didn’t dance around the reason they built them. “The end goal of people tracking Twitch channels would certainly be to gather information on specific users,” they said.
Twitch did not respond to 404 Media’s request for comment, and YouTube acknowledged a request but did not provide a statement in time for publication. But I spoke with someone in control of a contact email address listed on the LoL-Archiver’s “about” page. They said they’re based in Europe, have a background in OSINT, and often partnered with law enforcement in their country. “I decided I launched [sic] these tools in the first place as a project to build the tool that could be use by LEAs [law enforcement agencies] and PIs [private investigators.]”
According to the developer, they’ve provided the tool to cops in Portugal, Belgium, and “other countries in Europe.” They told 404 Media that the website is meant for private investigators, journalists, and cops.
“To prevent abuses [sic] we only allow the website to people with legitimate purposes,” they said. I asked how the site vets users. “We ask the users to accept our Terms of Use and do targeted KYC [know your customer] requests to people we estimate have an illegitimate reason to use our website. If we find that a user doesn’t have a legitimate purpose to use our service according to our terms of use, we reserve the right to terminate that user’s access to our website.”
The site’s Terms of Service makes this explicit in the first paragraph. “The Service is distributed only to licensed professional investigators and law enforcement. Non-professional individuals are not allowed to subscribe to the Service,” it says.
But YouTube-Tools is a “grant access first, ask for proof later” kind of website. 404 Media was able to set up an account and begin browsing information in minutes after paying for a month of the service with a credit card. It didn’t ask me any questions about how I planned to use the service, nor did it need any other information about me.
I asked the developer for an example of a time they had removed someone from the platform. They said they’d removed a client a few weeks ago after they realized the email the client used to obtain their license was “temporary.” The developer said they reached out to the client to ask why they wanted the tool and didn’t get a response. “They ignored us, and we therefore reported the issue to Stripe and terminated their access.”
The AI summaries are new and only exist for the YouTube tools. “The AI summary is to provide points of interest, so that an investigator doesn’t have to go through the (potentially) thousand [sic] of comments,” the developer said. “This summary is not to replace the research and investigation process of the investigator, but to give clues on where they can start looking at first.”
I asked them about the possible privacy violations the tool presents and the developer acknowledged that they’re real. “But we try to limit them during [our] vetting process,” they said. Again, I was able to sign up for the site with a credit card and an email. I was not vetted.
“I also believe that the tool can be a very valuable source of information for professionals such as police agencies, private investigators, journalists,” the developer said. “That is why we currently offer free access to police agencies requesting it, and have offered [it] to several agencies already. If someone wants to remove any information that the tools has archived they can make a formal request to us, to which we will comply, as we’ve always done.”
Scraping public data is a big problem. Last month, researchers in Brazil published a dataset built from 2 billion Discord messages they’d pulled from publicly available servers. Last year, Discord shut down a service called Spy Pet that’s similar to YouTube-Tools.
The developer claims the tool is for cops, but anyone can sign up and use it for targeted harassment.
Even if it was only for cops, fuck this dev
Honestly? Especially if it was only for cops
Yea. Dude is a fucking red coat. Absolutely PoS human.
I call bullshit on it being for cops: given valid-ish reasons, they can simply request all the comments left by the user directly from google and ask an llm of their choosing to produce a similar result.
Developers like this should be considered collaborators with fascist elements.
Why? They are just bringing to light the tools already being used by corps behind closed doors.
Edit: Seems the author wants to paint a different picture. Either extreme CYA or you were correct.
Yup yup. If someone can do this as a solo dev, then you bet your ass the big corpos are already doing it on every bit of user data they can get their hands on.
Great! I hope they come find me! I’m doing all sorts of shady illegal things. I’m even a convict, but I escaped jail time. I’ll continue doing unlawful things, and continue putting all of your lives in danger. Especially the minorities, women and immigrants! My address is 1600 Pennsylvania Avenue NW, Washington, DC 20500.
Username checks out
“I decided I launched [sic] these tools in the first place as a project to build the tool that could be use by LEAs [law enforcement agencies] and PIs [private investigators.]”
According to the developer, they’ve provided the tool to cops in Portugal, Belgium, and “other countries in Europe.” They told 404 Media that the website is meant for private investigators, journalists, and cops.
It sounds like they’re actively peddling it to cops.
Anyone who builds tools explicitly for mass surveillance of the public is a collaborator, no matter who writes their paycheck.
And don’t let the narrative be “MegaCorp built these tools.” No, human beings with names and ostensibly consciences built these tools for MegaCorp. They are just as guilty if not more so.
I like to throw out a random y’all just to throw off scrapers like this. In my mind, they’ll either think I’m in the southern US or just get confused.
Oyyy krikey!
I sense some Kentucky in this one.
Absolute smashing idea mate, totally chuffed. Bob’s your uncle!
Oi! I sometimes muck up the details about me life. Am I a lad or lady? Gay or straight? Is all good guvna!
Zees ahr thangs one may nevaihr know, n’est-ce pas?
*adjusts beret*
Yes, of course, fellow non-American
Yep, basic opsec (which I’m totally not following on this account). Mix up your slang and phrasing, fake personal details, and rotate accounts to avoid any single one building up too much info on it.
Brilliant! I should start writing and speaking like a Brit.
You should go on fetlife. Whole bunch of kinky motherfuckers in the science field doing research in Antarctica.
to Predict Where Users Live
sign up
no proper examples
made for cops, journalists and PI’s
Translation: Unless you’ve revealed a bunch of personal information, it won’t “predict” where you live.
As someone who only speaks English as a third language after Latin and Mandarin living in Anchorage I tend to agree with you.
Except you don’t need personal information. Look at supermarket data: even when ‘anonymized’, it can predict pregnancy and even a rough geographic location based on what items are available at various locations/times, as well as by correlating your purchases with the weather.
Yeah, but this is just the beginning. Identifying subtle typing patterns will be much more effective at getting your location. At this point, one of the only ways to fight back against that is to:
1: Write what you want to
2: Feed that into a local LLM, and tell it to use a ChatGPT-like writing style
3: Copy that text, and post it
YouTube-Tools also appears to be a violation of YouTube’s privacy policies, and raises questions about what YouTube is doing to stop the scraping and repurposing of people’s data like this. “Public search engines may scrape data only in accordance with YouTube’s robots.txt file or with YouTube’s prior written permission,” it says.
Yeah right Google, scrape people at will, scrape artists’ copyrighted material without approval or repercussions, and you dare to tell us to obey your little concerns…
Fuck off will ya
@abobla@lemm.ee Just out of curiosity I chucked some of your lemmy comments into a prompt on ChatGPT. It summarised ‘you’ in a few seconds, much faster than I could be bothered to do otherwise, and came to this conclusion:
abobla seems to be a tech-savvy user with a strong interest in Linux, open-source software, and internet culture. They enjoy informal, humorous conversation but also value constructive technical discussion. They’re engaged in the community, appreciative of good tools and helpful comments, and mindful of their social presence.
The most scary part is this gem at the end:
I mean, it’s a pretty basic description. Some points are kinda obvious: “they are appreciative of helpful comments”, who isn’t?
“Man, I sure love comments that make my life worse”, imagine someone saying that, wtf.
Nevertheless, very interesting stuff, thanks.
This is exactly what I did on Reddit.
I live in Chicago. It’s not a secret. I brag about it. Also I’m way too old to mask my decades-long digital footprint.
To this point: as soon as you use technology, you have no privacy, and if you think you do, you’re an idiot. There is no such thing as privacy in a digital world. There never was.
As soon as you understand that, you can shift your focus to what you want of yourself out there. If you don’t want it known, don’t say it, just like if you were in a room full of people. As soon as technology is involved, that’s the room full of people.
I am amazed that people struggle with this or refuse to acknowledge it
Fair dinkum? Strewth mate, I’d be gutted if I lost mi onloine privacy.
This is one of the main reasons I adopted a schizo personality for this persona.
This is something an LLM can do really well, really easily, with little engineering effort.
The cat is out of the bag. Most of us have a bunch of unstructured comment data that is peppered with references to local weather, policies, sports, etc. Determining what city or what district you live in is, sadly, not hard in this day and age.
This only requires a quick integration and some prompt engineering. Assume people are already doing it.
This is not a good use for this tech. I mean, it might work well, but it’s not a good use (i.e., it’s an evil use). Not the actual finding, since it’s all public comments anyway, but the providing of the tool to authoritarian bodies.
I predicted this years and years ago when the internet was still young. I’m probably already well scanned and filed away like the rest of us, but as a rule of thumb I always come up with new usernames for everything just to be safe. I’m never the same on more than two platforms max. I’ve never connected my usernames to my identity, I’ve just always been paranoid about it since dial up was a thing.
Plus, I like the change. You get infinite new starts and it really gives you time to think and build on your first impressions. I highly recommend it. Not even my MMO accounts use the same character names from any other MMO. I have many abandoned emails, and never name them anything similar.
If you’re going to be online, always keep moving. Keep changing, keep remaking yourself, and do it often. The same goes for avatars. Change them often and try not to use too many from actual hobbies you like. AI slop is out there, just generate something locally or pick something random that looks cool.
It’s not impossible to still fingerprint someone like this, but that doesn’t mean you should make it easy for big corp to file you away and monetize your existence.
I have many abandoned emails
Never abandon your emails. Scammers can reuse your address.
I keep them locked and change the passwords once in a while. Basically, once my address starts getting a ton of spam and scams per day, I stop using that address. Only the one I’m currently using shows notifications. The rest only show when there’s a login.
I don’t change too often. I still only have a handful.
Cookies, GPU rendering, MAC address, IP, user agent and behavior are all identifiers which can tie all of your accounts together.
“They got him because he always used the same password.”
You’re identifying yourself with a whole bunch of stuff like configs and more. Back in the day, isolating multi users (in games) relied on screen resolution, for example.
Yeah, screen resolution is still used sometimes, I think. Fonts are a big one. I believe there’s software and extensions that make it look like you have a bunch of random fonts, but I haven’t looked too hard into it.
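To make the fingerprinting point concrete, here's a rough sketch in Python of how a handful of browser attributes (user agent, screen resolution, fonts) could be collapsed into one stable ID. Everything in it is hypothetical: the attribute names, the sample values, and the `fingerprint` helper are made up for illustration, and real trackers use many more signals plus fuzzy matching. The idea is the same, though: two accounts with different usernames but the same setup end up with the same ID.

```python
import hashlib
import json


def fingerprint(attributes: dict) -> str:
    """Hash a dict of browser/device attributes into a short, stable ID.

    Hypothetical helper: keys are sorted so the same attributes always
    serialize to the same string regardless of insertion order.
    """
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Made-up sample values: two "different" accounts on the same machine.
account_a = {
    "user_agent": "ExampleBrowser/1.0",
    "screen": "2560x1440",
    "fonts": ["Arial", "Comic Sans MS", "Fira Code"],
}
account_b = {
    "fonts": ["Arial", "Comic Sans MS", "Fira Code"],
    "screen": "2560x1440",
    "user_agent": "ExampleBrowser/1.0",
}
# Same machine, but with a spoofed screen resolution.
account_c = dict(account_a, screen="1920x1080")

if __name__ == "__main__":
    print(fingerprint(account_a) == fingerprint(account_b))  # same setup
    print(fingerprint(account_a) == fingerprint(account_c))  # one value changed
```

With an exact hash like this, changing any single attribute produces a different ID, which is why randomizing things like the font list can help against the naive version.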
Just never have money. You’ll almost never interest anyone. I’m broke af, so I’m boring and worthless af.
Pretty fucking awful
Big whoop. I remember seeing free tools that would look up a username on Reddit and tell you about them. This “tech” has been around for a while.
Reddit (as well as lemmy) is a bit simpler in that regard: all you need to find all the posts made by the $username is to visit their profile, while YouTube actually requires scraping.
I was with ya until that last bit
Seeing how this article exists, you’re already too late.