OpenAI COO refused to answer if Sora is trained on YouTube videos


OpenAI’s text-to-video model Sora is impressive or scary, depending on your perspective. It can help you turn your ideas into a video clip without even a camera, but it could also take jobs away from the people who make videos today. For those unfamiliar, the artificial intelligence model generates video clips up to a minute long from simple text descriptions.

The Microsoft-backed startup has just released the first major music video generated by Sora. Importantly, the model is still in the testing phase and is not yet available to the public, so there is still room for improvement.

Did OpenAI train Sora on YouTube videos?

Last month, YouTube’s CEO warned OpenAI against using its videos to train Sora. In a more recent interview at the Bloomberg Technology Summit, OpenAI COO Brad Lightcap spoke about potential business applications of the company’s AI products. Sora, one of the startup’s notable products with such applications, also came up in the conversation. Speaking of Sora, the interviewer asked: “What training data was used to train the model?”

More specifically, the interviewer pressed the OpenAI official to clarify whether the company trained Sora on YouTube videos. However, Lightcap seemed reluctant to provide a direct answer. Instead, he discussed content generation, the use of that content as training data, transparency about data usage, potential benefits for content creators, and more. He did not mention YouTube even once in his lengthy “non-answer” to whether OpenAI has trained Sora on videos from the platform.

COO Brad Lightcap refused to answer the question

“So, yeah, we’re looking at this problem, it’s really hard. We don’t have all the answers yet,” he concluded. OpenAI did share some information on “understanding the source of what we see and hear online.” However, that post was mainly about content authenticity and how the company plans to maintain transparency about the source of content. It did not say what data the company has used or is using to train its models, and it did not address the use of content from YouTube either.

To recall, the company’s CTO, Mira Murati, was asked the same question about Sora last month. She also couldn’t give a clear answer.

According to reports from earlier this year, OpenAI used YouTube videos to train GPT-4, which is against the platform’s rules. However, Google also reportedly did the same. As for Sora’s training data, the complicated non-answer hints at the possibility that YouTube videos were used. The model could be released to the public in the second half of 2024, possibly in August.

The post OpenAI COO refused to answer if Sora is trained on YouTube videos appeared first on Android Headlines.