This statement comes in the wake of revelations that the AI research lab EleutherAI harvested subtitles from YouTube videos without creators’ consent, along with data from Wikipedia, the European Parliament, and Enron staff emails. The collected information was compiled into a dataset known as “the Pile.”
EleutherAI aimed to democratize AI development by lowering the entry barriers for those outside the Big Tech sphere. Despite these noble intentions, the methods used to collect data have sparked ethical concerns. Major companies, including Nvidia, Salesforce, and Apple, have utilized the Pile for various AI training projects.
Apple’s Position on the Use of the Pile
Apple has clarified that while it did use the Pile, this dataset was not employed in training Apple Intelligence. Instead, Apple used the Pile to train its open-source OpenELM models, which were released in April. AppleInsider has confirmed that OpenELM models are not utilized in any of Apple’s AI or machine learning features. The purpose of OpenELM was to contribute to the research community, and Apple has stated there are no plans to integrate OpenELM into Apple Intelligence or develop new versions of the model.
Apple has consistently emphasized its commitment to ethical sourcing for its AI projects. The company has invested millions in securing content from publishers and has licensed images from professional photo libraries to train its AI systems.
Apple’s response to the controversy surrounding EleutherAI’s data harvesting reinforces its stance on ethical AI development. By distinguishing its use of the Pile for OpenELM from the training of Apple Intelligence, Apple aims to reassure users and stakeholders of its commitment to integrity in AI research and development.