Wikimedia says the dataset hosted by Kaggle has been “designed with machine learning workflows in mind,” making it easier for AI developers to access machine-readable article data for modeling, fine-tuning, benchmarking, alignment, and analysis. The content within the dataset is openly licensed and, as of April 15th, includes research summaries, short descriptions, image links, infobox data, and article sections, minus references or non-written elements like audio files.
“As the place the machine learning community comes for tools and tests, Kaggle is extremely excited to be the host for the Wikimedia Foundation’s data,” said Kaggle partnerships lead Brenda Flynn. “Kaggle is excited to play a role in keeping this data accessible, available, and useful.”
