AI needs cultural policies, not just regulation

Only by giving fair and wide access to data can AI’s full potential be realised and its benefits distributed equitably.

(i) Introduction

Artificial Intelligence (AI) holds tremendous potential for transforming societies, but ensuring that this transformation benefits everyone will require more than just regulation. To foster safe, trustworthy, and equitable AI, it is crucial to pair regulation with cultural policies that promote access to high-quality data as a public good. Such policies will support transparency, level the playing field, and build public trust in AI systems. By making data widely and fairly accessible, we can unlock AI’s full potential and ensure that its benefits are distributed equitably across all sectors of society.

(ii) The Vital Role of Data in AI

Data forms the foundation of AI development. The performance of AI models, particularly large language models (LLMs), improves with access to more and diverse human-generated data. Alongside advances in computing power and algorithms, data is one of the main drivers of progress in AI. The larger and more varied the dataset, the better these models become at understanding and generating human-like text.

Neural scaling laws reflect this trend: more training data generally leads to better model performance. For instance, Meta’s Llama 3 model was trained on an enormous dataset of 15 trillion tokens, over ten times the size of the British Library’s entire book collection. As AI technologies advance, the demand for such vast datasets continues to grow.
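
The scaling trend described above is often written as a power law in dataset size. The sketch below is an illustration only, not something taken from the article: the function name `data_limited_loss` and the three constants are hypothetical, chosen simply to show the shape of the curve, and real values would have to be fitted to a specific model family.

```python
def data_limited_loss(tokens: float, irreducible: float, coeff: float, exponent: float) -> float:
    """Power-law form commonly used in scaling-law studies: L(D) = E + B / D**beta.

    tokens      -- number of training tokens D
    irreducible -- loss floor E that more data cannot remove
    coeff       -- scale factor B of the data-dependent term
    exponent    -- power-law exponent beta (positive)
    """
    if tokens <= 0:
        raise ValueError("tokens must be positive")
    return irreducible + coeff / tokens ** exponent


# Hypothetical constants, for illustration only (not fitted to any real model).
E, B, BETA = 1.7, 400.0, 0.28

# Compare three dataset sizes, each ten times larger than the last,
# ending at the 15 trillion tokens mentioned in the text.
for tokens in (1.5e11, 1.5e12, 15e12):
    loss = data_limited_loss(tokens, E, B, BETA)
    print(f"{tokens:.1e} tokens -> modelled loss {loss:.3f}")
```

Under this assumed form, every tenfold increase in data lowers the modelled loss, but by a smaller amount each time. That diminishing return is one reason the appetite for ever larger datasets keeps growing.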

(iii) The Race for Data and Ethical Concerns

However, there is a problem: the human-generated content available online may soon be insufficient to meet AI’s increasing appetite for data. According to some experts, the world may reach a point of “peak data” by 2030, where the supply of pristine text for training AI models will no longer meet demand. Another concern is the risk of public data contamination, where AI-generated content feeds back into models, amplifying biases and reducing the diversity of data. These issues highlight the ethical risks of the current AI data race, which is often pursued without regard for quality or ethics.

One prominent example of this is the “Books3” dataset, a collection of pirated books used by some AI companies to train their models. The legality of this practice is up for debate, but the larger issue is the lack of ethical standards in data collection. AI models are being trained on a mix of licensed, public, and pirated content without any clear guiding principles. Studies show that much of this data reflects existing biases in cyberspace, reinforcing an overwhelmingly English-centric and present-focused worldview.

(iv) The Absence of Primary Cultural Sources

A common misconception is that LLMs are trained on a comprehensive collection of human knowledge. In reality, these models fall far short of being the universal library imagined by philosophers like Leibniz or writers like Borges. While datasets like “Books3” may include scholarly works, they largely consist of secondary sources written in English: commentaries that provide only a surface-level understanding of human culture.

Crucially missing are the primary sources that capture the depth and diversity of human history and culture: archival documents, oral traditions, ancient manuscripts, and inscriptions in lesser-known languages. These materials are the raw data of cultural heritage, yet they remain largely untapped by AI researchers.

For example, consider the State Archives of Italy, which house over 1,500 kilometers of documents. This vast repository, alongside others around the world, represents an immense reservoir of linguistic and cultural data. If digitized and made accessible, these sources could not only enrich AI models but also make cultural heritage more widely available to people around the world.

(v) Benefits of Digitizing Cultural Heritage

The digitization of cultural heritage holds transformative potential for AI. By incorporating diverse and historically significant data, AI models could develop a more comprehensive understanding of human history, languages, and culture. This would lead to more accurate and context-aware models, capable of better serving the needs of people from different cultural backgrounds.

Furthermore, digitizing these vast collections would help preserve cultural heritage from the risks of neglect, conflict, and climate change. This would safeguard priceless historical documents for future generations while also making them accessible for research, education, and public use.

The economic benefits of making these data publicly available are equally significant. By providing large, transparent datasets, smaller companies, startups, and the open-source AI community could develop their own AI applications without relying on the proprietary datasets controlled by tech giants. This would foster innovation, level the playing field, and reduce the dominance of a few large corporations in the AI industry.

(vi) Lessons from Italy and Canada

Some countries have already recognized the potential of digitizing cultural heritage. Italy, for instance, allocated €500 million from its “Next Generation EU” package to create a “Digital Library” project aimed at digitizing and making its cultural heritage available as open data. Unfortunately, the project has since been deprioritized and restructured, demonstrating a lack of foresight in recognizing the long-term value of such initiatives.

Canada’s Official Languages Act offers a valuable lesson in this regard. Although initially criticized as wasteful, the policy requiring bilingual institutions eventually created one of the most valuable datasets for training machine translation systems. This example shows how policies promoting linguistic diversity can have unexpected technological and economic benefits.

Recent debates over the adoption of regional languages in Spain’s Cortes and the European Union’s institutions have overlooked the potential benefits of digitizing low-resource languages. Promoting the digitization of these languages would not only help preserve them but also contribute to the development of more inclusive and culturally aware AI models.

(vii) The Need for Cultural Policies in AI Development

As AI development accelerates, it is essential that we do not overlook the potential of cultural heritage to enrich AI systems and democratize access to knowledge. Digitizing the vast archives of human history, culture, and languages is key to creating AI models that truly reflect the diversity of human experience. It is also crucial for ensuring that the benefits of AI are distributed equitably.

To achieve this, we need more than just regulatory frameworks; we need cultural policies that prioritize the digitization and open sharing of cultural data. These policies should be designed to support the preservation of cultural heritage, promote linguistic diversity, and make high-quality data available to a wide range of AI developers. By doing so, we can foster innovation, preserve history, and build AI systems that are inclusive, ethical, and beneficial to all.

(viii) Conclusion

Regulation alone is not enough to shape the future of AI. Cultural policies that promote the digitization and open sharing of diverse, high-quality data are essential for unlocking AI’s full potential. By making cultural heritage accessible to AI systems, we can preserve history, democratize knowledge, and create more inclusive and equitable AI technologies. In this way, AI can be developed in a manner that respects and reflects the rich diversity of human culture, while ensuring that its benefits are shared by all.

Source: https://www.thehindu.com/opinion/op-ed/ai-needs-cultural-policies-not-just-regulation/article68469548.ece

