Complete Data Financialization
Vana has gained significant traction recently, and for good reason. Its aim is simple: your personal data remains yours, and you are properly compensated for it, in a world competing to use that data for artificial intelligence, advertising, and a host of other purposes.
In the whitepaper, Anna Kazlauskas brings up an important point: before market forces took over, the internet promised to be a beacon of individuality and sovereign data ownership. The subsequent rise of Ethereum proved that blockchain was not just “internet money” but a highly programmable technology.
Vana builds on these developments, combining the best of the internet’s original goals with modern blockchain technology. Its entire vision can be summarized in a few words: user data sovereignty, enabling collective creation.
The modern internet runs on personal data—but the people generating it are almost entirely excluded from its economic upside.
In 2024, Meta generated $49.63 per user globally and up to $70 in Europe. In the U.S., Google extracted a staggering $460 per user from ad revenue alone, up from $393 in 2022. Google reported $264.59 billion in ad revenue last year, largely off data users didn’t even know they were giving up. Other companies’ figures also run to the tune of billions, taking your total data worth to almost $700 [Proton].
This imbalance isn’t just financial—it’s systemic. A full 92% of Americans are concerned about their online privacy, yet only 3% understand current data privacy laws. Globally, 85% want more control, and nearly half have stopped using a service over privacy concerns [Usercentric].
Meanwhile, the financial fallout from data breaches is escalating rapidly. The average cost of a breach hit $4.88 million in 2024, up 10% from the previous year. In just five years, breach costs have surged 27%—and users are caught in the crossfire.
Against this backdrop, Vana’s premise of giving users ownership and economic agency over their data feels less like an optional innovation and more like a necessary correction.
Vana outlines four major pillars that form the foundation of its vision:
The sovereignty of personal servers
Your data lives with you. Rather than being hoarded in opaque corporate vaults, data is stored under the user’s control: encrypted, private, and permissioned. Whether hosted locally or via decentralized storage, users retain the final say in how and when their data is used.
The coordination capabilities of blockchain networks
Coordination is not possible without trust and verifiability. Blockchain enables a shared public ledger that enforces permissioning, token flows, and governance, allowing many independent agents to work together around data, without central intermediaries.
The privacy guarantees of modern cryptography
Trusted execution environments (TEEs), encryption, and programmable permissions ensure that data remains confidential, even during computation. Privacy isn't reliant on policy; it’s enforced by math.
The economic incentives of tokenized markets
Vana turns data into an economic primitive. Tokens represent ownership and access rights, enabling users to earn value from their data while letting AI builders interact with previously siloed datasets in a secure, fair, and auditable manner.
With these principles set, let’s explore how Vana puts them into practice through its system architecture.
At the core of Vana’s model are DataDAOs—community-governed data collectives that allow individuals to pool their private data while retaining full ownership and control. While isolated data points have little utility, aggregated, validated, and structured datasets become highly valuable for training AI models.
Each DataDAO is composed of three essential parts:
Data Liquidity Pools (DLPs) that validate data and assign scores using proof-of-contribution logic
VRC-20 tokens, issued per dataset, representing each contributor's stake and controlling access
Governance contracts that define validation rules, reward logic, and data access policies
DataDAOs solve two key challenges of collaborative data:
Sybil resistance ensures people can’t game the system by submitting fake identities to earn tokens. This is achieved through cryptographic validation of each contribution.
Data valuation maps messy, non-standard data into scored, structured units that can be quantified and rewarded.
A DataDAO is like a co-op where members contribute raw materials (data), and the system verifies and refines them into a shared, usable product, all while ensuring no one cheats the pool.
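The co-op analogy can be sketched in a few lines of Python. This is a toy model, not Vana's actual proof-of-contribution logic: the class name `LiquidityPoolSketch`, the completeness-based scoring rule, and the duplicate check are all assumptions made for illustration. In particular, rejecting repeated data by fingerprint is a much weaker stand-in for the cryptographic validation the article describes.

```python
import hashlib
from dataclasses import dataclass, field


@dataclass
class LiquidityPoolSketch:
    """Hypothetical stand-in for a DLP: validates, scores, and credits data."""

    required_fields: tuple
    seen_fingerprints: set = field(default_factory=set)
    balances: dict = field(default_factory=dict)  # wallet -> token units

    def fingerprint(self, record: dict) -> str:
        # Hash a canonical (key-sorted) form so identical data is detectable
        # even when the keys arrive in a different order.
        canonical = repr(sorted(record.items())).encode()
        return hashlib.sha256(canonical).hexdigest()

    def score(self, record: dict) -> float:
        # Assumed valuation rule: fraction of required fields that are present.
        present = sum(1 for name in self.required_fields
                      if record.get(name) is not None)
        return present / len(self.required_fields)

    def contribute(self, wallet: str, record: dict) -> float:
        digest = self.fingerprint(record)
        if digest in self.seen_fingerprints:
            return 0.0  # resubmitted data earns nothing, whoever sends it
        self.seen_fingerprints.add(digest)
        reward = self.score(record)
        self.balances[wallet] = self.balances.get(wallet, 0.0) + reward
        return reward


pool = LiquidityPoolSketch(required_fields=("age", "sleep_hours"))
first = pool.contribute("0xA", {"age": 31, "sleep_hours": 7.5})
duplicate = pool.contribute("0xB", {"sleep_hours": 7.5, "age": 31})
partial = pool.contribute("0xB", {"age": 40})
```

Here the second wallet gains nothing by resubmitting the first wallet's record, and an incomplete record earns a proportionally smaller reward. A production pool would score against domain-specific quality checks rather than simple field completeness.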
Vana operates on its own EVM-compatible Layer 1 blockchain, designed from the ground up to coordinate private data workflows.
Importantly, this chain never stores raw data. Instead, it records:
Proof of contribution: that data was submitted and validated
Access permissions: who can use the data and under what conditions
Smart contract logic: governing token rewards, usage rules, and job approvals
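One common way to record that data exists without storing it is a hash commitment. The sketch below assumes that pattern; the article does not specify Vana's on-chain record format, and `ContributionRecord`, `make_record`, and `matches` are hypothetical names.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ContributionRecord:
    """Hypothetical shape of what a chain could hold instead of raw data."""

    contributor: str       # wallet address of the data owner
    data_commitment: str   # SHA-256 of the encrypted payload, never the payload
    score: float           # result of validation
    allowed_fields: tuple  # access permissions attached to this contribution


def commit(encrypted_payload: bytes) -> str:
    return hashlib.sha256(encrypted_payload).hexdigest()


def make_record(contributor, encrypted_payload, score, allowed_fields):
    return ContributionRecord(
        contributor, commit(encrypted_payload), score, tuple(allowed_fields)
    )


def matches(record: ContributionRecord, encrypted_payload: bytes) -> bool:
    # Anyone holding the payload can check it against the public commitment.
    return record.data_commitment == commit(encrypted_payload)


payload = b"ciphertext-bytes-held-off-chain"
record = make_record("0xA", payload, 0.9, ["sleep_hours"])
```

The record proves that a specific payload was submitted and validated, while the payload itself stays wherever the user keeps it. Any change to the payload produces a different commitment, so tampering is detectable.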
Smart contracts orchestrate the entire data economy: from DLP logic to reward distribution to TEE job approvals.
Data Liquidity Pools (DLPs) validate data, assign scores, and issue VRC-20 tokens. When a builder wants to use that data, they must burn tokens and request compute access. This request is checked and approved by smart contracts—ensuring rules are enforced transparently.
What sets Vana apart is its tight integration with off-chain compute infrastructure, enabling a smooth, privacy-preserving bridge between on-chain governance and off-chain AI computation.
AI models require computation—but raw data never needs to be exposed. Vana uses a secure compute layer powered by Trusted Execution Environments (TEEs), including GPU-enabled variants for machine learning tasks.
Within a TEE, the user’s data remains encrypted and isolated from the outside world. The environment enforces three constraints:
Only approved code runs
Only approved data fields are accessed
All compute activity is logged and verifiable on-chain
It's like training an AI model without ever looking at the actual data. A builder can request: “Give me the average sleep duration of users aged 30–35,” and the TEE returns only that. No names, no emails, no unapproved access.
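The three constraints and the sleep-duration query can be mimicked in ordinary Python. This is only a simulation of the control flow: a real TEE enforces isolation in hardware and proves what code ran through remote attestation, neither of which is shown here. The function names, the sample records, and the use of a bytecode hash as a "code measurement" are all assumptions for illustration.

```python
import hashlib
from statistics import mean

approved_code = set()                     # fingerprints of approved jobs
APPROVED_FIELDS = {"age", "sleep_hours"}  # fields contributors agreed to expose
audit_log = []                            # stand-in for the on-chain compute log


def fingerprint(job) -> str:
    # Hash the function's compiled bytecode (toy substitute for attestation).
    return hashlib.sha256(job.__code__.co_code).hexdigest()


def run_in_enclave(job, fields, records):
    digest = fingerprint(job)
    if digest not in approved_code:
        raise PermissionError("job code is not approved")
    blocked = set(fields) - APPROVED_FIELDS
    if blocked:
        raise PermissionError(f"fields not approved: {sorted(blocked)}")
    # The job only ever sees the approved projection of each record.
    view = [{name: row[name] for name in fields} for row in records]
    result = job(view)
    audit_log.append({"job": digest, "fields": sorted(fields)})
    return result


def average_sleep_30_to_35(rows):
    return mean(row["sleep_hours"] for row in rows if 30 <= row["age"] <= 35)


def dump_everything(rows):
    return rows  # an unapproved job that tries to return raw rows


records = [
    {"name": "Ana", "email": "ana@example.com", "age": 31, "sleep_hours": 7.0},
    {"name": "Bo", "email": "bo@example.com", "age": 34, "sleep_hours": 8.0},
    {"name": "Cy", "email": "cy@example.com", "age": 52, "sleep_hours": 6.0},
]

approved_code.add(fingerprint(average_sleep_30_to_35))
result = run_in_enclave(average_sleep_30_to_35, ["age", "sleep_hours"], records)
```

The builder receives a single number. Requests for `email`, or jobs whose code was never approved, are rejected before any data is touched, and every successful run leaves a log entry.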
This makes it possible to train powerful models without violating user privacy—a foundational leap forward in ethical AI.
Every user on Vana is represented by an EVM-compatible wallet—not just for payments, but as a full cryptographic identity.
When users contribute data:
They encrypt and sign it with their private key
They choose which DataDAOs to share it with
All access permissions are wallet-specific and enforced automatically
This creates a seamless attribution system:
VRC-20 tokens are tied to the wallet that contributed the data
Rewards flow to that wallet whenever the data is accessed
Permissions can be revoked or adjusted at any time
The user wallet is a passport across Vana. It’s how data gets attributed, how rewards get paid, and how rights are enforced, all without needing an account or username.
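The attribution flow above can be modeled as a small permission registry keyed by wallet address. This is a simplified sketch: `PermissionRegistry` and its methods are hypothetical, rewards are plain integers, and the encryption and signing steps are omitted because they need a cryptography library outside Python's standard library.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionRegistry:
    """Hypothetical wallet-keyed record of grants and accumulated rewards."""

    grants: dict = field(default_factory=dict)   # wallet -> set of DataDAO ids
    rewards: dict = field(default_factory=dict)  # wallet -> reward units

    def grant(self, wallet: str, dao: str) -> None:
        self.grants.setdefault(wallet, set()).add(dao)

    def revoke(self, wallet: str, dao: str) -> None:
        self.grants.get(wallet, set()).discard(dao)

    def access(self, wallet: str, dao: str, reward: int) -> bool:
        # Access succeeds only while the grant is in place, and each
        # successful access pays the contributing wallet.
        if dao not in self.grants.get(wallet, set()):
            return False
        self.rewards[wallet] = self.rewards.get(wallet, 0) + reward
        return True


registry = PermissionRegistry()
registry.grant("0xA", "sleep-dao")
paid = registry.access("0xA", "sleep-dao", reward=5)
registry.revoke("0xA", "sleep-dao")
denied = registry.access("0xA", "sleep-dao", reward=5)
```

No account or username appears anywhere: the wallet address is the only identifier, and revoking a grant takes effect on the very next access attempt.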
The Vana ecosystem runs on two interoperable tokens that coordinate incentives and access:
VANA: The native token used to pay for jobs, governance actions, and protocol fees. It also serves as the base trading pair for all other tokens in the system.
VRC-20 tokens: Dataset-specific tokens earned by contributing data. These are required to access datasets and participate in DataDAO governance.
When a builder wants to use data, they must burn both VANA and the relevant VRC-20 tokens—at a fixed ratio of 20:80.
Burning tokens means destroying them permanently, reducing the total supply. This creates deflationary pressure and ensures that value accrues to remaining token holders and contributors. The more a dataset is used, the scarcer and more valuable its tokens become.
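A worked example makes the burn arithmetic concrete. The article does not say which token takes which share of the 20:80 ratio, or whether the ratio applies to token counts or to value, so the sketch below assumes 20 parts VANA to 80 parts VRC-20 by unit count. The function and class names are hypothetical.

```python
def split_burn(total_units: int, vana_parts: int = 20, vrc20_parts: int = 80):
    """Split a job's total burn into (VANA, VRC-20) amounts.

    Assumption: the 20:80 ratio means 20 parts VANA to 80 parts VRC-20.
    Integer units are used so the two amounts always sum to the total.
    """
    vana = total_units * vana_parts // (vana_parts + vrc20_parts)
    return vana, total_units - vana


class TokenSupply:
    """Tracks total supply; burned units are removed permanently."""

    def __init__(self, total: int):
        self.total = total

    def burn(self, amount: int) -> None:
        if amount < 0 or amount > self.total:
            raise ValueError("invalid burn amount")
        self.total -= amount


vana_burn, vrc20_burn = split_burn(1000)

dataset_token = TokenSupply(total=1_000_000)
dataset_token.burn(vrc20_burn)
```

Under these assumptions, a job costing 1,000 units burns 200 VANA and 800 dataset tokens, and the dataset token's supply drops from 1,000,000 to 999,200. Each further job shrinks supply again, which is the source of the deflationary pressure described above.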
Because VRC-20 tokens are ERC-20 compatible, they can be integrated into DeFi ecosystems—staked, bundled, or even traded on decentralized exchanges. But unlike generic tokens, they also carry dataset-specific logic, enabling enforceable permissions and transparent attribution.
Vana has moved from promise to proof. Over the last three years, the project has consistently hit technical, community, and funding milestones.
$25M raised across four rounds from institutional investors like Coinbase Ventures, Paradigm, and Polychain.
The Mainnet was launched in December 2024, with the $VANA token going live an hour later.
Testnet adoption: 1.3M users contributed 6.5M+ data points.
Post-launch impact (as of May 2025): 12M data points onboarded and live data sales across multiple DataDAOs.
If DataDAOs were a theoretical innovation, the Reddit Data DAO, launched by Vana in April 2024, made them real.
The DAO onboarded over 140,000 Reddit users, pooling their posts, comments, and messages with the goal of training a user-controlled AI model.
By contrast, Reddit itself has in fact sold such data to OpenAI for around $70 million, with almost no awareness among its users.
https://x.com/rdatadao/status/1929323999206477931
Other DataDAOs followed:
DNA DAO seeks to reclaim genomic data.
Oura DAO captures sleep and biometric signals.
Resume DAO, Syd, and MindDAO focus on employment, cybersecurity, and mental health, respectively.
As mentioned earlier, the VRC-20 standard underpins all of this. As the “ERC-20 for data,” it makes data programmable, governable, and tradable. VRC-20 tokens are already live across multiple DAOs, representing contribution, voting power, and access rights in a unified format.
Today’s AI models are hungry for fresh data—but the internet’s public data well is nearly dry. Estimates suggest the web holds just 0.1–4% of the world’s data. The rest? Locked in platforms, inboxes, health records, and usage histories.
This “AI data wall” is now widely acknowledged. And Vana’s solution—user-consented, portable, encrypted data—has rapidly gained traction:
16+ operational DataDAOs live as of June 2025, with over 20 in pipeline.
1M+ users actively contributing across the network
300+ proposed data pools, from browser history to DEX trading data.
Personalized AI apps—from health models to Hinge profile generators—built on top of user-owned datasets.
Vana’s developer community is also maturing. With an SDK, full TEE support, and smart contract modules available on GitHub, Vana is positioning itself not just as a protocol—but as an operating system for DataFi.
The Vana ecosystem now spans 19+ live DataDAOs, each built around specific data domains, from sleep and genomics to trading behavior and conversational AI. A closer look reveals four key trends shaping this decentralized data economy:
FHE adoption: Sight FHE and Mind Network already use fully homomorphic encryption, allowing encrypted computation.
TEE integration: Nearly all collectives leverage Trusted Execution Environments to ensure private, verifiable compute.
Model ownership: At least 5 DAOs are explicitly building or contributing to user-owned models via fine-tuning or Initial Model Offerings (IMOs).
Multimodal data: Over 10 collectives blend text, sensor, biometric, or behavioral inputs—building rich, AI-ready datasets.
Reddit DataDAO: Vana’s flagship, with 140,000+ Reddit users pooling content to train a user-owned LLM. Pioneered IMOs with ORA and closed real-world data licensing deals.
GPT DataDAO: Enables users to contribute chat transcripts for AI training—rewarded via dataset-specific VRC‑20 tokens.
UNWRAPPED: Crowdsources Spotify streaming data. Its first dataset was licensed to SoloAI, showcasing community-led data monetization.
sleep.fun: Aggregates wearable sleep metrics for health-focused AI. Contributors earn $REM for each validated data point.
MindDAO: Tracks mood data linked to Web3 behavior. Participants report weekly to earn $MIND and support mental health modeling.
AsteriskDAO: Focused on women’s health data—helping close systemic gaps in medical research by crowdsourcing female-specific insights.
Finquarium: A forecasting DAO for encrypted analyst predictions. Uses token incentives and track records to surface reliable financial insights.
DataPIG: A quirky DeFi-focused DAO combining market data, memes, and sentiment to create trader-friendly analytics tools.
YKYR: A browser plugin that lets users monetize their search and browsing habits on their own terms.
dFusion Social Truth: Builds a trusted, crowd-validated dataset to fight misinformation—fueled by contributor rewards and curation.
VanaTensor: Built for fine-tuning models via RLHF. Feeds Bittensor validators with high-quality human-labelled signals.
vChars AI: Collects Telegram chat patterns for LLM training—proving conversational data can be crowdsourced securely.
Mind Network: Combines FHE, privacy voting, and trusted compute for secure data markets and decentralized AI infrastructure.
Devdock: Lets developers share anonymized workflow data and debugging metadata—fueling better dev-centric AI.
Barbarika: Captures human reasoning patterns to train open-agent world models for robotics and simulations.
Auto DLP: Targets mobility data—from vehicle telemetry to driving behavior—monetized by users through secure compute.
Voogle: A general-purpose data marketplace for everyday behavior—turns clicks, steps, and habits into tokenized insights.
Vana's DataDAO ecosystem is fostering innovation across a wide range of industries, presenting significant sector-specific opportunities:
Personalized Medicine & Research:
DataDAOs such as DNA DataDAO (genetics), MindDAO (mental health), Oura DAO (sleep data), and AsteriskDAO (focused on women’s non-reproductive health) are reshaping how individuals control, share, and monetize sensitive health data. These collectives can unlock high-value datasets to train AI models that improve preventative care, predict health risks, detect rare diseases, and personalize treatment plans.
Through federated learning and blockchain, these systems can enhance model accuracy while preserving data privacy and compliance. AsteriskDAO is especially impactful, addressing historical gender bias in medical research by sourcing inclusive, global data on underrepresented aspects of women’s health.
Enhanced Predictive Analytics:
Platforms like Finquarium, a decentralized marketplace for validated financial forecasts, and DataPIG, which generates AI-driven investment insights from users’ trading activity, are enabling smarter, more transparent financial decision-making.
These platforms contribute real-time, high-quality data that can train AI models for better market forecasting, risk profiling, and personalized investment strategies. Importantly, the use of fully homomorphic encryption (FHE) or similar privacy-preserving technologies can ensure that sensitive financial data remains confidential—making such tools more attractive to traditional institutions.
Trend Identification & Recommendation Systems:
DataDAOs like Reddit DataDAO, Unwrapped DAO (Spotify data), PrimeInsights DAO (Amazon reviews), Volara (Twitter/X data), and YKYR (browsing behavior) empower users to monetize their consumption and social engagement data.
This aggregated data is ideal for training recommendation engines, discovering consumer behavior trends, and identifying early cultural signals. Projects like ChirperAI, which simulate social interaction using AI agents trained on real-world dialogue, illustrate how social DataDAOs can be leveraged to create new types of engagement tools.
Specialized Model Training:
Initiatives such as GPT DataDAO (contributions from ChatGPT users), SixGPT (synthetic data generation), VanaTensor (for reinforcement learning with human feedback), and DevDock (developer code sharing) offer rich, curated training data for AI development.
These sources help fine-tune models to be more domain-specific, less biased, and more accurate—especially in areas like natural language processing, code generation, and safe alignment. The ecosystem also includes early experimentation with social truth verification (e.g., dFusion) to help mitigate misinformation in AI outputs.
Robotics & Autonomous Systems:
The IoT Data DAO aggregates sensor data from smart devices, offering a foundation for more robust robotics and autonomous systems. Future DataDAOs could specialize in self-driving car data—especially rare edge cases like unusual road conditions or extreme weather—and in-home mapping data from devices like Roomba or Ring. Such datasets could improve everything from urban planning to robotic navigation and ergonomic furniture design, all while preserving user privacy.
Research & Academia:
Academic-focused DataDAOs could open access to niche, unpublished research data, including drafts, notes, and experiment logs. This would allow AI systems to learn not just from polished publications, but also from the broader context of research efforts, including failed or inconclusive studies—enhancing transparency and replicability in science.
Language & Translation:
Language-focused DataDAOs could let users contribute anonymized text conversations to train more authentic, colloquial language models. Meanwhile, voice-oriented collectives could power real-time, emotion-aware speech-to-speech translation—preserving tone, nuance, and cultural specificity.
Knowledge Work from Screen Recordings:
Future DataDAOs could crowdsource screen and meeting recording data—via tools like Rewind or Read AI—to train AI agents capable of replicating complex knowledge work. This includes task planning, summarization, and documentation. Contributors could be rewarded for anonymized, high-quality data that reflects real-world productivity patterns.
Vana isn’t the only project eyeing the data economy. But it may be the only one building for its full financialization. Its architecture, incentives, and regulatory foresight distinguish it from both decentralized peers and centralized data brokers.
Projects like OORT DataHub are closest in spirit, offering blockchain-based data aggregation for AI. However, these typically focus on data supply logistics—less on direct individual ownership, comprehensive incentives, or liquidity for data assets.
Other players like Filecoin, Arweave, or IPFS solve for decentralized storage. While many projects can build on such storage solutions, these networks do not inherently provide programmable privacy, direct valuation mechanisms, or integrated AI model training layers.
Crucially, Vana is also actively collaborating with federated learning frameworks like Flower AI to enhance privacy in AI training while simultaneously enabling market models for individual data ownership and monetization.
The biggest players in data aren’t crypto-native—they’re the incumbents. Platforms like Meta, Google, Reddit, and even data brokers have turned user data into the world’s most valuable untaxed commodity. In 2024 alone, Meta generated $164B, and Google $264B, largely by monetizing data users gave away for free.
There is a stark difference in philosophy:
At its core, Vana replaces permissionless extraction with permissioned coordination; users still share data, but on their terms, with granular privacy and built-in attribution.
Vana is fighting the good fight. With internet markets highly monopolized and focused on extracting profits, Vana is bringing those profits back to the providers of data. Data, as is often said, is the future’s black gold, and it is the most commodified asset. Unless we bring structure, sovereignty, and economic weight to our personal data, users will always remain the product.
With constant improvements, strategic partnerships in AI and data portability, and a growing library of personalized use cases, Vana is establishing a new data paradigm. As AI’s global demand for data skyrockets, Vana is positioning itself to keep that data accessible and usable, while bringing privacy and compensation to users through coordination powered by the chain.
Luganodes remains committed to Vana's vision. With our infrastructure, we enable this global paradigm shift.
Luganodes is a world-class, Swiss-operated, non-custodial blockchain infrastructure provider that has rapidly gained recognition in the industry for offering institutional-grade services. It was born out of the Lugano Plan B Program, an initiative driven by Tether and the City of Lugano. Luganodes maintains an exceptional 99.9% uptime with round-the-clock monitoring by SRE experts. With support for 45+ PoS networks, it ranks among the top validators on Polygon, Polkadot, Sui, and Tron. Luganodes prioritizes security and compliance, holding the distinction of being one of the first staking providers to adhere to all SOC 2 Type II, GDPR, and ISO 27001 standards as well as offering Chainproof insurance to institutional clients.
The information herein is for general informational purposes only and does not constitute legal, business, tax, professional, financial, or investment advice. No warranties are made regarding its accuracy, correctness, completeness, or reliability. Luganodes and its affiliates disclaim all liability for any losses or damages arising from reliance on this information. Luganodes is not obligated to update or amend any content. Use this information at your own risk. For any advice, please consult a qualified professional.