We’re excited to announce that Google Cloud’s collaborative work with Theta over the last several months, “Model Pipeline for Video-to-Text Applications”, is being presented at this year’s Google Cloud Next conference in San Francisco. The project will be presented by Brice Perez, Google Cloud’s Web3 customer engineer. If you are attending in person, the presentation will be at 9am PT on Thursday 8/31 at Moscone South, Level 2, Room 211. For those who can’t attend, we have a modified version of the presentation available here. You can also watch the presentation online using the free Google Cloud Next Digital Pass.
This collaboration between Google Cloud and the Theta Edge Network aims to combine the best of cloud and edge architectures. Cloud infrastructure is stable and highly available, while edge networks consist of a virtually unlimited number of nodes that are individually less capable but sit close to end users and collectively offer massive parallel processing power. By using both together, we can harness the strengths of cloud and edge for emerging AI use cases in video, media and entertainment.
How does Theta Edge Network fit in? With several distinct advantages:
- Scale: a distributed network of 10k+ nodes around the world
- Availability: with a mix of cloud-based VMs and laptop/desktop machines, there is no single point of failure
- Resources: vast amounts of idle CPU/GPU cycles and memory/storage that can be harnessed for parallel processing
- Simple UX: Jobs run in the background with no user interaction required
Model Pipeline Concept: GCP + Theta Edge Network
Applying this approach, a task is carried out by running multiple deep-learning models consecutively. Some of these are Google’s proprietary models, accessible only through GCP’s APIs; others are open-source models that can be served by the Theta Edge Network, leveraging its massive parallel processing power.
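To make the concept concrete, here is a minimal sketch of how such a pipeline could be orchestrated. The stage names, the `runs_on` labels, and the placeholder stage functions are purely illustrative assumptions, not the actual GCP or Theta APIs.

```python
# Minimal sketch of the pipeline concept: stages are plain callables, and each
# stage is annotated with where it runs (GCP-hosted API vs. Theta Edge Network).
# The stage implementations below are hypothetical placeholders, not real APIs.
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    runs_on: str                      # "gcp" or "theta-edge"
    fn: Callable[[object], object]    # transforms the previous stage's output


def run_pipeline(stages: List[Stage], payload: object) -> object:
    """Run the stages consecutively, feeding each output into the next stage."""
    for stage in stages:
        print(f"[{stage.runs_on}] {stage.name}")
        payload = stage.fn(payload)
    return payload


# Placeholder stage functions; in practice these would wrap the Vertex AI
# video-to-text call and models served by Theta edge nodes.
pipeline = [
    Stage("video-to-text", "gcp", lambda video: f"captions for {video}"),
    Stage("summarize+tag", "theta-edge", lambda captions: f"summary of ({captions})"),
]

print(run_pipeline(pipeline, "match_final.mp4"))
```

Each stage only needs to agree on the shape of its input and output, so GCP-hosted and edge-served models can be mixed freely within a single chain.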

Application 1: Semantic Video Search
In this first example of semantic video search, multiple models are used consecutively across GCP and the Theta Edge Network. Raw video is fed into video-to-text by Google Imagen via the Vertex AI APIs, then clip summaries are sent to a large language model (LLM) and a VectorDB hosted on the Theta Edge Network to create video summaries and tags. This allows users to accurately locate interesting moments inside long videos from a massive video library using natural language.
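As a rough illustration of the search step, the sketch below makes some simplifying assumptions: `summarize_clip` and `embed` are hypothetical stand-ins for the edge-hosted LLM and an embedding model (the toy hash-based embedding is not semantically meaningful), and a plain in-memory dictionary plays the role of the VectorDB.

```python
# Illustrative sketch of semantic video search over clip summaries.
import numpy as np


def summarize_clip(captions: str) -> str:
    """Stand-in for the LLM on the Theta Edge Network that turns raw
    video-to-text captions into a short summary plus tags."""
    return f"summary: {captions}"


def embed(text: str) -> np.ndarray:
    """Stand-in for an embedding model; a toy hash-seeded random vector is
    used here only to keep the sketch self-contained."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(64)
    return v / np.linalg.norm(v)


# Index: one embedding per clip summary, kept in a simple in-memory "VectorDB".
clips = {
    "clip_001": "a player scores a last-minute goal",
    "clip_002": "fans celebrate in the stands",
    "clip_003": "coach gives a halftime interview",
}
index = {cid: embed(summarize_clip(captions)) for cid, captions in clips.items()}


def search(query: str, top_k: int = 2):
    """Rank clips by cosine similarity between the query and clip summaries."""
    q = embed(query)
    scored = sorted(index.items(), key=lambda kv: float(q @ kv[1]), reverse=True)
    return scored[:top_k]


print(search("winning goal in the final minutes"))
```

In a real deployment the embedding model and the vector store would run on Theta edge nodes, and the captions would come from the Vertex AI video-to-text call rather than the hard-coded strings above.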
Application 2: Game Highlight Reel Generation
In this second example of game highlight reel generation, multiple models are again used consecutively. Raw video is fed into video-to-text by Google Imagen via the Vertex AI APIs, clip summaries are sent to an LLM on the Theta Edge Network, and the resulting clip sentiments are used by ffmpeg on the edge network to create highlight reels, which can then be sent back to Imagen video-to-text on GCP to create summaries of the highlight reels.
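The ffmpeg step can be sketched as follows. The file names, segment timestamps, and sentiment threshold are made-up values, and the sentiment scores are assumed to come from the edge-hosted LLM in the previous step; the ffmpeg invocations themselves are standard CLI usage.

```python
# Rough sketch: cut high-sentiment segments out of the source video with
# ffmpeg and concatenate them into a highlight reel.
import subprocess
import tempfile
from pathlib import Path

SOURCE = "full_game.mp4"
# (start_seconds, duration_seconds, sentiment score from the edge-hosted LLM)
SEGMENTS = [(120.0, 15.0, 0.92), (870.5, 20.0, 0.88), (1505.0, 12.0, 0.35)]
THRESHOLD = 0.8

workdir = Path(tempfile.mkdtemp())
clip_paths = []
for i, (start, duration, score) in enumerate(SEGMENTS):
    if score < THRESHOLD:
        continue  # skip low-sentiment moments
    out = workdir / f"clip_{i:03d}.mp4"
    # Stream-copy the segment (fast, no re-encode); cut accuracy is keyframe-limited.
    subprocess.run(
        ["ffmpeg", "-y", "-ss", str(start), "-t", str(duration),
         "-i", SOURCE, "-c", "copy", str(out)],
        check=True,
    )
    clip_paths.append(out)

# Concatenate the kept clips with ffmpeg's concat demuxer.
list_file = workdir / "clips.txt"
list_file.write_text("".join(f"file '{p}'\n" for p in clip_paths))
subprocess.run(
    ["ffmpeg", "-y", "-f", "concat", "-safe", "0", "-i", str(list_file),
     "-c", "copy", str(workdir / "highlight_reel.mp4")],
    check=True,
)
print("highlight reel written to", workdir / "highlight_reel.mp4")
```

The resulting highlight reel could then be fed back through the same video-to-text stage on GCP to produce a summary of the reel itself.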
Implications of Cloud + Edge
These examples of applications utilizing cloud and edge services together have broad implications. By seamlessly switching between high-throughput, high-availability centralized services like GCP and decentralized edge networks like Theta with massive parallel processing power, developers and enterprises always have the best tool for the job at their fingertips instead of compromising on one axis. With video comprising roughly 80% of global network traffic and growing, these video-focused use cases will be a major component of production AI in the coming years.