Vercel Logo
DOCS
DEVELOPER GUIDELINE
CROSS SYNC JOIN

Cross-Sync Joins

Cross-sync joins are a powerful feature in Syncraft that allows querying across different data sources in a seamless and efficient manner. This capability is particularly useful in scenarios where data is distributed across multiple databases or services, and there's a need to aggregate or correlate this data.

Let's delve into how cross-sync joins work within a QueryAndSelect pipeline using the provided example:

Try In Playground
{ "query": { "id": { "eq": 2 }, "sessions": { "query": { "Timestamp": { "gt": "2023-09-04" }, "UserID": { "eq": { "@ref": [ "id" ] } } }, "selection": { "UserID": true, "ProductID": true, "Category": true, "ActionType": true, "SubCategory": true, "Season": true, "Timestamp": true } } }, "selection": { "id": true, "name": true }, "root": "Users" }

In the given pipeline, we are querying the Users model initially, and based on the result, a subsequent query is issued to the sessions relation, which is stored in a different data source (Clickhouse). This is a classic scenario of a cross-sync join where data from multiple sync sources (Postgres for Users and Clickhouse for sessions) is being correlated.

Deferment by Query Planner

A key aspect of executing such cross-sync joins is the deferment of queries by the query planner. In this case:

  1. Initially, a query is sent to Postgres to fetch the user with id equal to 2.
  2. The query planner then defers the query to Clickhouse for fetching session records until the result from the initial query is obtained.
  3. Once the user information is retrieved from Postgres, the query planner uses the id of the user to execute the deferred query on Clickhouse to fetch session records for that particular user.

This deferment is crucial for the efficiency and correctness of the cross-sync join, ensuring that all necessary data from one sync source is available before querying the other sync source.

Relational Query Structure

The relational query structure within the sessions relation is a crucial part of this cross-sync join:

"sessions": { "query": { "Timestamp": { "gt": "2023-09-04" }, "UserID": { "eq": { "@ref": [ "id" ] } } }, "selection": { "UserID": true, "ProductID": true, "Category": true, "ActionType": true, "SubCategory": true, "Season": true, "Timestamp": true } }

In this block:

  • The query part specifies the conditions for fetching session records, including a timestamp filter and a user ID filter.
  • The UserID filter uses the @ref parameter to reference the id field from the result of the initial query on the Users model.
  • The selection part specifies which fields to include in the output.

Synthetic Models: Defining Models On-the-Fly

In Syncraft, you're not limited to pre-defined models. You can dynamically create models within a query by specifying the model data. This feature is particularly useful when there's a need to interact with data sources in a more flexible or ad-hoc manner. These dynamically created models are referred to as Synthetic Models.

Here's an example showcasing how to define a synthetic model on-the-fly:

Try In Playground
{ "query": { "id": { "eq": 2 }, "arbitraryFieldNameWhereSessionsWillBeStored": { "query": { "Timestamp": { "gt": "2023-09-04" }, "UserID": { "eq": { "@ref": [ "id" ] } } }, "selection": { "UserID": true, "Season": true, "Timestamp": true }, "model": { "sync": "Clickhouse", "name": "ArbitraryName", "dataPath": "testdb.SessionRecorder" } } }, "selection": { "id": true, "name": true }, "root": "Users" }

In the provided example:

  • Under the query block, a synthetic model is defined within the arbitraryFieldNameWhereSessionsWillBeStored field.
  • The model field specifies the details of the synthetic model:
  • sync specifies the data source (Clickhouse in this case).
  • name provides a name for the synthetic model (ArbitraryName).
  • dataPath directs to the specific location or table (testdb.SessionRecorder) where the data resides within the sync source.
  • This synthetic model is then utilized in the query and selection blocks to fetch data based on the criteria provided.

This feature provides a layer of flexibility, enabling dynamic interaction with various data sources without the necessity of pre-defining all the models, thus offering a more streamlined and adaptable querying experience.

Cross-sync joins enable complex querying across multiple data sources, enhancing the data retrieval and correlation capabilities in Syncraft. The deferment by the query planner ensures the orderly execution of queries across different sync sources, maintaining the integrity and coherence of the data being fetched and correlated.

Learn More
Query & Select
Continue Your Journey
Data Consolidation With Remap