Cross-sync joins are a powerful feature in Syncraft that allows querying across different data sources in a seamless and efficient manner. This capability is particularly useful in scenarios where data is distributed across multiple databases or services, and there's a need to aggregate or correlate this data.
Let's delve into how cross-sync joins work within a QueryAndSelect pipeline using the provided example:
{
"query": {
"id": { "eq": 2 },
"sessions": {
"query": {
"Timestamp": { "gt": "2023-09-04" },
"UserID": {
"eq": {
"@ref": [ "id" ]
}
}
},
"selection": {
"UserID": true,
"ProductID": true,
"Category": true,
"ActionType": true,
"SubCategory": true,
"Season": true,
"Timestamp": true
}
}
},
"selection": {
"id": true,
"name": true
},
"root": "Users"
}
In the given pipeline, we are querying the Users
model initially, and based on the result, a subsequent query is issued to the sessions
relation, which is stored in a different data source (Clickhouse). This is a classic scenario of a cross-sync join where data from multiple sync sources (Postgres for Users and Clickhouse for sessions) is being correlated.
A key aspect of executing such cross-sync joins is the deferment of queries by the query planner. In this case:
id
equal to 2.id
of the user to execute the deferred query on Clickhouse to fetch session records for that particular user.This deferment is crucial for the efficiency and correctness of the cross-sync join, ensuring that all necessary data from one sync source is available before querying the other sync source.
The relational query structure within the sessions
relation is a crucial part of this cross-sync join:
"sessions": {
"query": {
"Timestamp": { "gt": "2023-09-04" },
"UserID": {
"eq": {
"@ref": [ "id" ]
}
}
},
"selection": {
"UserID": true,
"ProductID": true,
"Category": true,
"ActionType": true,
"SubCategory": true,
"Season": true,
"Timestamp": true
}
}
In this block:
query
part specifies the conditions for fetching session records, including a timestamp filter and a user ID filter.UserID
filter uses the @ref
parameter to reference the id
field from the result of the initial query on the Users
model.selection
part specifies which fields to include in the output.In Syncraft, you're not limited to pre-defined models. You can dynamically create models within a query by specifying the model data. This feature is particularly useful when there's a need to interact with data sources in a more flexible or ad-hoc manner. These dynamically created models are referred to as Synthetic Models.
Here's an example showcasing how to define a synthetic model on-the-fly:
{
"query": {
"id": { "eq": 2 },
"arbitraryFieldNameWhereSessionsWillBeStored": {
"query": {
"Timestamp": { "gt": "2023-09-04" },
"UserID": {
"eq": {
"@ref": [ "id" ]
}
}
},
"selection": {
"UserID": true,
"Season": true,
"Timestamp": true
},
"model": {
"sync": "Clickhouse",
"name": "ArbitraryName",
"dataPath": "testdb.SessionRecorder"
}
}
},
"selection": {
"id": true,
"name": true
},
"root": "Users"
}
In the provided example:
arbitraryFieldNameWhereSessionsWillBeStored
field.model
field specifies the details of the synthetic model:sync
specifies the data source (Clickhouse in this case).name
provides a name for the synthetic model (ArbitraryName
).dataPath
directs to the specific location or table (testdb.SessionRecorder
) where the data resides within the sync source.This feature provides a layer of flexibility, enabling dynamic interaction with various data sources without the necessity of pre-defining all the models, thus offering a more streamlined and adaptable querying experience.
Cross-sync joins enable complex querying across multiple data sources, enhancing the data retrieval and correlation capabilities in Syncraft. The deferment by the query planner ensures the orderly execution of queries across different sync sources, maintaining the integrity and coherence of the data being fetched and correlated.