SDK Developer Guide
Overview
The Geodesix SDK is a software package that enables content suppliers and content consumers to integrate with Geodesix data services.
Currently, the SDK is intended for use by content suppliers to push data into the Geodesix Data Queue. Once ingested, this data is processed and later made available for search through the Geodesix RAG system.
Data processing is performed asynchronously by the backend processing pipeline. Using the SDK only places the data into the Data Queue; it does not trigger or guarantee immediate processing.
Processing begins only after all onboarding phases with the customer are complete. This does not block integration: once Geodesix R&D has verified that data is being pushed correctly, the customer may continue sending data to the queue, even if additional processing tasks, such as developing custom parsers, are still required.
Content suppliers can track the status of their data at each stage of the processing pipeline through the Customer/Content Management System (CMS) Portal.
Note: The CMS portal is currently under development.
Web version of this document can be found here: https://geodesix.atlassian.net/wiki/external/YmU2OGJjMjMyNDY2NGVjNWIyZjU1ZjBmNmE5YWQwZWY
Content Suppliers - Getting started
Prerequisites
Please complete the following two steps:
- Accessing the PHP SDK code
- Getting the required SDK settings
Accessing the PHP SDK Code
Access to the SDK code is provided through a private GitHub repository, available only to members of the Geodesix SDK Access team. To join this team, the supplier must appoint a developer and provide Geodesix with the developer’s GitHub account ID or email address so an invitation can be issued.
- Appoint a PHP developer to handle the SDK integration task.
- Ask the developer to send their GitHub ID or email address (if a GitHub account has not yet been created) to the Geodesix account manager.
- The developer should wait for the invitation to join the Geodesix SDK Access team and accept it once it arrives.
- After accepting the invitation, the developer will gain access to the SDK repository on GitHub: git@github.com:geodesix-io/geodesix-sdk-php.git
Once access is granted, the developer can proceed with the installation process.
Getting the Required SDK Settings
Before you can use the SDK, your Geodesix account manager must provide you with two pieces of data:
- Repository name - this is the name of the folder in the queue where your data will be stored. Examples: “company1”, “mybusiness” etc.
- AWS API credentials that include:
  - Access key (will be set later in GSX_SDK_ACCESS_KEY_ID)
  - Secret key (will be set later in GSX_SDK_SECRET_ACCESS_KEY)
Preparing your development environment
This guide assumes that the developer is familiar with PHP programming, understands how to use Composer, and is working with a modern development environment.
- To use the SDK, you will need an IDE that supports PHP development.
- In your IDE, set the PHP language level to 8.3 or higher. For example, in PhpStorm you can configure the language level through the project settings; similar options exist in other IDEs.
- Composer must be installed on your machine (or at the project level). Refer to the official Composer documentation for installation instructions.
- A Git client must be installed.
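The PHP version requirement above can also be checked against the interpreter that will actually run the integration. The following is a minimal sketch and not part of the SDK: the function name meetsMinimumPhp is hypothetical, and the 8.3 threshold is taken from the language level stated above.

```php
<?php
// Hypothetical helper (not part of the Geodesix SDK): returns true when the
// given PHP version satisfies the minimum version assumed by this guide.
function meetsMinimumPhp(string $current = PHP_VERSION, string $minimum = '8.3.0'): bool
{
    return version_compare($current, $minimum, '>=');
}

// Warn early when running from the command line on an older interpreter.
if (!meetsMinimumPhp()) {
    fwrite(STDERR, 'PHP 8.3 or higher is required, found ' . PHP_VERSION . PHP_EOL);
}
```

Running this once from the command line confirms that the CLI interpreter, which may differ from the one configured in the IDE, meets the requirement.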
Installation Procedure
The procedure described below assumes that:
- A developer has joined the SDK Access team.
- The SDK settings have been received.
- The PHP development environment is ready.
The Geodesix SDK is provided as a plain PHP code project, and it is used through a simple include statement. To work with the SDK, you must download the code and place it in a directory within the project where you plan to implement the integration.
Downloading the SDK code
Use Git to download the code:
git clone git@github.com:geodesix-io/geodesix-sdk-php.git
Installing Dependencies
Use Composer to install the dependencies. The Geodesix SDK requires the AWS API, which is installed by running Composer against the composer.json file located at the SDK root:
cd geodesix-sdk-php
composer install
Setting up the SDK’s AWS credentials
The most important rule when working with SDK credentials is to keep them private and never store them in a source‑control repository (e.g., Git, SVN, VSS).
The Geodesix SDK expects AWS credentials to be defined either as environment variables or as constants in your PHP application. When the SDK loads, it searches for credentials in the following order:
- Environment variables, if they are defined
- PHP constants, if environment variables are not found
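The lookup order above can be pictured with a short sketch. This is not the SDK's own code: the function name resolveCredential is hypothetical, and the sketch only assumes that an environment variable takes precedence over a PHP constant of the same name.

```php
<?php
// Hypothetical helper (not part of the Geodesix SDK): looks up a setting
// first as an environment variable and then as a PHP constant.
function resolveCredential(string $name): ?string
{
    // 1. Environment variable, if it is defined and non-empty.
    $fromEnv = getenv($name);
    if ($fromEnv !== false && $fromEnv !== '') {
        return $fromEnv;
    }
    // 2. PHP constant, if no environment variable was found.
    if (defined($name)) {
        return (string) constant($name);
    }
    // Neither source provides a value.
    return null;
}
```

A helper like this can be useful in your own diagnostics, for example to report which of the two settings is missing before the SDK is loaded.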
Below are two recommended approaches for defining credentials so the SDK can locate them correctly.
Approach 1: Store credentials in a secret manager and set environment variables
If you are integrating the Geodesix SDK into an existing project, you may already be using a secret‑management service such as AWS Secrets Manager or Google Cloud Secret Manager. In that case, you can store the SDK credentials in your existing secret storage. Your application then needs to read the credentials from the secret manager and set the following environment variables:
- GSX_SDK_ACCESS_KEY_ID — set to the access key provided by Geodesix
- GSX_SDK_SECRET_ACCESS_KEY — set to the secret key provided by Geodesix
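The following pseudocode illustrates this. It assumes that $secretData is an associative array your application has already fetched from the secret manager:
// Get the values from the secret manager
$accessKey = $secretData["GSX_SDK_ACCESS_KEY_ID"];
$secretKey = $secretData["GSX_SDK_SECRET_ACCESS_KEY"];
// Set the two environment variables before the SDK is included,
// because the SDK looks for credentials when it loads
putenv("GSX_SDK_ACCESS_KEY_ID={$accessKey}");
putenv("GSX_SDK_SECRET_ACCESS_KEY={$secretKey}");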
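A slightly more defensive variant of this step is sketched below. It assumes, purely for illustration, that your secret manager returns the secret as a JSON object whose keys match the two environment variable names; the function name exportGeodesixCredentials is hypothetical and not part of the SDK.

```php
<?php
// Hypothetical helper (not part of the Geodesix SDK).
// Assumption: the secret is a JSON object keyed by the variable names below.
function exportGeodesixCredentials(string $secretJson): void
{
    $names = ['GSX_SDK_ACCESS_KEY_ID', 'GSX_SDK_SECRET_ACCESS_KEY'];
    $secretData = json_decode($secretJson, true, 512, JSON_THROW_ON_ERROR);

    // Validate both values first so that nothing is exported on partial data.
    foreach ($names as $name) {
        $value = is_array($secretData) ? ($secretData[$name] ?? null) : null;
        if (!is_string($value) || $value === '') {
            throw new RuntimeException("Secret is missing a value for $name");
        }
    }
    // Export the values as environment variables for the current process.
    foreach ($names as $name) {
        putenv("$name={$secretData[$name]}");
    }
}
```

Call the helper before including geodesix/includes.php, so that the variables exist when the SDK looks for them.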
Approach 2: Define a private file that is not included in the code repository
- Navigate to the root directory of the SDK project (for example, using your IDE).
- Create a directory named private.
- Inside this directory, create a new PHP file named my.cfg.php and place the configuration code shown below into it.
- Ensure that this directory is excluded from your version‑control system.
- If you are using Git, the SDK’s .gitignore file is already configured to ignore the private directory.
- The SDK automatically attempts to include /private/my.cfg.php, so you do not need to manually add an include statement.
Example of creating a directory and configuration file using the CLI
PS C:\Temp\sdktest> cd .\geodesix-sdk-php\
PS C:\Temp\sdktest\geodesix-sdk-php> mkdir private
PS C:\Temp\sdktest\geodesix-sdk-php> cd private
PS C:\Temp\sdktest\geodesix-sdk-php\private> New-Item my.cfg.php -ItemType File
Contents of my.cfg.php
<?php
const GSX_SDK_ACCESS_KEY_ID = "[REPLACE WITH ACCESS KEY]";
const GSX_SDK_SECRET_ACCESS_KEY = "[REPLACE WITH SECRET KEY]";
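For teams that work on several operating systems, the same directory and file can be created from PHP instead of a shell. The sketch below is an illustration only: scaffoldPrivateConfig is a hypothetical name, and the function simply writes the placeholder file shown above if it does not exist yet.

```php
<?php
// Hypothetical helper (not part of the Geodesix SDK): creates
// <sdkRoot>/private/my.cfg.php with placeholder values if it is missing.
function scaffoldPrivateConfig(string $sdkRoot): string
{
    $dir = rtrim($sdkRoot, '/\\') . DIRECTORY_SEPARATOR . 'private';
    if (!is_dir($dir) && !mkdir($dir, 0700, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create directory $dir");
    }

    $file = $dir . DIRECTORY_SEPARATOR . 'my.cfg.php';
    // Never overwrite an existing file, since it may already hold real keys.
    if (!file_exists($file)) {
        $template = "<?php\n"
            . "const GSX_SDK_ACCESS_KEY_ID = \"[REPLACE WITH ACCESS KEY]\";\n"
            . "const GSX_SDK_SECRET_ACCESS_KEY = \"[REPLACE WITH SECRET KEY]\";\n";
        if (file_put_contents($file, $template) === false) {
            throw new RuntimeException("Could not write $file");
        }
    }
    return $file;
}
```

After running it against the SDK root, replace the two placeholders with the keys received from Geodesix.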
Check the Connection with the AWS S3 Service
- Navigate to the root directory of the SDK project.
- Enter the code_example directory.
- Open the file Check_Access.php in your IDE.
- Set the value of $repository to the repository name you received from Geodesix.
- Run the example using either your IDE or the command line.
If everything is configured correctly, the program will output “CONNECTION OK”.
If any configuration is missing or incorrect, the program will output “CONNECTION PROBLEM” followed by an error trace.
Example running with CLI
PS C:\Temp\sdktest\geodesix-sdk-php\code_example> php .\Check_Access.php
CONNECTION OK
Example code after setting the repository name
<?php
namespace geodesix\examples;
use Exception;
use geodesix\Geodesix;
/**
 * This script tests the connection with the AWS service.
 * It helps verify that the AWS credentials provided by Geodesix are set correctly.
 */
// Include the SDK
include __DIR__ . "/../geodesix/includes.php";
try {
    // SET HERE THE REPOSITORY NAME YOU RECEIVED FROM GEODESIX
    $repository = "[repository name]";
    // Create a basic client and test the connection with the AWS service.
    $gsxClient = Geodesix::createBasicDataQueueClient($repository);
    $gsxClient->testAWSConnection();
} catch (Exception $e) {
    echo $e;
}
Push HTML Data
This section and the following ones present a few code examples for pushing HTML data. The examples assume that all the preparation steps described in the previous sections have been completed.
Most of the code is self-explanatory, but the important points are highlighted after the example.
Source code is in file: [sdk-root]/code_example/Example_PushHtml.php
// Namespace and use statements are omitted in this excerpt; see the source file.
// Include the SDK
include __DIR__ . "/../geodesix/includes.php";
try {
    // SET HERE THE REPOSITORY NAME YOU RECEIVED FROM GEODESIX
    $repository = "[repository name]";
    // The web page URL
    $url = "https://example.com/story14/about/product";
    // The full HTML string of the page
    $html = "<html><body>Example with clean url '$url'</body></html>";
    // Create the SDK client for pushing data to the Geodesix queue
    $gsxClient = Geodesix::createBasicDataQueueClient($repository);
    // Write the page data
    $uid = $gsxClient->pushHTML($url, $html);
    // Read the data by the uid of the stored page (for testing purposes only)
    $data = $gsxClient->readItemByUid($uid, "example.com", Consts::DATA_TYPE_HTML);
    echo "\nData that was written:";
    echo "\n" . json_encode($data, JSON_PRETTY_PRINT);
    // Read the page data by the page's URL (for testing purposes only)
    $data = $gsxClient->readHTMLByUrl($url, Consts::DATA_TYPE_HTML);
    echo "\nRead result:";
    echo "\n" . json_encode($data, JSON_PRETTY_PRINT);
} catch (Exception $e) {
    echo $e;
}
Highlights for this example:
- To use the SDK, include the file located at geodesix/includes.php. Additional include files may appear in this directory; some are intended for other features or internal system use.
- The SDK client used for pushing data is initialized with the repository name provided by Geodesix. Your account credentials grant access only to the queue folder within this repository. Using any other repository name will cause the SDK to fail.
- The HTML and URL values must be supplied by your CMS. The SDK does not scrape content; it expects raw data provided directly by the supplier.
- URL values must represent the clean, exact path of the web resource, without additional query parameters. The SDK does not use the URL verbatim; instead, it parses the URL and reconstructs a normalized URI containing only the required components. As a best practice, provide clean URLs from the start.
- The HTML value must contain valid and full HTML syntax. Geodesix parses the HTML to extract meaningful text blocks, titles, descriptions, summaries, and more. You may remove unused scripts or style elements to reduce file size and minimize the amount of data processed by Geodesix.
- The read section in the code example is not required for real integrations; it is included only for testing purposes.
- Since you will typically process multiple files, there is no need to recreate the client each time. Create the client once and reuse it for all subsequent operations.
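The advice above about removing unused scripts and style elements can be sketched with PHP's bundled DOM extension. This is one possible approach and not something the SDK provides: stripScriptsAndStyles is a hypothetical name, and the sketch assumes the DOM extension is enabled. Note that DOMDocument::loadHTML may misread non-ASCII characters when the page does not declare its charset, so verify the output on your own content.

```php
<?php
// Hypothetical helper (not part of the Geodesix SDK): removes <script> and
// <style> elements from a full HTML document before it is pushed.
function stripScriptsAndStyles(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world markup is rarely perfect.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    foreach (['script', 'style'] as $tag) {
        // Copy the live node list first, because removing nodes mutates it.
        foreach (iterator_to_array($doc->getElementsByTagName($tag)) as $node) {
            $node->parentNode->removeChild($node);
        }
    }

    $result = $doc->saveHTML();
    if ($result === false) {
        throw new RuntimeException('Could not serialize the cleaned HTML');
    }
    return $result;
}
```

The cleaned string can then be passed as the $html argument of pushHTML in place of the original page source.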
Here is a simplified version of the push example, with the comments and the read section removed:
// Include the SDK
include __DIR__ . "/../geodesix/includes.php";
$url = "https://example.com/story14/about/product";
$html = "<html><body>Example with clean url '$url'</body></html>";
$gsxClient = Geodesix::createBasicDataQueueClient("[repository name]");
$gsxClient->pushHTML($url, $html);
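When many pages are pushed in one run, the client can be created once and reused, as recommended in the highlights. The sketch below shows one way to structure such a loop. It is an illustration under assumptions: pushAll is a hypothetical helper, the pages are assumed to arrive as a URL-to-HTML map from your CMS, and failures are assumed to surface as exceptions.

```php
<?php
// Hypothetical helper (not part of the Geodesix SDK): pushes many pages
// through a single push callable and collects the outcome per URL.
// $push is expected to accept ($url, $html) and return the item's uid,
// in the same way the examples in this guide use $gsxClient->pushHTML().
function pushAll(callable $push, iterable $pages): array
{
    $results = ['pushed' => [], 'failed' => []];
    foreach ($pages as $url => $html) {
        try {
            $results['pushed'][$url] = $push($url, $html);
        } catch (Exception $e) {
            // Record the failure and continue with the remaining pages.
            $results['failed'][$url] = $e->getMessage();
        }
    }
    return $results;
}

// Usage with the SDK (create the client once, outside the loop):
// $gsxClient = Geodesix::createBasicDataQueueClient("[repository name]");
// $results = pushAll(fn ($url, $html) => $gsxClient->pushHTML($url, $html), $pages);
```

Passing the push operation as a callable keeps the loop testable without AWS access, because a stub can stand in for the real client.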
Push HTML Data when the resource is identified by query parameter
In the previous example, the URL identified the web page resource using only the URL path. In this example, you will slightly modify the code to support a resource that is identified by a query parameter named id.
Providing the exact URL is critical because Geodesix uses it to generate the universally unique identifier (UUID) for the content item in the system.
Later, it is recommended to review the examples in code_example/Example_TestUrl.php, which demonstrate how Geodesix cleans and normalizes URLs (when necessary) to retain only the parts that uniquely identify the resource.
Pay attention to the $url assignment and the pushHTML call, as well as the corresponding readHTMLByUrl call.
In the $url assignment, the URL uses a query parameter to identify the resource:
$url = "https://example.com/story14/about/product?id=19";
In the pushHTML call (fourth argument) and the readHTMLByUrl call (third argument), the value "id" names the query parameter that identifies the resource.
Source code can be found in file: [sdk-root]/code_example/Example_PushHtmlQuery.php
// Include the SDK
include __DIR__ . "/../geodesix/includes.php";
try {
    // SET HERE THE REPOSITORY NAME YOU RECEIVED FROM GEODESIX
    $repository = "[repository name]";
    // Create the SDK client for pushing data to the Geodesix queue
    $gsxClient = Geodesix::createBasicDataQueueClient($repository);
    // The web page URL. NOTE THE QUERY STRING PART id=19.
    // In this example 'id' is the query parameter that identifies the resource.
    $url = "https://example.com/story14/about/product?id=19";
    // The full HTML string of the page
    $html = "<html><body>Example with query parameter 'id' in url: '$url' </body></html>";
    // Write the page data
    //
    // NOTE THE LAST ARGUMENT ($queryParamId). It takes the name of the query parameter that identifies the resource.
    $uid = $gsxClient->pushHTML($url, $html, null, "id");
    // Read the data by the uid of the stored page (for testing purposes only)
    $data = $gsxClient->readItemByUid($uid, "example.com", Consts::DATA_TYPE_HTML);
    echo "\nData that was written:";
    echo "\n" . json_encode($data, JSON_PRETTY_PRINT);
    // Read the page data by the page's URL (for testing purposes only)
    //
    // NOTE THE LAST ARGUMENT ($queryParamId). It is used here too.
    $data = $gsxClient->readHTMLByUrl($url, Consts::DATA_TYPE_HTML, "id");
    echo "\nRead result:";
    echo "\n" . json_encode($data, JSON_PRETTY_PRINT);
} catch (Exception $e) {
    echo $e;
}
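As a rough illustration of the URL clean-up described in this section, the sketch below keeps only the scheme, host, path, and an optional identifying query parameter. It is not the SDK's actual normalization logic (see code_example/Example_TestUrl.php for that). The function name keepIdentifyingParts and its choices, such as lowercasing the host and dropping the port and fragment, are assumptions made for illustration.

```php
<?php
// Hypothetical helper (not part of the Geodesix SDK): reduces a URL to the
// parts that identify the resource, optionally keeping one query parameter.
function keepIdentifyingParts(string $url, ?string $queryParamId = null): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Not an absolute URL: $url");
    }

    // Scheme and host are case-insensitive; the path is kept exactly as given.
    $clean = strtolower($parts['scheme']) . '://'
        . strtolower($parts['host'])
        . ($parts['path'] ?? '/');

    // Keep only the query parameter that identifies the resource, if requested.
    if ($queryParamId !== null && isset($parts['query'])) {
        parse_str($parts['query'], $query);
        if (isset($query[$queryParamId]) && is_string($query[$queryParamId])) {
            $clean .= '?' . http_build_query([$queryParamId => $query[$queryParamId]]);
        }
    }
    return $clean;
}
```

Applying a clean-up like this in your CMS export means that tracking parameters or fragments do not reach the SDK, which matches the best practice of providing clean URLs from the start.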
Summary
This initial version of the SDK includes the essential components needed to begin pushing data into Geodesix. Additional examples and capabilities will be added in future releases.