Unlocking Full User Journeys: Server-Side Session & User Stitching for GA4 with GTM & Cloud Run
You've harnessed the power of server-side Google Analytics 4 (GA4), leveraging Google Tag Manager (GTM) Server Container on Cloud Run to centralize data collection, apply transformations, enrich events, and enforce granular consent. This architecture provides robust control, accuracy, and compliance, forming the backbone of your modern analytics strategy.
However, a fundamental challenge remains for a complete understanding of your customer journey: robustly managing user sessions and stitching together anonymous (client_id) and authenticated (user_id) user data. In a privacy-first world, client-side limitations (browser Intelligent Tracking Prevention, ad-blockers, inconsistent cookie handling) often lead to:
- Broken Sessions: Shortened cookie lifespans or inconsistent client-side session management can prematurely end GA4 sessions, inflating session counts and fragmenting user behavior.
- Fragmented User Journeys: A user browsing anonymously (identified by client_id) before logging in (identified by user_id) is often treated as two distinct users in analytics, making it difficult to understand their full journey.
- Inaccurate Attribution: Without consistent session and user identifiers, attributing conversions to the correct touchpoints becomes unreliable.
- Limited Personalization: An inability to consistently identify a user (whether anonymous or authenticated) across interactions hampers personalization efforts.
The problem, then, is the need for a server-side mechanism that can reliably manage GA4 session boundaries, bridge the gap between anonymous and authenticated user identities, and ensure a persistent user view across your entire analytics and marketing ecosystem. Relying solely on client-side methods for these critical functions is increasingly insufficient and unreliable.
Why Server-Side for Session & User Stitching?
Moving session and user identity management to your GTM Server Container on Cloud Run offers significant advantages:
- Resilience: Server-side logic operates independently of client-side browser restrictions (ITP, ad-blockers), leading to more stable session and user identification.
- Consistency: Centralized logic ensures that ga_session_id and user_id are managed uniformly across all events, regardless of client-side variations.
- Unified Identity: A single server-side service can map anonymous client_ids to authenticated user_ids, providing a holistic view of the customer journey.
- Enhanced Control: Programmatic control over session timeouts and user_id prioritization allows for analytics that more closely align with your business definitions of a session and a user.
- Data Quality: Cleaner, more accurate ga_session_id and user_id parameters lead to more reliable GA4 reports and deeper insights.
Our Solution: Server-Side Identity & Session Management with GTM SC, Cloud Run & Firestore
Our solution introduces a dedicated Identity & Session Management Service built on Cloud Run and Firestore. This service will be called early in your GTM Server Container's processing flow to:
- Resolve user_id: Prioritize an authenticated user_id if available, and if not, maintain an anonymous client_id. It will also store a persistent mapping between client_id and user_id in Firestore.
- Manage ga_session_id: Determine the current session's ID based on the event timestamp and previous session activity stored in Firestore, respecting GA4's default 30-minute session timeout (or a custom one).
- Return Resolved Identifiers: The service returns the most appropriate user_id and ga_session_id for the event, which are then injected into the GTM SC's eventData.
- Dispatch to GA4: These resolved identifiers are then used in your GA4 tags to send robust, stitched data to GA4.
This pattern empowers you to build a comprehensive, privacy-aware, and accurate view of every customer's journey, from their first anonymous visit to their authenticated interactions.
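The precedence rule described above can be sketched as a small pure function, which keeps it easy to unit test apart from Firestore. This is a minimal sketch under stated assumptions: the names Resolution and resolve_user_id are hypothetical and do not appear in the service code.

```python
from typing import NamedTuple, Optional


class Resolution(NamedTuple):
    user_id: str            # identifier to attach to the event
    is_authenticated: bool  # False when we fell back to the client_id
    write_mapping: bool     # True when the client_id -> user_id map must be written


def resolve_user_id(client_id: str,
                    incoming_user_id: Optional[str],
                    stored_user_id: Optional[str]) -> Resolution:
    """Prefer the incoming user_id, then the stored one, then the client_id."""
    if incoming_user_id:
        # A logged-in event wins; persist it only if it changes the stored map.
        return Resolution(incoming_user_id, True, incoming_user_id != stored_user_id)
    if stored_user_id:
        # Anonymous event from a client_id that was linked earlier.
        return Resolution(stored_user_id, True, False)
    # Never seen with a user_id: stay anonymous.
    return Resolution(client_id, False, False)
```

Returning an explicit is_authenticated flag lets downstream code tell a real user_id apart from the anonymous fallback, which matters because the fallback is just the client_id.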
Architecture: Server-Side Identity & Session Resolution
We'll integrate this new "Identity & Session Service" early in the GTM Server Container's processing flow.
graph TD
A[User Browser/Client-Side] -->|1. Event (client_id, user_id (if logged in), event_timestamp)| B(GTM Web Container);
B -->|2. HTTP Request to GTM Server Container Endpoint| C(GTM Server Container on Cloud Run);
subgraph GTM Server Container Initial Processing
C --> D{3. GTM SC Client Processes Event};
D --> E[4. Custom Tag/Variable: Call Identity & Session Service (High Priority)];
E -->|5. HTTP Request with client_id, user_id, event_timestamp| F[Identity & Session Service (Python on Cloud Run)];
F -->|6a. Look up client_id-user_id map & Session State| G[Firestore (User Map, Session State)];
F -->|6b. Resolve user_id, Determine/Update ga_session_id, Persist State| G;
G -->|7. Return Resolved user_id, ga_session_id| F;
F -->|8. Return Resolved Identifiers to GTM SC| E;
E -->|9. Add Resolved Identifiers to Event Data (_resolved.user_id, _resolved.ga_session_id)| D;
end
D --> J[10. Other GTM SC Processing (Data Quality, Enrichment, Consent)];
J -->|11. Dispatch to GA4 Measurement Protocol (using _resolved IDs)| K[Google Analytics 4];
K --> L[GA4 Reports & Explorations];
Key Flow:
- Client-Side Event: A user interaction triggers an event. The GTM Web Container sends this to your GTM Server Container, including the client_id (from the _ga cookie), an optional user_id (if the user is logged in), and the event's timestamp.
- GTM SC Ingestion: GTM SC receives the HTTP request.
- Identity & Session Resolution (Early): A high-priority custom variable in your GTM Server Container extracts the incoming client_id, user_id, and event_timestamp, and makes an HTTP call to your Identity & Session Service (Cloud Run).
- Service Logic (Cloud Run):
  - User Stitching: It checks Firestore for an existing mapping between this client_id and a user_id. If a user_id is provided in the current event, it updates or creates this map. The service then returns the most consistent user_id (preferring authenticated over anonymous, if a link exists).
  - Session Management: It retrieves the last recorded event timestamp for the client_id. If the current event is more than 30 minutes after the last recorded event, a new ga_session_id is generated. Otherwise, the existing ga_session_id is reused. The session state (last event timestamp, ga_session_id) is updated in Firestore.
- GTM SC Updates Event Data: The GTM SC receives the resolved user_id and ga_session_id and adds them to the event's eventData (e.g., _resolved.user_id, _resolved.ga_session_id).
- Dispatch to GA4: The event, now enriched with robust user_id and ga_session_id parameters, proceeds through other GTM SC transformations, consent checks, and is dispatched to GA4 via the Measurement Protocol.
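The session rule in the flow above reduces to a comparison between the event timestamp and the last recorded one. A minimal in-memory sketch follows; SessionState and next_session_state are hypothetical names, the session ID format is illustrative only, and Firestore is deliberately left out so the rule can be tested on its own.

```python
from dataclasses import dataclass
from typing import Optional

SESSION_TIMEOUT_MS = 30 * 60 * 1000  # GA4's default 30-minute inactivity window


@dataclass(frozen=True)
class SessionState:
    ga_session_id: str
    last_event_timestamp: int  # milliseconds
    session_number: int


def next_session_state(prev: Optional[SessionState], client_id: str,
                       event_ts_ms: int,
                       timeout_ms: int = SESSION_TIMEOUT_MS) -> SessionState:
    """Return the session state after applying one event."""
    if prev is None:
        # First event ever seen for this client_id.
        return SessionState(f"{client_id}.{event_ts_ms}", event_ts_ms, 1)
    if event_ts_ms - prev.last_event_timestamp >= timeout_ms:
        # Inactivity window exceeded: start a new session.
        return SessionState(f"{client_id}.{event_ts_ms}", event_ts_ms,
                            prev.session_number + 1)
    # Same session. An out-of-order event must not move the
    # last-seen timestamp backwards.
    return SessionState(prev.ga_session_id,
                        max(prev.last_event_timestamp, event_ts_ms),
                        prev.session_number)
```

Keeping the decision free of I/O also makes the edge cases explicit: an event that arrives late (with an older timestamp) stays in the current session instead of shortening it.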
Core Components Deep Dive & Implementation Steps
1. Firestore Setup: User Mapping & Session State
Firestore will store the persistent mapping between client_id and user_id, and maintain the session state (last event timestamp, current ga_session_id) for each client_id.
a. Create a Firestore Database:
- In the GCP Console, navigate to Firestore.
- Choose "Native mode" and select a region close to your Cloud Run services.
b. Structure Your Data:
We'll use two collections: user_identity_map and session_state.
user_identity_map collection:
- Document ID: client_id (e.g., GA1.1.123456789.0)
- Fields:
  - user_id: The associated authenticated user ID (if known).
  - last_updated: Timestamp of the last update.
  - first_seen_at: Timestamp when this client_id was first seen.
session_state collection:
- Document ID: client_id
- Fields:
  - ga_session_id: The current ga_session_id for this client_id.
  - last_event_timestamp: The event_timestamp (in milliseconds) of the last event for this client_id.
  - session_start_timestamp: The event_timestamp (in milliseconds) when the current ga_session_id started.
  - session_number: The sequential session count for this client_id.
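To make the two document shapes concrete, they can be written down as typed dictionaries. This is an illustrative sketch, not part of the service: the type names and the new_session_doc helper are hypothetical, and timestamps in user_identity_map are shown as integers for simplicity even though the service writes Firestore server timestamps there.

```python
from typing import Optional, TypedDict


class UserIdentityDoc(TypedDict):
    """Shape of a user_identity_map document (keyed by client_id)."""
    user_id: Optional[str]
    first_seen_at: int   # simplified; a Firestore timestamp in the service
    last_updated: int    # simplified; a Firestore timestamp in the service


class SessionStateDoc(TypedDict):
    """Shape of a session_state document (keyed by client_id)."""
    ga_session_id: str
    last_event_timestamp: int     # milliseconds
    session_start_timestamp: int  # milliseconds
    session_number: int


def new_session_doc(ga_session_id: str, event_ts_ms: int,
                    session_number: int = 1) -> SessionStateDoc:
    """Build the document written when a session starts."""
    return {
        "ga_session_id": ga_session_id,
        "last_event_timestamp": event_ts_ms,
        "session_start_timestamp": event_ts_ms,
        "session_number": session_number,
    }
```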
2. Python Identity & Session Service (Cloud Run)
This Flask application will receive client_id, user_id, and event_timestamp, perform the lookup/update logic in Firestore, and return the resolved user_id and ga_session_id.
identity-session-service/main.py example:
import logging
import os
import random

from flask import Flask, request, jsonify
from google.cloud import firestore

app = Flask(__name__)
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Initialize Firestore client
try:
    db = firestore.Client()
    logger.info("Firestore client initialized.")
except Exception as e:
    logger.error(f"Error initializing Firestore client: {e}")
    # In production, decide if this should crash or return safe defaults.
    # With db = None every request falls through to the error fallback below.
    db = None

# Configuration for session timeout (GA4 default is 30 minutes)
SESSION_TIMEOUT_MS = int(os.environ.get('SESSION_TIMEOUT_MINUTES', '30')) * 60 * 1000


@app.route('/resolve-identity-session', methods=['POST'])
def resolve_identity_session():
    """
    Receives client_id, user_id, event_timestamp.
    Resolves consistent user_id and ga_session_id, updates state in Firestore.
    """
    if not request.is_json:
        logger.warning(f"Request is not JSON. Content-Type: {request.headers.get('Content-Type')}")
        return jsonify({'error': 'Request must be JSON'}), 400

    # Bind these before the try block so the except handler can always use them
    client_id = incoming_user_id = event_timestamp_ms = None
    try:
        data = request.get_json()
        client_id = data.get('client_id')
        incoming_user_id = data.get('user_id')
        event_timestamp_ms = data.get('event_timestamp')  # Expected in milliseconds
        if not client_id or not event_timestamp_ms:
            logger.error("Missing client_id or event_timestamp in request.")
            return jsonify({'error': 'Missing client_id or event_timestamp'}), 400
        # GTM may send the timestamp as a numeric string
        event_timestamp_ms = int(event_timestamp_ms)

        # --- 1. User Identity Resolution ---
        user_map_ref = db.collection('user_identity_map').document(client_id)
        resolved_user_id = incoming_user_id  # Start with incoming user_id
        user_map_doc = user_map_ref.get()
        if user_map_doc.exists:
            user_map_data = user_map_doc.to_dict()
            stored_user_id = user_map_data.get('user_id')
            if stored_user_id and not resolved_user_id:
                # If we have a stored user_id but no incoming, use the stored one
                resolved_user_id = stored_user_id
                logger.debug(f"Resolved user_id for {client_id} from storage: {resolved_user_id}")
            elif resolved_user_id and stored_user_id != resolved_user_id:
                # If incoming user_id differs from stored, update the map
                user_map_ref.update({
                    'user_id': resolved_user_id,
                    'last_updated': firestore.SERVER_TIMESTAMP
                })
                logger.info(f"Updated user_id map for {client_id}: {stored_user_id} -> {resolved_user_id}")
        elif resolved_user_id:
            # New client_id with an incoming user_id, create new map entry
            user_map_ref.set({
                'user_id': resolved_user_id,
                'first_seen_at': firestore.SERVER_TIMESTAMP,
                'last_updated': firestore.SERVER_TIMESTAMP
            })
            logger.info(f"Created new user_id map for {client_id} with user_id: {resolved_user_id}")

        # If no user_id at all (anonymous), use client_id as pseudo-user_id for context
        if not resolved_user_id:
            resolved_user_id = client_id
            logger.debug(f"No user_id found, using client_id as resolved_user_id: {resolved_user_id}")

        # --- 2. Session Management ---
        # The session ID below is a custom format. GA4's own ga_session_id is the
        # session start time in Unix seconds.
        session_state_ref = db.collection('session_state').document(client_id)
        session_state_doc = session_state_ref.get()
        session_number = 1
        if session_state_doc.exists:
            state_data = session_state_doc.to_dict()
            last_event_timestamp = state_data.get('last_event_timestamp', 0)
            if (event_timestamp_ms - last_event_timestamp) < SESSION_TIMEOUT_MS:
                # Session is still active: reuse the ID and record the new activity
                resolved_ga_session_id = state_data.get('ga_session_id')
                session_number = state_data.get('session_number', 1)
                session_state_ref.update({'last_event_timestamp': event_timestamp_ms})
                logger.debug(f"Reusing session {resolved_ga_session_id} for {client_id}. Session number: {session_number}")
            else:
                # Session timed out, start a new one
                resolved_ga_session_id = f"{client_id}.{event_timestamp_ms}.{random.randint(100, 999)}"
                session_number = state_data.get('session_number', 0) + 1
                session_state_ref.update({
                    'ga_session_id': resolved_ga_session_id,
                    'last_event_timestamp': event_timestamp_ms,
                    'session_start_timestamp': event_timestamp_ms,
                    'session_number': session_number
                })
                logger.info(f"New session {resolved_ga_session_id} started for {client_id}. Session number: {session_number}")
        else:
            # First event for this client_id, start a new session
            resolved_ga_session_id = f"{client_id}.{event_timestamp_ms}.{random.randint(100, 999)}"
            session_state_ref.set({
                'ga_session_id': resolved_ga_session_id,
                'last_event_timestamp': event_timestamp_ms,
                'session_start_timestamp': event_timestamp_ms,
                'session_number': 1
            })
            logger.info(f"First session {resolved_ga_session_id} started for new client_id {client_id}.")

        return jsonify({
            'resolved_user_id': resolved_user_id,
            'resolved_ga_session_id': resolved_ga_session_id,
            'session_number': session_number
        }), 200
    except Exception as e:
        logger.error(f"Error during identity/session resolution for client_id {client_id}: {e}", exc_info=True)
        # On error, provide fallback values to avoid breaking GA4 tracking
        return jsonify({
            'resolved_user_id': incoming_user_id or client_id,  # Fallback to incoming or client_id
            'resolved_ga_session_id': f"{client_id}.{event_timestamp_ms}.error",  # Indicate error
            'session_number': 0  # Indicate error or unknown
        }), 500


if __name__ == '__main__':
    # Local development entry point; keep debug off outside your own machine
    app.run(debug=False, host='0.0.0.0', port=int(os.environ.get('PORT', 8080)))
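The handler above accepts whatever arrives in the JSON body, so stricter input validation may be worth adding in front of it. The following is a minimal sketch of one way to do that; parse_resolve_request is a hypothetical helper, not part of the service. It also rejects a client_id containing a forward slash, since Firestore document IDs cannot contain one.

```python
from typing import Any, Mapping, Optional, Tuple


def parse_resolve_request(data: Mapping[str, Any]) -> Tuple[str, Optional[str], int]:
    """Validate the request body and return (client_id, user_id, event_ts_ms)."""
    client_id = data.get("client_id")
    if not isinstance(client_id, str) or not client_id.strip():
        raise ValueError("client_id must be a non-empty string")
    client_id = client_id.strip()
    if "/" in client_id:
        # The client_id is used as a Firestore document ID.
        raise ValueError("client_id must not contain '/'")

    raw_ts = data.get("event_timestamp")
    if isinstance(raw_ts, bool):
        raise ValueError("event_timestamp must be a number of milliseconds")
    try:
        event_ts_ms = int(raw_ts)  # accepts ints, floats and numeric strings
    except (TypeError, ValueError):
        raise ValueError("event_timestamp must be a number of milliseconds") from None
    if event_ts_ms <= 0:
        raise ValueError("event_timestamp must be positive")

    raw_user = data.get("user_id")
    user_id = str(raw_user).strip() if raw_user is not None else ""
    return client_id, (user_id or None), event_ts_ms
```

In the Flask handler, a ValueError raised here would map naturally to a 400 response, which keeps malformed input from being logged as a server error.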
identity-session-service/requirements.txt:
Flask
google-cloud-firestore
Deploy the Python service to Cloud Run:
gcloud run deploy identity-session-service \
  --source ./identity-session-service \
  --platform managed \
  --region YOUR_GCP_REGION \
  --allow-unauthenticated \
  --set-env-vars GCP_PROJECT_ID="YOUR_GCP_PROJECT_ID",SESSION_TIMEOUT_MINUTES="30" \
  --memory 512Mi \
  --cpu 1 \
  --timeout 30s  # Allow enough time for Firestore operations
Important:
- Replace YOUR_GCP_PROJECT_ID and YOUR_GCP_REGION with your actual values.
- The --allow-unauthenticated flag is for simplicity. In production, consider authenticated invocations as discussed in previous posts.
- Ensure the Cloud Run service identity has the roles/datastore.user role (which covers Firestore read/write access) on your GCP project.
- Note down the URL of this deployed Cloud Run service.
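Once the service is deployed, a quick smoke test from a workstation can confirm the endpoint responds before wiring it into GTM. The sketch below only builds the request, using the standard library; build_resolve_request is a hypothetical helper, and the optional id_token parameter assumes you have already obtained an identity token for an authenticated Cloud Run service.

```python
import json
import urllib.request
from typing import Optional


def build_resolve_request(base_url: str, client_id: str, event_ts_ms: int,
                          user_id: Optional[str] = None,
                          id_token: Optional[str] = None) -> urllib.request.Request:
    """Build the POST request the GTM Server Container would send."""
    payload = {"client_id": client_id, "event_timestamp": event_ts_ms}
    if user_id:
        payload["user_id"] = user_id
    headers = {"Content-Type": "application/json"}
    if id_token:
        # Needed only when the service does not allow unauthenticated calls.
        headers["Authorization"] = f"Bearer {id_token}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/resolve-identity-session",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send it, pass the result to urllib.request.urlopen(req, timeout=5) and decode the JSON body. Sending two events for the same client_id less than 30 minutes apart should return the same resolved_ga_session_id.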
3. GTM Server Container Custom Variable Template
Create a custom variable template in your GTM Server Container that fires early to call the Identity & Session Service and set the resolved identifiers in eventData.
GTM SC Custom Variable Template: Identity & Session Resolver
const sendHttpRequest = require('sendHttpRequest');
const JSON = require('JSON');
const logToConsole = require('logToConsole');
const getEventData = require('getEventData');
// NOTE: setInEventData and data.gtmOnSuccess are kept from this design as
// placeholders. setInEventData is not among the documented server-side
// sandboxed APIs, and variable templates normally return a value (or a
// Promise of one) rather than calling gtmOnSuccess. Verify both in your
// container before relying on this template.
const setInEventData = require('setInEventData');

// Configuration fields for the template:
// - identityServiceUrl: Text input for your Cloud Run Identity & Session service URL
// - clientIdVariable: Text input, variable holding client_id (e.g., '{{Event Data - _event_metadata.client_id}}')
// - userIdVariable: Text input, variable holding user_id (e.g., '{{Event Data - user_id}}' or '{{Event Data - logged_in_user_id}}')
// - eventTimestampVariable: Text input, variable holding event timestamp in milliseconds (e.g., '{{Event Data - gtm.start}}')
// GTM resolves {{variable}} references before the template runs, so these
// fields already hold values and are not passed through getEventData() again.
const identityServiceUrl = data.identityServiceUrl;
const client_id = data.clientIdVariable;
const user_id = data.userIdVariable; // This will be undefined if no user_id is sent
const event_timestamp_ms = data.eventTimestampVariable;

// Fallback: use incoming IDs plus a marker session ID so tracking continues
const resolveWithFallback = (sessionIdMarker) => {
  setInEventData('_resolved.user_id', user_id || client_id, true);
  setInEventData('_resolved.ga_session_id', sessionIdMarker, true);
  setInEventData('_resolved.session_number', 0, true);
  data.gtmOnSuccess(getEventData('_resolved'));
};

// Check for required inputs
if (!identityServiceUrl) {
  logToConsole('ERROR: Identity & Session Service URL is not configured.');
  data.gtmOnSuccess({}); // Return empty object, let downstream handle defaults
  return;
}
if (!client_id || !event_timestamp_ms) {
  logToConsole('ERROR: Client ID or Event Timestamp is missing. Cannot resolve identity/session.');
  resolveWithFallback('missing_id_error');
  return;
}

logToConsole('Requesting identity/session for client ID: ' + client_id.substring(0, 20) +
  '... and user ID: ' + (user_id || 'anonymous') + '.');

const payload = {
  client_id: client_id,
  user_id: user_id, // Omitted from the JSON if undefined
  event_timestamp: event_timestamp_ms
};

// Documented server-side signature: sendHttpRequest(url, options, body) returns a Promise
sendHttpRequest(identityServiceUrl + '/resolve-identity-session', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  timeout: 5000 // 5 seconds timeout for service call
}, JSON.stringify(payload)).then((result) => {
  if (result.statusCode < 200 || result.statusCode >= 300) {
    logToConsole('ERROR: Identity & Session service call failed:', result.statusCode, result.body);
    resolveWithFallback('http_error');
    return;
  }
  // The sandboxed JSON.parse returns undefined for malformed input
  const response = JSON.parse(result.body);
  if (!response) {
    logToConsole('ERROR: Could not parse Identity & Session service response.');
    resolveWithFallback('parse_error');
    return;
  }
  logToConsole('Resolved: User ID=' + response.resolved_user_id +
    ', Session ID=' + response.resolved_ga_session_id +
    ', Session Number=' + response.session_number);
  // Store the resolved identifiers in the event data, ephemeral for this event
  setInEventData('_resolved.user_id', response.resolved_user_id, true);
  setInEventData('_resolved.ga_session_id', response.resolved_ga_session_id, true);
  setInEventData('_resolved.session_number', response.session_number, true);
  data.gtmOnSuccess(getEventData('_resolved')); // Return the object
}, () => {
  // Rejected Promise: network failure or timeout
  logToConsole('ERROR: Identity & Session service request failed or timed out.');
  resolveWithFallback('http_error');
});
GTM SC Configuration:
- Create a new Custom Variable Template named Identity & Session Resolver.
- Paste the code. Add permissions: Access event data, Send HTTP requests, Access request headers (if your clientIdVariable needs to read cookies directly).
- Create a Custom Variable (e.g., {{Resolved Identity & Session}}) using this template.
- Configure:
  - identityServiceUrl: The URL of your Cloud Run service (https://identity-session-service-YOUR_HASH-YOUR_REGION.a.run.app).
  - clientIdVariable: {{Event Data - _event_metadata.client_id}} (This is the most reliable way to get the client ID after the GA4 Client has processed the incoming request).
  - userIdVariable: {{Event Data - user_id}} (or whatever data layer variable you push for authenticated user IDs).
  - eventTimestampVariable: {{Event Data - gtm.start}} (GA4's default event timestamp in milliseconds).
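- Crucially, make sure the resolver runs before any other tags (GA4, Facebook CAPI, etc.) fire that might need the resolved identity/session information. Two GTM details matter here. Variables have no triggers of their own; they are evaluated when a tag or trigger references them, so the resolver runs the first time something uses {{Resolved Identity & Session}}. And for tag firing priority, higher numbers fire first, so a negative value such as -100 makes a tag fire late rather than early.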
4. Using the Resolved Identifiers in Your GA4 Tag
Once the Identity & Session Resolver variable (e.g., {{Resolved Identity & Session}}) has run, the resolved user_id, ga_session_id, and session_number are available in your eventData under the _resolved namespace.
Update Your GA4 Configuration Tag in GTM SC:
This tag should fire first to establish the user_id and session_id for all subsequent events.
- In your GA4 Configuration Tag, under "Fields to Set", add:
  - Field Name: user_id, Value: {{Resolved Identity & Session.user_id}}
  - Field Name: session_id, Value: {{Resolved Identity & Session.ga_session_id}}
- You can also pass session_number as a user property or event parameter for enhanced analysis:
  - Field Name: session_number, Value: {{Resolved Identity & Session.session_number}} (and register this as a Custom Dimension in GA4 UI with User scope).
Update Your GA4 Event Tags in GTM SC:
For any other GA4 event tags (e.g., page_view, purchase):
- Ensure they inherit the user_id and session_id from the Configuration Tag.
- If you want to explicitly send session_number with every event, you can add it as an event parameter:
  - Parameter Name: session_number, Value: {{Resolved Identity & Session.session_number}} (and register as an Event-scoped Custom Dimension in GA4 UI).
This ensures every event sent to GA4 includes the user's consistently managed user_id and ga_session_id, allowing for accurate user journey and session analysis.
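For reference, the identifiers end up in a Measurement Protocol payload shaped roughly like the one built below. In this architecture the GTM Server Container's GA4 tag does the dispatch, so this is only an illustrative sketch of the payload; build_mp_payload is a hypothetical helper. It omits user_id when the resolved value is just the client_id fallback, so anonymous visitors are not reported to GA4 as if they had a User-ID.

```python
from typing import Any, Dict, Optional


def build_mp_payload(client_id: str, resolved_user_id: Optional[str],
                     ga_session_id: str, event_name: str, event_ts_ms: int,
                     params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a GA4 Measurement Protocol request body for one event."""
    event_params = dict(params or {})
    event_params["session_id"] = ga_session_id
    # GA4 expects an engagement time on events for session reporting.
    event_params.setdefault("engagement_time_msec", 1)

    payload: Dict[str, Any] = {
        "client_id": client_id,
        "timestamp_micros": event_ts_ms * 1000,  # milliseconds -> microseconds
        "events": [{"name": event_name, "params": event_params}],
    }
    # Send user_id only for an authenticated user, not the anonymous fallback.
    if resolved_user_id and resolved_user_id != client_id:
        payload["user_id"] = resolved_user_id
    return payload
```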
5. Leveraging in GA4 Reports and Explorations
Once the data flows into GA4 with these resolved identifiers, you can:
- View User Journeys: Use the user_id for cross-device analysis in User Explorer and Path Explorations.
- Accurate Session Metrics: Trust your session counts, engagement rates, and conversion rates, knowing they are based on a robust server-side session definition.
- Custom Reporting: Create custom reports or Explorations using the session_number or other derived session metrics to analyze user behavior over multiple sessions.
- Audiences: Build audiences based on combined anonymous and authenticated behavior, leveraging the user_id and consistent session data.
Benefits of This Server-Side Approach
- Holistic User Journeys: Unify anonymous and authenticated data for a complete view of customer interactions across devices and time.
- Accurate GA4 Reporting: Overcome client-side limitations to provide reliable session and user metrics, leading to more trustworthy insights.
- Enhanced Data Quality: Consistent, server-side managed identifiers improve the overall quality and integrity of your GA4 data.
- Resilience & Future-Proofing: Your core identity and session management logic is protected from browser changes and client-side failures.
- Centralized Control: Manage all user identity and session rules from a single, server-controlled environment.
- Improved Personalization & Activation: A more accurate understanding of user identity fuels more effective personalization and targeted marketing campaigns.
Important Considerations
- Latency: The extra HTTP round trip to the Identity & Session Service adds latency to your initial GTM SC processing, and each call performs at least two Firestore reads. Firestore is fast, but monitor this closely. For most analytics use cases, the benefits outweigh the added latency.
- Cost: Firestore reads/writes and Cloud Run invocations incur costs. Monitor usage, especially for high-volume sites. Implementing basic caching (e.g., in the Cloud Run service, for client_id-user_id mappings that don't change often) can help manage costs.
- Identity Resolution Complexity: This solution focuses on client_id (anonymous) and user_id (authenticated). True enterprise-level identity resolution can be far more complex, involving multiple identifiers and probabilistic matching. This solution provides a strong foundation.
- PII: While user_id itself should be a non-PII identifier, ensure no raw PII is stored in Firestore without appropriate hashing or encryption.
- GA4 Identity Space: GA4 supports User-ID as a primary identifier. When you send user_id via the Measurement Protocol, GA4 prioritizes it. If no user_id is sent, it falls back to client_id. This server-side solution ensures user_id is always available when known.
- Monitoring: Use Cloud Monitoring to track the performance and error rates of your Identity & Session Service and Cloud Firestore. Watch for rising latency, 5xx responses, and events that carry a fallback session ID.
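The caching idea from the cost consideration above could look like the following in-process cache with a time-to-live. This is a minimal sketch: TTLCache is a hypothetical class, the cache lives in one Cloud Run instance only (other instances keep their own copies), and it has no size limit, so a production version would need eviction.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Tiny per-instance cache for client_id -> user_id lookups."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested
        self._items: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Expired: drop it so the caller re-reads Firestore.
            del self._items[key]
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._items[key] = (self._clock() + self._ttl, value)
```

In the service, a lookup would check the cache first and read user_identity_map only on a miss, then call set() with the result. A short TTL limits how long a stale mapping can survive after a user logs in on another instance.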
Conclusion
Achieving a complete, accurate, and reliable view of your customer journeys is paramount for modern analytics. By implementing server-side session management and user stitching with your GTM Server Container, a dedicated Cloud Run service, and Firestore, you transform fragmented client-side data into a unified, resilient, and actionable dataset. This advanced server-side capability empowers you to overcome browser limitations, understand the full anonymous-to-authenticated user lifecycle, and drive more informed business decisions with confidence. Embrace server-side identity and session management to unlock the full potential of your GA4 analytics.