Retrieve the distinct partition keys of a Cassandra table
Selecting the distinct partition keys of a Cassandra table is very straightforward when performing a CQL query to retrieve the first page
of DISTINCT
partition keys of the Cassandra table, but in order to select all the distinct partition keys of the table without needing
to retrieve all the rows of the table the TOKEN
Cassandra function needs to be employed.
Let’s consider for the sake of having a concrete example to work on that we store the bookmarks made by the users in a Cassandra table.
CREATE TABLE DEMO.USER_BOOKMARKS (
user_id UUID,
timestamp TIMEUUID,
url VARCHAR,
PRIMARY KEY (user_id, timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC)
When the following query:
SELECT DISTINCT user_id
FROM demo.user_bookmarks
LIMIT 1000
is executed, it will provide the first page containing 1000 distinct partition keys of the table demo.user_bookmarks
.
For selecting the next page, an extra filter based the TOKEN
function has to be employed in order to retrieve
the next page of distinct partition keys which all have the token greater as the token of the last user_id
selected on the previous page.
SELECT DISTINCT user_id
FROM demo.user_bookmarks
WHERE TOKEN(user_id) > TOKEN(userIds[999])
LIMIT 1000
The corresponding java implementation looks like this:
private ResultSet getDistinctUserIdsBatch(Session session, Optional<UUID> lastUserIdProcessed,
int batchSize) {
var batchedDistinctUserSelect = QueryBuilder.select(USER_ID_COLUMN_NAME)
.distinct()
.from(DEMO_KEYSPACE_NAME, USER_BOOKMARKS_TABLE_NAME);
lastUserIdProcessed.ifPresent(
uuid -> batchedDistinctUserSelect.where(gt(token(USER_ID_COLUMN_NAME), token(uuid))));
batchedDistinctUserSelect.limit(batchSize);
return session.execute(batchedDistinctUserSelect);
}
The Cassandra TOKEN documentation is a good starter to
get a feeling of what the TOKEN
function does.
The java project cassandra-select-distinct-partition-keys provides a runnable test which starts a Cassandra docker container (via testcontainers library) fills it with data (the bookmarks made by a list of random users) and selects the distinct user identifiers.
Simply use mvn clean install
for trying out the project.