Beware the supercolumn, it's a trap for the unwary!

Since most developers who have used databases are familiar with a traditional RDBMS, I’ll explain Cassandra SuperColumns as succinctly as possible using some tables.

The first is an example of a row from a column_family.

select * from user_1;

column_name     column_value
name            kelley reynolds
favorite_drink  hard cider

As you can see, this shows some name/value pairs, and those values are easy to get quickly. Now a supercolumn, on the other hand, is usually illustrated like so:

select * from users;

supercolumn_name  column_name     column_value
user_1            name            kelley reynolds
user_1            favorite_drink  hard cider
user_2            name            kevin way
user_2            favorite_drink  port ellen

But this would be WRONG! It would actually be illustrated more like this:

select * from users;

supercolumn_name  subcolumns
user_1            {name:'kelley reynolds',favorite_drink:'hard cider'}
user_2            {name:'kevin way',favorite_drink:'port ellen'}

And now you should be thinking to yourself, “Self, that’s silly! I’d never do it that way. If I wanted to access just the favorite drink of one of those people, I’d have to read the entire supercolumn, deserialize it, then pick out just the subcolumn I was after. I can’t read just one subcolumn with it done that way!”

Exactly!
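
To make the difference concrete, here is a rough sketch in Python. It is a toy model and nothing more: plain dicts stand in for rows, JSON stands in for the serialized supercolumn, and the helper names read_column and read_subcolumn are made up for this illustration. None of this is Cassandra's real storage format or client API.

```python
import json

# Toy model only: dicts stand in for rows and JSON stands in for the
# serialized form of a supercolumn. This is NOT Cassandra's real format.

# Standard column family row: every column is individually addressable.
user_1 = {"name": "kelley reynolds", "favorite_drink": "hard cider"}

# Supercolumn family row: each supercolumn holds its subcolumns as one blob.
users = {
    "user_1": json.dumps({"name": "kelley reynolds", "favorite_drink": "hard cider"}),
    "user_2": json.dumps({"name": "kevin way", "favorite_drink": "port ellen"}),
}


def read_column(row, column_name):
    """Hypothetical helper: fetch one column without touching the others."""
    return row[column_name]


def read_subcolumn(row, supercolumn_name, subcolumn_name):
    """Hypothetical helper: fetch one subcolumn of a supercolumn."""
    # The entire supercolumn has to be deserialized before a single
    # subcolumn can be picked out, so the work grows with its size.
    subcolumns = json.loads(row[supercolumn_name])
    return subcolumns[subcolumn_name]


print(read_column(user_1, "favorite_drink"))
print(read_subcolumn(users, "user_2", "favorite_drink"))
```

With two subcolumns the extra work is invisible. With thousands of subcolumns per supercolumn, every single-value read pays for the whole blob.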

About Kelley Reynolds

A full-stack software engineer, an avid trail runner, and a bassoonist. Kelley occasionally writes about one of his many projects on this blog.

  • curiousaboutcass

    Hey Kelley, Thanks for answering my questions in #cassandra yesterday. Looking forward to your post on time-series data!

  • Mark

    You’ve just named the biggest *advantage* of Cassandra’s SuperColumns: they aggregate data on write, in a manner that is idempotent in the face of write failures and retries, and then retrieve the data in one ordered block on read. The use case you describe is a great example of why not to use a SuperColumn. But if you have a process that’s continuously adding new data to a column, and when you want to retrieve it you want all of it, a SuperColumn is a valid design choice. Yes, you can also do this “one level down” with composite keys and column names, but this approach has other drawbacks if you have to traverse the resulting row keys in order, using the RandomPartitioner.

  • Kelley Reynolds

    @Mark, you are of course correct: SuperColumns are there for a reason, and there are several valid use cases for them, as you mention. The reason for this post is that they are frequently described as ‘just another level of hash’, which, while sort of logically true, has operational implications that people need to be aware of when designing their applications, especially as there is often the mistaken notion that subcolumns are distributed around the ring, and they are not. I don’t believe they are without utility, but as the title states, I do think they are a trap for the unwary.