The most common techniques in industry are k-means and hierarchical clustering, both fully unsupervised methods built on distance metrics. In my view, they are rarely an appropriate choice unless the data has been properly pre-processed: they require numeric data (a small but real issue), and, much more importantly, the curse of dimensionality means that notions of distance break down when you have many features, which is true for basically every segmentation application. The result is often noisy, unstable clusters.
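A quick sketch of why distance fails in high dimensions: for random points, the gap between the nearest and farthest pairwise distances shrinks relative to the distances themselves as features are added, so "close" and "far" stop being informative. This is a synthetic illustration, not segmentation data.

```python
import numpy as np

rng = np.random.default_rng(0)

def distance_contrast(n_points=200, n_dims=2):
    """(max - min) / min over all pairwise Euclidean distances."""
    X = rng.random((n_points, n_dims))
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    d = d[np.triu_indices(n_points, k=1)]  # upper triangle: each pair once
    return (d.max() - d.min()) / d.min()

# Contrast collapses as dimensionality grows: in 2-D the nearest pair is
# orders of magnitude closer than the farthest; in 1000-D almost all
# pairs are nearly equidistant.
for dims in (2, 10, 100, 1000):
    print(dims, round(distance_contrast(n_dims=dims), 2))
```

With many survey-style features, every respondent is roughly the same distance from every other, which is why raw k-means output on such data is so unstable.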
In my opinion, the best all-purpose way to pre-process the data is to use the proximity matrix from a random forest — supervised or unsupervised — as the input to the clustering algorithm. Using a supervised random forest (with, e.g., some measure of intent or purchase behavior as the dependent variable) is an especially good way to do this, because the differences you capture are only those that have some impact on the end goal you care about. Principal components analysis is often used to pre-process the data, but I typically find the results unhelpful and uninterpretable (e.g., there is typically an “everything” dimension, even with a varimax or other rotation).
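The supervised version of this idea can be sketched with scikit-learn: fit a forest on an outcome, define proximity between two respondents as the fraction of trees in which they land in the same leaf, and cluster on the resulting dissimilarity rather than on raw feature distances. The data here are synthetic stand-ins; in practice the label would be the intent or purchase measure mentioned above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Hypothetical respondents: X = survey features, y = purchase-intent
# label used only to shape the forest (synthetic placeholder data).
X, y = make_classification(n_samples=300, n_features=40, random_state=0)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Leaf index of every respondent in every tree: (n_samples, n_trees).
leaves = forest.apply(X)

# Proximity(i, j) = fraction of trees where i and j share a leaf.
prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=-1)

# Cluster on dissimilarity 1 - proximity instead of Euclidean distance.
dist = 1.0 - prox
segments = fcluster(
    linkage(squareform(dist, checks=False), method="average"),
    t=4, criterion="maxclust",  # cut the tree into 4 segments
)
```

Because the forest only splits on features that predict the outcome, irrelevant variables contribute nothing to the proximities, which is exactly the filtering you want before segmenting.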
Finally, the way we typically do it at my firm — Gradient Metrics — is to use matrix factorization/matrix completion techniques. These have the benefits that (a) they are not based on distance in high dimensions, (b) you don’t need a complete set of measurements, and (c) you obtain a lower-dimensional way to understand both the rows (the customers, or the segments) and the columns (the variables, questions, dimensions, &c). Singular value decomposition, non-negative matrix factorization, matrix completion, &c, are all techniques that can be brought to bear here.
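A minimal sketch of the factorization idea, using plain truncated SVD on a complete synthetic respondents-by-questions matrix: the factorization yields low-dimensional coordinates for the rows (customers) and the columns (questions) simultaneously. Real use would add centering/scaling, and with missing answers a matrix-completion variant would replace the plain SVD.

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic ratings: 200 respondents x 30 questions, generated from
# 3 latent dimensions plus noise (placeholder for real survey data).
latent_rows = rng.normal(size=(200, 3))
latent_cols = rng.normal(size=(30, 3))
R = latent_rows @ latent_cols.T + 0.1 * rng.normal(size=(200, 30))

# Rank-k truncated SVD: R ≈ U[:, :k] diag(S[:k]) Vt[:k].
U, S, Vt = np.linalg.svd(R, full_matrices=False)
k = 3
row_embedding = U[:, :k] * S[:k]  # each customer as a 3-D point
col_embedding = Vt[:k].T          # each question as a 3-D point

# Share of variance captured by the rank-3 reconstruction.
R_hat = row_embedding @ col_embedding.T
explained = 1 - np.linalg.norm(R - R_hat) ** 2 / np.linalg.norm(R) ** 2
```

Segments can then be read off (or clustered) in the low-dimensional row space, while the column embedding shows which questions drive each dimension — the dual view point (c) above refers to.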