Last week, Gradient was divided. Half our team was applying our current skills and servicing clients. The other half was investing in new skills and learning about the latest methodologies in the marketing analytics & data science realm.
Marcin, our lead data scientist and R expert, spent his time preparing for the WhyR conference (as a co-organizer) in Warsaw, Poland, as well as joining sessions to stay up to date on the latest developments in this statistical programming language.
Tom, Gradient's founder and statistical genius, spent his time learning state-of-the-art methodologies in San Diego, at the Sawtooth Conference. The conference focused on choice experiments (conjoint & MaxDiff) as well as market segmentation. These are topics we already know well, but we are always eager to improve. Read on for a summary of each conference.
WhyR Conference, Warsaw
On the first day of the conference, we gathered 50 participants willing to take a day off and take part in a Data Visualizations Hackathon. Participants had to use Google Cloud Platform APIs to extract places in Warsaw (such as restaurants, cafes and other service spots), along with their ratings and busiest times of day. The aim was to build a dashboard, report, or application with real business value. The data can be found here, but here's a sample of my favorite use cases:
- What is the best place for advertisements in the city?
- Where within Warsaw are services missing?
- Are the most crowded places equipped with enough city bike stations?
- What are the traffic trends throughout the day?
- Does the occupancy of a neighborhood correspond with real estate prices?
The highlight of the second day was a full day on GAMs (generalized additive models) facilitated by Matteo Fasiolo, a student of Simon Wood, who is the author of the famous mgcv package.
All the participants had a chance to broaden their R knowledge related to shiny, data.table, Rcpp, keras, XAI, DALEX, mgcv, geospatial analysis, mlr, drake and many more!
As an organizer, I had the additional advantage of meeting all the presenters during a pre-conference dinner on Thursday. It was both inspiring and a lot of fun!
The last two days of the conference were lecture days, divided into two tracks. Europe's best researchers showed R applications in Business, Biology, Bioinformatics, Computer Vision, Statistical Modeling and GeoSpatial Analysis. There was also a shiny session and 2 hours of inspiring lightning talks. Check out the slides from the six inspiring keynotes.
But my favorite sessions were:
- "Random Forests: The First-Choice Method For Every Data Analysis?"
Marvin Wright, author of the famous ranger package.
- "tfprobably correct - Adding Uncertainty to Deep Learning with TensorFlow Probability"
Sigrid Keydana, from RStudio
- "Always Be Deploying. How to make R great for machine learning in (not only) Enterprise"
Sawtooth Conference, San Diego
I was in sunny San Diego for a conference held by Sawtooth Software. Every 18 months, Sawtooth hosts about 200 people for a few days of papers and presentations. We heard from organizations like Google, Riot Games, and Procter & Gamble about how they're incorporating choice modeling into their research and decision-making. There was way too much to summarize in a short blog post, but my main takeaway is that Max Diff is becoming table stakes for everyday survey design.
There are fewer and fewer reasons to ever use Likert scale grids in surveys, and more and more reasons to replace this clunky format with Max Diff in virtually all circumstances.

| Likert scale grids | Max Diff |
| --- | --- |
| Hard and boring for the respondent | More interesting for the respondent |
| Vulnerable to scale-use bias | Forces each respondent to be on same scale |
| Data is on ordinal scale | (Modeled) data is on ratio scale |

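To make the contrast above a bit more concrete, here is a minimal sketch of the simplest way Max Diff answers can be scored: best-minus-worst counting. This is only an illustration, not the model-based estimation that produces the ratio-scaled scores mentioned above, and the tasks and item names are hypothetical. Each respondent sees a few items per task and picks the best and the worst, so everyone answers on the same implicit scale.

```python
from collections import Counter

def best_worst_scores(tasks):
    """Score items as (times picked best - times picked worst) / times shown.

    tasks: iterable of (items_shown, best_item, worst_item) tuples.
    Scores range from -1 (always worst) to +1 (always best).
    """
    shown, best, worst = Counter(), Counter(), Counter()
    for items, best_item, worst_item in tasks:
        shown.update(items)
        best[best_item] += 1
        worst[worst_item] += 1
    return {item: (best[item] - worst[item]) / shown[item] for item in shown}

# Hypothetical answers from one respondent across three Max Diff tasks.
tasks = [
    (("price", "speed", "design", "support"), "speed", "design"),
    (("price", "speed", "battery", "support"), "price", "support"),
    (("design", "battery", "speed", "price"), "speed", "design"),
]

scores = best_worst_scores(tasks)
for item, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:8s} {score:+.2f}")
```

Counting scores like these are a quick sanity check; a fitted choice model is still what puts the items on a ratio scale.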
Here are some interesting applications of Max Diff we saw at the conference:
Using Max Diff in product and brand trackers
We saw organizations incorporating Max Diff into their product and brand trackers. Instead of tracking the "Percent Agree" for a particular question, they are tracking the share of preference for a specific set of attributes. One organization even had a "frustration tracker", where they tracked the biggest frustrations of working with their product, which they could see spike when features were added or removed. With Max Diff incorporated into your survey tracker, you can track more items more reliably than with traditional methods.
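As a rough illustration of what "share of preference" means, here is a small sketch that converts item utilities into shares that sum to 100%. It assumes the utilities have already been estimated by a logit-type choice model; the frustration items and utility values are invented for the example and do not come from any of the conference talks.

```python
import math

def share_of_preference(utilities):
    """Convert logit-scale item utilities into shares that sum to 1.

    Uses the standard exponentiate-and-normalize (softmax) transform.
    Subtracting the maximum utility first keeps exp() numerically stable.
    """
    max_u = max(utilities.values())
    exp_u = {item: math.exp(u - max_u) for item, u in utilities.items()}
    total = sum(exp_u.values())
    return {item: value / total for item, value in exp_u.items()}

# Hypothetical utilities for one wave of a "frustration tracker".
utilities = {
    "slow_loading": 1.2,
    "confusing_menus": 0.4,
    "missing_export": -0.3,
    "pricing": -1.3,
}

shares = share_of_preference(utilities)
for item, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{item:16s} {share:6.1%}")
```

Because the shares are on a ratio scale, an item with twice the share can be read as twice as preferred (or, here, twice as frustrating), and the same calculation can be repeated wave after wave to spot spikes.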
Using Max Diff in early-stage product research
Max Diff is essentially the ideal tool to gain insight into which features users want. At early stages of product development, product managers typically have more features (or problems to solve) than they could possibly tackle at the outset, which is all the more reason to have them prioritized. We saw organizations like Google and Mozilla using Max Diff as a simple way to get a robust prioritization of their feature roadmap.
Using Max Diff to develop and implement segmentations
Typical segmentations use batteries of Likert-style questions to drive their clustering. Often, either the segmenting variables don't contain enough variation within each question, or scale-use bias drives most of the clustering results. Max Diff is a more principled way of developing segments: it produces better data, from which segments can be derived with latent class models. Typing tools remain a (small) challenge with this format, but this is increasingly being addressed with new analytics approaches.