USER EXPERIENCE RESEARCH

Three UXR challenges you’ll encounter researching products in the TV/video-streaming industry

Legibility, language and familiarity: Understanding challenges, research approaches and key takeaways

Hajer Al Homedawy
Published in The Startup
47 min read · May 6, 2020


Confidentiality agreements are central to user experience research. In this article, I don't discuss the products I researched, and I don't disclose the names of the companies I worked with. Instead, I describe challenges I've encountered while conducting user experience research in the TV/video-streaming industry. This article is for anyone looking for information and guidance on design research in the TV/video-streaming industry.

*This article is part of my teaching series; it's a lengthier read than my other pieces (grab a cup of something).

The TV/video-streaming industry. Netflix. Google Play. Hulu. Disney. Amazon Fire TV. Apple TV. YouTube TV.

What do all these platforms have in common?

They all promise to provide you with an incredible media-service and content-consumption experience.

As you might have guessed, some do it better than others.

If you’re about to step into this industry as a researcher — you’re in for an adventure.

I want to ease your journey.

In this article, I discuss fruitful challenges I encountered in the TV/video-streaming industry. These challenges are situated in three overarching themes, which are:

  1. Legibility
  2. Language
  3. Familiarity

Each theme is broken down in the following way:

  1. “What’s the challenge?” In this section, I describe a specific example.
  2. “What’s the research approach?” Here I address the following three questions: “What do we really want to measure?”, “Which independent variables do we really want to look at?” and “What’s the research approach in a nutshell?”
  3. “What are the key takeaways?” Lastly, I describe key takeaway points.

Additionally, for any reader looking to practice, I’ve simulated data for each example. All simulated data files are located on my Open Science Framework page, in a project entitled, “Three UXR challenges: simulated datasets” (see link; all data files are in .CSV format).

Research terminology

I use various research terms throughout this article. Frequently used terms and their definitions are listed below.

For those interested in learning more, educational material is listed at the very end of this article.

(Skip this section if you’re acquainted with empirical research terminology.)

  • Measures are often referred to as dependent variables. A dependent variable is defined as “the variable being measured, so called because it may depend on manipulations of the independent variable” (Myers et al. 2012, p. 534).
  • An independent variable is defined as “the experimental factor that a researcher manipulates” (Myers et al. 2012, p. 536).
  • Correlational research is defined as “the study of the naturally occurring relationships among variables” (Myers et al. 2012, p. 534).
  • Experimental research is defined as “studies that seek clues to cause-effect relationships by manipulating one or more factors (independent variables) while controlling others (holding them constant)” (Myers et al. 2012, p. 535).
  • “The main distinction between what we could call correlational or cross-sectional research (where we observe what naturally goes on in the world without directly interfering with it) and experimental research (where we manipulate one variable to see its effect on another) is that experimentation involves the direct manipulation of variables. In correlational research we do things like observe natural events or we take a snapshot of many variables at a single point in time.” (Field, 2009, p. 12).
  • "Cross-sectional research examines data from a point in time, whereas longitudinal research examines data from across time. In a typical cross-sectional study, the variables are measured once on each case, during the same period...In a typical longitudinal study, the variables are measured repeatedly over different periods." (Menard, 2002).
  • "When numbers are involved the research involves quantitative methods, but you can also generate and test theories by analysing language (such as conversations, magazine articles, media broadcasts and so on). This involves qualitative methods...People can get quite passionate about which of these methods is best, which is a bit silly because they are complementary, not competing, approaches" (Field, 2009, p. 2).
  • “…hypotheses can also be directional or non-directional. A directional hypothesis states that an effect will occur, but it also states the direction of the effect. For example, ‘readers will know more about research methods after reading this chapter’ is a one tailed hypothesis because it states the direction of the effect (readers will know more). A non-directional hypothesis states that an effect will occur, but it doesn’t state the direction of the effect. For example, ‘readers’ knowledge of research methods will change after they have read this chapter’ does not tell us whether their knowledge will improve or get worse.” (Field, 2009, p. 27).
  • Exploratory research is “about putting one’s self deliberately in a place — again and again — where discovery is possible and broad, usually (but not always) nonspecialized interests can be pursued.” (Stebbins, 2001, p. 6).
  • An interview is defined as “a method of gathering information by talk, discussion, or direct questions” (Kaplan et al. 2013, p. 642).
  • A closed-ended question is a question that “can be answered specifically (for example “yes” or “no”). Such questions generally require the interviewee to recall something” (Kaplan et al. 2013, p. 640). For example, indicating “Blue” to the closed-ended question “What’s your favorite color?” (where there are only 3 possible responses: “Blue”, “Red” and “Yellow”).
  • An open-ended question is defined as “…a question that usually cannot be answered specifically. Such questions require the interviewee to produce something spontaneously” (Kaplan et al. 2013, p. 642). For example, a written response to the open-ended question “What’s your favorite color and why?”

Let’s now turn to the first theme — the importance of legibility.

1 — Legibility matters to users.

I’ll answer:

  1. What’s the challenge?
  2. What’s the research approach?
  3. What are the key takeaways?

What’s the challenge?

Have you ever noticed how in YouTube TV, the on-screen search keyboard is composed of uppercase letters? Have you ever noticed the search keyboard in Apple TV? It’s composed of uppercase letters and, curiously enough, provides an option for users to choose lowercase letters. Further, have you noticed how the characters in Apple TV are grey and are quite small compared to other platforms?

Take a moment to observe aesthetic and legibility differences between these letters:

More specifically,

  • Observe the differences between uppercase vs. lowercase letters.
  • Observe the differences between letters colored black vs. grey vs. white.
  • Observe the differences between large vs. small letters.
  • Observe the differences between letters in front of a solid-colored box vs. no solid-colored box.

Stand 10 feet away from the screen (to simulate a “10-foot experience”) and observe these differences again.

What about letters that look visually similar — i and j?

What about all the letters together?

What have you observed?

If you think these design differences make a minimal impact on the TV experience — think again.

Without a doubt, the TV experience is centered in visuals — but text legibility plays a major role in a viewer’s overall experience — however subtle text designs may be.

So we’re clear, text that’s legible is “capable of being read or deciphered, especially with ease” (source).

With that said, how are major platforms ensuring on-screen text legibility?

Apple TV and Amazon Fire TV provide a feature to help magnify on-screen images and texts (they’re referred to as “Zoom” & “Screen Magnifier”, respectively).

However, not all platforms offer this particular feature. Many media-service providers do not have a magnify feature embedded into their platform. As a result, many platforms have to improve designs within hardware-related constraints. A platform that ranks lower on legibility ease than its competitors, especially one without an on-screen magnifier, will quickly prove to be a hindrance to a viewer’s goals.

Your Project Manager might say, “We’re concerned. We want to improve our designs — but if we change anything we might be taking a big risk when it comes to legibility — and accessibility-concerns are important to us.”

This is when a team will look toward a user experience researcher for answers.

You might be tasked to research which…

  • font size (small, e.g., 23 px vs. medium, e.g., 27 px vs. large, e.g., 35 px)
  • case (lowercase vs. uppercase)
  • color (e.g., black vs. grey vs. white)
  • background (boxed design vs. no boxed design) and
  • transparency-level (i.e., transparent boxed-design vs. opaque boxed-design)

…your team should go with in their next design iteration.

This is no small feat.

Studying “Web Content Accessibility Guidelines” (WCAG; and, if you’re in Canada, rules set forth by the Canadian Radio-television and Telecommunications Commission) is an excellent step forward — but studying these guidelines will not make primary research superfluous.

With that said, let’s shift to discuss potential research approaches.

What’s the research approach?

Aim. You want to uncover (any and all) legibility concerns for your team’s 10-foot designs (e.g., see guide layout below).

(Shaw TV’s guide layout for the 10-foot experience.)

At this point, you’re in ‘the conceptualization stage’, so before any research is conducted, you’ll have to address two questions:

  1. What do we really want to measure?
  2. Which independent variables do we really want to look at?

What do we really want to measure?

To address this question, you’ll have to:

  1. Communicate with your team. This is where a collaborative team session comes into play. A collaborative session might involve empathy-driven perspective-taking. (e.g., “do we really want our users, perhaps ready to watch a much-anticipated game or series, to be unable to read letters, titles and descriptions with ease? What about the senior community and the greater visually-impaired community? Aside from ‘legibility ease’, what else are we interested in measuring? What actionable insights are we expecting?”)
  2. Examine historical data. (You might ask, “do we have any company research on legibility — past research we can look at?”) and
  3. Conduct a literature review. (At this point, you might wonder, “how might we operationalize legibility such that it is well defined and quantifiable? How have other researchers operationalized legibility?”. *To operationalize a construct is to make it measurable.)

In his talk, entitled “People, Products and Jetlag: Creativity Through Empathy”, Jens Riegelsberger states:

“Measure what counts — not just what you can count.”

Consider deeply which measures will address legibility concerns:

  1. Legibility ease? You might ask a closed-ended question like this one: “How easy is it for you to read [insert text]”. You might ask this particular question multiple times for various text. Using an 11-point scale (from 0=“extremely hard” to 10=“extremely easy”) you’ll be able to quantify this construct. Furthermore, you might want to investigate nuances pertaining to text location. For example, how might on-screen location impact text-legibility (i.e., is text located at the bottom of the screen as legible as text located in the middle or top of the screen — despite being of the same size?). I suggest an 11-point scale for two reasons. First, using a 5-point Likert scale might prove ineffective at combating a potential ceiling effect (more on the ceiling effect below). Second, researchers often treat ordinal scales (like a 5- or 7-point Likert scale) as interval scales. There’s been a lot of debate regarding this particular matter. Today, researchers advocate for the use of an 11-point scale. Consider the following conclusions from Wu and Leung (2017): “There are pros and cons in using the Likert scale as an interval scale, but the controversy can be handled by increasing the number of points…To increase generalizability social work practitioners are encouraged to use 11-point Likert scales from 0 to 10, a natural and easily comprehensible range.” (Source).
  2. Reading speed? To take into account variability among your sample, a reading speed measure may be used as a covariate (more on this below).
  3. Eye-tracking? Investigating visual attention, via a heatmap (and/or other eye-tracking measures), will allow you to uncover any potential trends between visual attention and text size. To illustrate this point, review the four examples in Diagram 1.0. Pay particular attention to the heatmap distributions. Notice the heatmap for the design with small text. Interestingly, it’s more distributed than the design with large text. What does this mean? This finding might inform you that (when a viewer is trying to read) visual attention follows a hierarchy for larger text more so than smaller text. Said differently, when text size is small, a viewer’s eye is ‘less guided’ and as a result, visual attention may turn toward areas of the layout that it wouldn’t otherwise (within the first 10–15 seconds). This finding may prove somewhat problematic if your team wants small text (for the next design iteration) but wants to simultaneously preserve information hierarchy (i.e., so that visual attention is captured and guided within the first 10 seconds). However, how do heatmap distributions relate to legibility? While the primary research aim is to investigate text legibility, a secondary research aim is to investigate the impact of choosing a design with smaller text (than the current design). Even if a design with small text proves to be as legible (as the other designs) it may not be the best choice if it proves problematic on another measure (in this case, a heatmap distribution measure).
  4. Implicit-confidence? The term “dual attitudes” is defined as: “differing implicit (automatic) and explicit (consciously controlled) attitudes toward the same object. Verbalized explicit attitudes may change with education and persuasion; implicit attitudes change slowly, with practice that forms new habits” (Myers et al. 2012, p. 534). For our research purposes, investigating implicit rather than explicit self-confidence may prove more useful. Why? An implicit-confidence score will help tap into how participants are really feeling. If a participant indicates text to be highly legible but implicit confidence scores are (simultaneously) low — you ought to find that curious and concerning. Employing an observational method to gauge non-verbal behaviors and facial expressions is one way to tap into implicit confidence. (*If you’re wondering “How might I code facial expressions?”, my recommendation is to conduct a literature review and to concurrently investigate if the “Facial Action Coding System” is suitable for your particular research questions).
  5. Open-ended questions? Open-ended questions (e.g., “What do you think about the overall text legibility of this design?”, “Is there anything you’d like to share about how we might improve legibility?” and “How would you improve this design?”, etc.) often bring to light unique insights. To uncover these findings, responses undergo a thematic analysis. A “thematic analysis” is used “to analyse classifications and present themes (patterns) that relate to the data. It illustrates the data in great detail and deals with diverse subjects via interpretations” (Ibrahim, 2012). For example, you might learn some user interface elements hinder rather than help legibility (e.g., transparent elements). Open-ended questions might be administered in a one-on-one interview or via a paper questionnaire. Regardless, open-ended questions are often best left at the end of the study (I explain why below).
(Diagram 1.0. Simulated heatmap distributions for three 10-foot designs that differ on text size. *Heatmap distributions are magnified for this example.)

Which independent variables do we really want to look at?

Our potential independent variables are:

  • font size
  • case
  • color
  • background and
  • transparency

To prioritize variables accurately, you’ll want to:

  1. Communicate with your team. (e.g., “what are our design constraints?”)
  2. Conduct a benchmark assessment. (e.g., “what are our competitors doing?”) and
  3. Conduct a literature review. (again, “are there any relevant scientific papers on legibility?”)

You might uncover the following:

  • First, your scientific literature review reveals preliminary evidence (e.g., Arditi & Cho, 2007; Vartabedian, 1971) to suggest uppercase letters are more legible than lowercase letters. Make this information explicit to your team.
  • Font-size. A Design Lead tells you: “We want to be on the cutting edge — like Apple — small font sizes look ‘cool’, ‘clean’, ‘less cluttered’ and provide more space for movie titles so they’re not hyphenated. We also don’t want movie posters to be cut off the screen because of large text. Lowercase letters look ‘refined’, ‘different’ and ‘classy’.” This is useful information. The Lead wants smaller lowercase text but you’re aware (as a result of your literature review) that smaller lowercase text is less legible than larger uppercase text (especially 10 ft away from the TV screen). You think, “is there a happy medium?” and conclude that text-size must be experimentally manipulated (i.e., small vs. medium vs. large or, if there are time constraints, small vs. medium or large).
  • Color, background, and transparency. Another designer might tell you: “We are working within design limitations — to keep to the brand, we will only use black text. No other colors are possible. We also won’t be using boxed-designs for characters in the search keyboard and there won’t be any transparent user interface elements.” You cross out color, background and transparency as potential variables.
  • Case. Your benchmark assessment might uncover your key competitors are all using uppercase letters, a finding in opposition to what the Design Lead (and his/her team) are aiming for. Case will have to be experimentally manipulated (i.e., lowercase letters vs. uppercase letters).

As a result of prioritizing key independent variables, your first experimental design might look like this:

This experimental design is referred to as a “2 x 2 factorial design”. There are 2 factors (size & case) with 2 levels each (30 px vs. 23 px & uppercase vs. lowercase). As a result, you have 4 conditions.

If you’re working within low-to-average financial constraints (for a medium-to-large sized company), gathering a total of 100–120 participants for this study may be acceptable (that comes to 25–30 participants per cell).

[*Ideally, you want the same number of participants per cell. Statistically speaking, a two-way ANOVA would be used to uncover statistically significant findings. You might use other statistical analyses such as regression or MANOVA (i.e., if you employ multiple dependent variables).]
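If it helps to see the analysis written out, here is a minimal sketch of how such a 2 x 2 design might be analyzed in R (the data frame, column names and values below are simulated placeholders, not output from a real study):

```r
# Minimal sketch: analyzing a 2 x 2 legibility experiment (size x case) in R.
# The data are simulated purely for illustration; column names are placeholders.
set.seed(1)
d <- expand.grid(Size = c("23 px", "30 px"),
                 Case = c("Lowercase", "Uppercase"),
                 Obs  = 1:30)                             # 30 observations per cell
d$LegibilityEase <- round(pmin(10, pmax(0, rnorm(nrow(d), mean = 7, sd = 2))))
d$ReadingSpeed   <- rnorm(nrow(d), mean = 200, sd = 30)   # e.g., words per minute

# Two-way ANOVA: main effects of size and case, plus their interaction.
summary(aov(LegibilityEase ~ Size * Case, data = d))

# ANCOVA variant: enter reading speed as a covariate before the factors.
summary(aov(LegibilityEase ~ ReadingSpeed + Size * Case, data = d))
```

The interaction term is the piece to watch: it tells you whether the effect of size depends on case (e.g., whether small text suffers more when it’s also lowercase).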

Furthermore, a longitudinal component to the research may be necessary. (“The term longitudinal methods represent a research design in which participants are repeatedly assessed over an extended period…” (source).) As you might suspect, upper management often requests highly reliable research findings before heavily investing in a product redesign. As a result, data from a single point in time is often viewed as insufficient, especially for accessibility-related topics.

What’s the research approach in a nutshell?

To ensure collective clarity, a “method” is defined as “a procedure, technique, or way of doing something, especially in accordance with a definite plan” (source).

The methods for this investigation include:

  • An experimental method. To investigate if text size and case (the independent variables) influence text legibility (the dependent variable) an experimental method is used.
  • Quantitative methods. Remember, “when numbers are involved the research involves quantitative methods” (Field, 2009, p. 2). We want to investigate numerical differences between groups so we administer a (closed-ended) legibility ease measure (our primary measure) and a reading speed measure (a potential covariate). These measures are treated as interval scales of measurement (see here for more on how to accurately use Likert scales as interval scales) and, as a result, will be analyzed by descriptive and inferential statistical techniques (more on quantitative methods here).
  • Qualitative methods. Three qualitative methods will be used. First, heatmap distributions might uncover patterns pertaining to visual attention and information hierarchy. Second, to investigate implicit confidence, consenting participants will be video-recorded; their nonverbal body language will be analyzed in line with standard procedures (i.e., Harrigan et al. 2008). Third, to uncover unique insights, a thematic analysis will be conducted on responses to open-ended questions.
  • Longitudinal method. Participants will be assessed again at a future point in time to uncover the robustness of findings from the initial legibility experiment. Designs from the first experiment may be changed in some way for a future experiment (i.e., perhaps the initial experiment reveals text size can be made smaller in some areas; more on this below).

Additionally, there’s a directional element to this research. Based upon the literature review, uppercase letters are more legible than lowercase letters, so we might reasonably expect uppercase letter designs to rank higher on legibility ease than lowercase letter designs.

I’ll now circle back to a point I made earlier. When describing open-ended questions I indicated they’re often best administered at the end of a study.

Why?

First, what would happen to a participant’s behavior if they were asked to verbalize their thoughts before their emotions (or vice versa)? As you might suspect, a measure asking a participant to think will likely influence their subsequent mood and behavior. Technically speaking, this phenomenon is referred to as a “carry-over effect”, which is when “the experience of the first task carries over some mood or state of being to the second task and thus influences the performance” (source). For our purposes, to prevent potential carry-over effects on implicit confidence, open-ended questions are administered at the end.

Before we move on to the next section, I want to digress a moment and talk about a what-if scenario.

What would happen if you don’t prioritize variables by communicating with your team, doing a benchmark assessment and conducting a literature review?

The short answer — you’ll waste valuable resources.

Even if there are no design constraints on, say, color, you might suggest a legibility experiment that your company lacks resources to support (see example below).

This design is complex and your company may not have the resources to support it.

Why might a company not support a 2 x 2 x 2 factorial design?

  • Consider the time of others. Every cell (e.g., “Uppercase Black 30 px”) requires a design mock-up. This would mean a designer would have to allocate time toward designing a mock-up for 8 cells. Your Project Manager might tell you: “Designers are working under many deadlines, they don’t have much time to spare toward this study”.
  • Consider the number of participants you’ll need. Every cell requires a particular number of participants (e.g., many empirical researchers would say: “No less than 30 participants per cell if you’re interested in uncovering statistically significant findings.” Others would argue: “No less than 50.”). This complex study would require 240 (30 x 8) participants (minimum). As a result, even if the experiment is conducted online (which I don’t recommend for accessibility-related topics — more on why below), the data collection phase will run longer than if you conduct a study with 100 or less participants (i.e., the previous example of a 2 x 2).
  • Consider the cost of each participant (and other resources). Every participant is remunerated for their time. Take into account financial constraints before proposing a factorial design such as this one (especially if you’re situated in a small-to-medium sized company).

There are more reasons than these three but I think the message is clear. Prior to proposing research designs, you must take into consideration resource constraints, especially for accessibility-related topics (which often require greater resources than aesthetically-driven research topics).

Let’s shift to key takeaways.

Legibility — what are the key takeaways?

  • It’s going to happen. You can’t avoid the topic of text legibility just because the TV/video-streaming industry is centered in visuals. You’ll either be pulled (as a result of team and/or company interest) or pushed (as a result of accessibility-standards and/or your competitors) toward the topic. Further, don’t sugarcoat the subject matter to your Project Manager: “Legibility and accessibility-related concerns are interesting but complex topics. One study may not be enough. We might find ourselves returning multiple times to this topic throughout the year.”
  • It’s going to take more than one study. The topic of legibility is indeed a complex one and one experiment will not be enough. To ensure a holistic understanding of legibility, you’ll have to conduct both qualitative and quantitative studies. To ensure you’re capturing the voice of your user, you’ll have to conduct studies with special groups (e.g., senior viewers and viewers with visual impairments) as well as 20/20-vision viewers. Remember the importance of a representative sample. A “representative sample” is defined as “a sample drawn in an unbiased or random fashion so that it is composed of individuals with characteristics similar to those for whom the test is to be used” (Kaplan et al. 2013, p. 643). Additionally, to discover solutions (to satisfy the peculiarities of your platform) you’ll have to commit more time and financial resources toward this subject matter than other studies (e.g., studies that strictly test different colored user interface elements).
  • It’s a stress-test. An exercise stress-test involves testing your body’s limits. Legibility tests are stress-tests — don’t make it easy. Stand (or sit) all participants 10 feet (or more) away from a TV screen. Why? Consider the following four points (and see Diagram 1.1 below):
  1. Most viewers sit closer than 10 feet to face a TV screen.
  2. As a result, you’ll see a ceiling effect for the ‘legibility ease’ measure (and perhaps other closed-ended measures) if you stand participants too close to the TV screen. Here, “ceiling” is defined as “the highest score possible on a test. When the test is too easy, many people may get the highest score and the test cannot discriminate between the top level performers and those at the lower levels” (Kaplan et al. 2013, p. 640). Said differently, values will be clustered around “extremely easy”, especially if you use a 5-point Likert scale instead of an 11-point scale. Further, consider the trend that TV screen sizes increase every couple of years. You might choose (or be forced to use) a 42-inch screen, but in a couple of years, the standard screen-size may have increased above 48 inches. In other words, with increasing screen sizes, text is more legible today than it was in the past (not to mention other related advancements, such as 4K resolution).
  3. You’ll uncover legibility concerns more readily if you stand participants 10 feet or further from the TV screen (max: 15 feet).
  4. As a result, if you address issues that arise within the ‘stress-test zone’ (see diagram below) you will be more confident in solutions you propose. Take, for example, the following research conclusion:

We’ve tested small, medium and large text — with sample group A (i.e., our low-to-mild visually-impaired group) — at 11 feet away from the TV screen — a distance above the average distance a viewer usually sits from the screen.

Curiously, small text averaged the same (statistically speaking) as medium text on legibility ease and — interestingly, both ranked above the mid-point on the scale. We have options. We can move forward in one of two ways:

  • Option 1. Given all other measures don’t indicate small text to be a problem and given we are under time and resource constraints, we can move forward with the font-sizes for the small text design and be happy with the finding that small text doesn’t vary from medium-text on legibility ease (and that both rank above the mid-point of our primary measure).

OR

  • Option 2. Given all other measures don’t indicate small text to be a problem, and given we are not under major time and resource constraints, perhaps there is room to make font-sizes for the small text design a bit smaller — that is, if we really want to investigate how far we can go in providing a cleaner and less cluttered user interface in our next product iteration.

So, why do we need a ‘stress-test zone’?

If, say, small text ranks similarly (statistically speaking) to medium text 11 feet away from the screen then we might reasonably expect font-sizes for the small design to be suitable (for the same or similar sample group) in a setting where they’re seated closer to the screen.

This insight also delineates next steps for the design team, “do we want to see how low we can go?”

(Remember, many platforms work within hardware related constraints. Not all are equipped to provide users with accessibility-solutions embedded into their hardware).

(Diagram 1.1. This diagram illustrates the ‘stress-test zone’.)
  • It’s going to require a natural setting. Addressing almost all accessibility-related concerns is difficult to do via an online study. There are many challenges with online studies, not the least of which is careless participant responding. My recommendation is an in-lab study. Simulate a living room environment. Always remove any screen reflections and light-glare. Use incandescent (warm) light instead of fluorescent (cool) light. If possible, dim the lights. Why? Well-trained researchers aim for “experimental realism” (rather than “mundane realism”), which is defined as the “degree to which an experiment absorbs and involves its participants” — that is, “experimenters do not want their people consciously play-acting; they want to engage real psychological processes” (Myers et al. 2012, p. 29). As you might suspect, conducting a legibility study in a brightly-lit business environment in, say, a conference room, will undermine your research efforts.
  • It’s a multi-platform concern. Conducting legibility studies for the TV experience is the first step forward. As of 2020, many platforms are now made available on your laptop, tablet and phone. Netflix, for example, provides a Netflix app for iOS and is currently developing the app to be compatible with Android 5.0 (Lollipop). I highly recommend including, in your legibility research report (in a section entitled “future directions”), an aim for future research to involve a multi-platform component (to ensure a holistic understanding of legibility).
  • It’s time to think outside the box. Some remote controls (e.g., the latest Nvidia Shield remote control) are designed with a voice search feature, which allows viewers to speak into the remote to search video content. Voice search commands may not have been part of your experiment — but there is nothing holding you back from proposing voice search as a future hardware-based solution for groups that have an especially difficult time reading on-screen text.

Simulated data

I’ve simulated data for any reader looking to practice.

The dataset, entitled “legibility_data.csv”, is located on my ‘Open Science Framework’ page (see here).

You’ll find the following three variables in the data file:

  1. “LegibilityEase” (where 0 = “Not at all legible” and 10 = “very legible”)
  2. “ReadingSpeed” and
  3. “Condition” (where 1 = “Uppercase 30 px” , 2 = “Uppercase 23 px”, 3 = “Lowercase 30 px”, 4 = “Lowercase 23 px”)

Using SPSS and/or R, replicate the results below:

(You can find the full output on my Open Science Framework page.)
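If you’d like a starting point in R (a sketch of one possible analysis, not my exact output), the Condition codes above can be unpacked back into the two factors of the 2 x 2 design before running the two-way ANOVA:

```r
# Read the simulated dataset and rebuild the 2 x 2 factor structure
# from the numeric Condition codes listed above.
d <- read.csv("legibility_data.csv")
d$Case <- factor(ifelse(d$Condition %in% c(1, 2), "Uppercase", "Lowercase"))
d$Size <- factor(ifelse(d$Condition %in% c(1, 3), "30 px", "23 px"))

# Cell means, then the two-way ANOVA on legibility ease.
aggregate(LegibilityEase ~ Case + Size, data = d, FUN = mean)
summary(aov(LegibilityEase ~ Case * Size, data = d))

# Optional: a quick visual check of a possible size x case interaction.
interaction.plot(d$Size, d$Case, d$LegibilityEase,
                 xlab = "Font size", ylab = "Mean legibility ease",
                 trace.label = "Case")
```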

Let’s now turn to the next theme — the importance of language.

2 — Language matters to users.

I’ll answer:

  1. What’s the challenge?
  2. What’s the research approach?
  3. What are the key takeaways?

What’s the challenge?

Words, phrases and explanations all fall under the umbrella topic of ‘language’, which is, essentially, the way in which an application speaks to a user. In the context of the viewing experience, language is especially important.

To ensure comprehension, I’ll address “what’s the challenge?” in three parts:

  1. Part one. I’ll describe a key use case that’s central to this challenge.
  2. Part two. I’ll describe three pain-points pertaining to language.
  3. Part three. I’ll describe the magnitude of this challenge.

Part one. In the not-so-distant past, when it came to video-streaming applications (e.g., the Netflix app), we didn’t have a variety of options to choose from.

We were simply expected to choose, click and watch.

After some development, we now have an option to ‘download and go’. That is, content is made available for download and offline viewing.

This feature adds flexibility to when and how we consume content — but, perhaps unexpectedly, it has also added complexity to the viewing experience.

I’ve done both primary and secondary research on this particular area. What have I learned? Today, it proves to be one of the least understandable features for users within a video-streaming application.

Why?

Part two. Take a moment to closely observe the way in which Netflix and Google Play ‘speak to the user’ in the context of the ‘downloads’ section:

Netflix on iOS
Google Play on iOS
  • Observe how Netflix uses the phrase “Delete Download” instead of “Delete”.
  • Observe how Google Play uses the word “Remove” instead of “Delete” or “Delete Download”.
  • Observe how both Netflix and Google Play have a “Cancel” option.
  • Observe how “Play” is absent in Google Play.
  • Finally, consider the following description from Netflix: “spend less time managing and more time watching” (see image #4).

What have you observed?

You might have noticed these applications ‘speak to the user’ slightly differently.

If you thought a bit deeper, you might have also noticed that explaining storage space and clarifying actions requires an understanding of the following three pain-points:

  1. Semantics. Users might be uncertain about phrases and word choices (e.g., “…wait, what do they mean by ‘remove’ — is that the same as ‘delete’?”).
  2. Automatic processes. Users might be uncertain in trusting an application that automatically deletes content they’ve chosen to download (see Netflix’s ‘Smart Downloads’ in image #4; e.g., “why is it going to delete what I downloaded?”).
  3. A user’s expectations. Users might be uncertain in how much storage space is required (e.g., “how much storage space do I need to download all the “Lord of the Rings” movies?”).

These points underscore the importance of effective and efficient language.

If a user group is thinking “I want to delete my download” more so than “I want to remove my download” the correct approach is to use the word “delete” and/or the phrase “delete download”.

The aim is to use language that reflects the internal state of the user (during a specific moment within their journey in the application).

For example, the word “remove” in the context of the ‘downloads’ section in Netflix might prove problematic (i.e., a user might think: “ — wait, if I ‘remove’ this movie would that be the same as ‘deleting’ it — will I free up space or am I just ‘removing’ it from this section?”).

Part three. I’ve established language might prove a concern in the viewing experience and I’ve outlined a specific case (i.e., the ‘download and go’ section) but I’ve yet to establish the magnitude of this concern.

To frame a discussion of magnitude, we need…

  1. …to define the experience. “How much cognitive load do tasks require?” The TV experience is (primarily) defined as a passive experience (see Meyers & Gerstman, 2001). Said differently, cognitive load on a viewer is low — minimal thinking and problem-solving is required, especially when compared to experiences centered in social networking and (activity-driven) video-gaming. Essentially, an experience might require low, medium or high cognitive load.
  2. …to understand imposed effort on a user. “Overall, how difficult does an application make it for a user to accomplish their aim?” High imposed effort equates to an application that slows down and/or repeatedly blocks a user from achieving their aims. Application shortcomings are usually the cause (e.g., developmental-bugs, poorly designed user interfaces and language that obfuscates rather than clarifies, etc.). Essentially, an application might rank low, medium or high on imposed effort.
  3. …an understanding of the greater context. “How much commitment is going into solving UX pain-points?” Will a team recognize minor and major pain-points and allocate resources toward alleviating pain-points (to ensure a more effective, efficient and satisfying product)? Essentially a team might demonstrate ‘a lot’, a ‘midrange’ or ‘poor’ commitment to solving user pain-points.

These three components all contribute to the magnitude of this challenge. I’ll unpack further by describing possible outcomes in a matrix diagram.

But first (before scrolling down) — take into account these three questions and consider possible outcomes.

Once you’re ready, you’ll find how I’ve mapped out these three questions in the diagram below:

I’d like to draw your attention to the top-right cell (colored red).

  • First, observe how the viewing experience requires “LOW” cognitive load (i.e., when compared to other experiences, like a text-heavy social-networking application, e.g., LinkedIn)
  • Observe how “LOW” compares to “HIGH” on ‘Overall, how difficult does an application make it for a user to accomplish their aim?’ in the row, “LOW” cognitive load.

With respect to imposed effort, low cognitive load tasks will be perceived with more ‘dichotomy’ than high cognitive load tasks.

User groups might say the following:

“A video-streaming application is either great or terrible — there is no in-between.”

Why do you think users might believe this statement and are more likely to say it for video-streaming applications than other application-types?

The short answer? Expectations.

  • Users are expecting ease from a streaming application because they’ve learned the experience of watching requires little effort from them (both cognitively and physically).
  • If a pain-point exists, it may be (erroneously) perceived as a major pain-point, even if, in actuality, it’s a minor pain-point.
  • In this context, language becomes an essential tool. If used incorrectly, language may exacerbate an already frustrating problem.

To reiterate, with respect to imposed effort on a user, low cognitive load tasks will be perceived with greater dichotomy than high cognitive load tasks.

What’s the research approach?

Aim. You want to uncover (potential) language concerns for your team’s designs.

Once again, you’ll have to address the following two questions:

  1. What do we really want to measure?
  2. Which independent variables do we really want to look at?

As a reminder, to address both questions accurately, you’ll want to:

  • communicate with your team
  • examine historical data and
  • conduct a literature review

An explanation of this aspect of the research process has been described in the legibility section.

What do we really want to measure?

We want to learn what a user’s cognitive state is as they move about the platform.

To uncover a user’s mental state, we not only want to examine explicit responses but also want to tap into states outside of a user’s awareness. Uncovering implicit affect, cognition and behavior is essential to our research aim (more on why below).

First, let’s review potential explicit measures you might want to consider:

  1. Audio-analytical coding? A think-aloud protocol tasks participants to spontaneously express their (unfiltered) internal thoughts out-loud (i.e., how they feel, think, and expect to behave toward, say, a TV design). To uncover patterns, audio-recorded responses are then subject to a thematic analysis. (Time permitting, audio-recorded responses might be transformed into transcripts prior to a thematic analysis.)
  2. Written-analytical coding? Very direct and specific questions might be asked of users, such as: “Which phrase do you prefer at this stage — ‘remove’, ‘delete’ or ‘delete download’? And why?” Like verbalized expression, written responses also undergo a thematic analysis.

Next, let’s review potential implicit measures:

  1. Reaction time? To use reaction time as an indicator of preference, two separate designs (minimum) and a clickable prototype would be required (i.e., a standard one-way factorial design, sometimes known as an “A/B” experiment, would be used). One design might use the call-to-action word ‘delete’ (design one) while another might use the call-to-action word ‘remove’ (design two). For the experiment, participants might be tasked to delete the same movie. If participants assigned to design one complete the task with greater speed than participants assigned to design two we might suggest ‘delete’ speaks more toward the internal state of the user (at that particular moment) than ‘remove’.
  2. Implicit confidence? Again, this measure might help tap into how participants are really feeling. (See previous section to learn more.)

Which independent variables do we really want to look at?

In this context, ‘language’ (i.e., call-to-action language) is our key independent variable.

In discussing reaction time, I suggested a one-way factorial design, where levels of the independent variable might be different call-to-action words (e.g., ‘delete’ vs. ‘remove’). (Depending on the peculiarities of your platform, you may choose to test different words, phrases and/or explanations.)

You’ll find a diagram outlining this potential experiment below.

(This diagram illustrates potential results from a one-way experiment.)

What results might we uncover?

  • First, we might uncover (task-related) reaction time (i.e., our primary dependent measure) is faster for ‘delete’ than ‘remove’.
  • A thematic analysis might uncover the word ‘delete’ to be more familiar to participants than the word ‘remove’.
  • Lastly, we might uncover both groups to rank similarly on implicit confidence.

These three findings guide us to choose the word ‘delete’ over the word ‘remove’.

If there are no time and resource constraints, our next study might involve an experiment to test ‘delete’ against ‘delete download’ (a phrase that we might hypothesize to rank higher on reaction time than ‘delete’).

What’s the research approach in a nutshell?

The methods for this investigation include:

  • An experimental method. In the context of the ‘download and go’ section, our research question is the following: does language influence the speed with which a user removes downloaded content? To investigate if language (the independent variable, with three levels, i.e., ‘remove’ vs. ‘delete’ vs. ‘delete download’) influences reaction time (the primary dependent variable) an experimental method is used.
  • Quantitative methods. Reaction time (measured in milliseconds, a ratio scale of measurement) is employed to investigate preference for call-to-action words (as a reminder, we’re interested in tapping into implicit behavior). A one-way ANOVA is conducted, with reaction time as the primary dependent variable, to uncover whether a statistically significant difference exists between groups (more on quantitative methods here).
  • Qualitative methods. Verbalized responses to a think-aloud protocol and written responses to open-ended questions will undergo a thematic analysis. (Given the use of reaction time as an indicator of implicit preference, you don’t need to measure implicit confidence, especially if you’re under time constraints.)
  • Longitudinal method. Once again, participants will be assessed at a future point in time to uncover the robustness of findings from the initial language experiment.

Additionally, there’s a directional element to this research. Familiar call-to-action words (e.g., ‘delete’) are predicted to be preferred by users over unfamiliar call-to-action words (e.g., ‘remove’).

What are the key takeaways?

  • It’s going to take a bit of perspective taking. I described the ‘download and go’ feature and outlined that the TV experience is made more cumbersome by a feature that initially aimed to give users greater flexibility. Understanding user expectations is essential in a team looking to deliver actionable insights on language. Compared to applications like LinkedIn, users do not expect video-streaming applications to require effort (e.g., “I just want to watch John Wick.”). The watching experience paired with users’ expectations (of that experience) will (naturally) conflict with any added feature that requires greater cognitive load than expected. This is why language becomes a greater challenge in these contexts. Once again, as a general guideline, low cognitive load tasks will be perceived with greater dichotomy than high cognitive load tasks.
  • It’s important to understand moment-by-moment internal states. Language congruent with a user’s internal state is seen as adding to the overall quality of the product. Language not congruent with a user’s internal state is seen as taking away from the overall quality of the product.
  • It’s going to require research versatility. Tapping into implicit behavior (via reaction time) to uncover a user’s mental state is necessary. Explicit measures will not be sufficient in helping you uncover which words, phrases and explanations ‘speak’ more fluidly to users. Like before, I recommend multiple qualitative and quantitative studies.

(As a footnote, I wouldn’t be surprised if, in the near future, Google Play adopts the word ‘delete’ instead of the word ‘remove’ and perhaps even adds the call-to-action word ‘Play’.)

Simulated data

I’ve simulated data for any reader looking to practice.

The dataset, entitled “language_data.csv”, is located on my ‘Open Science Framework’ page (see here).

You’ll find the following two variables in the data file:

  1. “ReactionTime” (in milliseconds)
  2. “Condition” (where 1 = “delete download” , 2 = “delete”, 3 = “remove”)

Using SPSS and/or R, replicate the results below:

(You can find the full output on my Open Science Framework page.)
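As a starting point (again, a sketch rather than my exact output), the one-way ANOVA and follow-up pairwise comparisons might look like this in R:

```r
# Read the simulated dataset and label the three call-to-action conditions.
d <- read.csv("language_data.csv")
d$Condition <- factor(d$Condition, levels = c(1, 2, 3),
                      labels = c("delete download", "delete", "remove"))

# Group means, the one-way ANOVA, then pairwise (Tukey) comparisons.
aggregate(ReactionTime ~ Condition, data = d, FUN = mean)
fit <- aov(ReactionTime ~ Condition, data = d)
summary(fit)
TukeyHSD(fit)   # which call-to-action wordings differ from one another
```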

Let’s now turn to the last theme — the importance of familiarity.

3 — Familiarity matters to users.

I’ll answer:

  1. What’s the challenge?
  2. What’s the research approach?
  3. What are the key takeaways?

What’s the challenge?

All product development teams aim to improve their products in some way. Changes may be based in a desire to refine a product’s…

  • software (i.e., persistent developmental bugs)
  • user interface (i.e., size, color and placement of images and text)
  • information architecture (i.e., information structure and placement)
  • navigation (i.e., a user’s journey through an application) and/or
  • ergonomics (i.e., hardware based improvements)

In the previous sections, I discussed changes driven by legibility and language. Regardless of the type of change, almost all product changes are based in a desire to improve the overall user experience. Sometimes, despite a team’s best efforts, product changes are not met with user enthusiasm. Consider the following comments:

Resistance to change, also referred to as ‘change aversion’, is a well-documented challenge for product development teams (see resource material on the subject below).

As you might suspect, resistance varies in intensity. Some products might receive resistance lasting a couple of weeks; other products might receive resistance lasting far longer than two months.

Moreover, some degree of resistance is to be expected (more on why below). Netflix, for example, receives resistance to updates almost every year.

I’ve observed change aversion for nearly all aspects of a TV platform. In conducting both primary and secondary research, however, I learned of an area particularly vulnerable to resistance.

To ensure clarity, I’ll unpack this challenge in parts.

  1. Part one. I’ll describe the challenge.
  2. Part two. I’ll discuss the magnitude of this challenge.
  3. Part three. I’ll discuss pain-points pertaining to a specific use case.

Part one. Have you ever used the Apple TV Siri remote control? If so, would you use it again? What about the latest Nvidia Shield remote control? How do you think these remote controls compare to controls like Xfinity X1 and LG Magic?

Observe the remote controls above.

  • First, observe the number of hard buttons on each remote control.
  • Observe the placement of these hard buttons in relation to the directional pad (or D-Pad; i.e., where ‘up’, ‘down’, ‘right’, ‘left’ navigation arrows are often placed).
  • Observe what’s absent — how might hidden features in the Siri remote control hinder or help efficiency?
  • Observe how the latest Nvidia Shield and LG Magic remotes include a ‘Netflix’ hard button.

Indeed, many challenges exist in designing a TV remote control (e.g., hard-buttons, hidden features, overall feel, etc.).

A greater challenge is ensuring remote control redesigns align with an existing user interface — therein lies our challenge.

Part two. To illustrate the magnitude of this challenge, take a moment to review the table below.

  • Possibility 1 — Changed & Changed. The greatest change aversion occurs when both a TV layout and an accompanying remote control are changed in some way. However, in the industry, this is an uncommon possibility for many reasons. Today, product development teams understand the importance of spacing out product changes.
  • Possibility 4 — Unchanged & Unchanged. In this possibility, a product is left unchanged. As a result, there is no change aversion. Unaddressed pain-points may still exist, however.
  • Possibilities 2 & 3 — Unchanged & Changed. To understand possibilities 2 and 3 consider the following example:

You take your car to a mechanic. After waiting a while, your car is returned to you. As a show of customer appreciation, you see that the shop washed off all road grime that accumulated this past month. Your car gleams in the sun. You hop in. As you drive out, you think “the car feels different”. You decide to drive on. You soon notice your steering wheel no longer turns smoothly like it used to. You have to use a lot of force for minimal change. The steering wheel gives you more and more resistance as you shift from lane to lane. You reach an intersection where you’ll need to make a sharp left turn. There is heavy oncoming traffic. When you think it’s safe, you attempt the turn. You used too much force. You ram the curb — you hear your front bumper crack. You swerve to straighten your car. You begin to feel anxious. You look at the road in front of you — it suddenly feels more dangerous.

Without overdoing it, this example is meant to demonstrate a taste of a user’s experience with a changed remote control.

At first, the changed remote control appears aesthetically pleasing. Upon using it, however, you feel you’re unable to navigate a layout like you used to. You end up making mistakes. Suddenly, even the layout feels as if it’s changed in some way (even when it hasn’t), which is, psychologically speaking, a misattribution engendered by increased frustration and anxiety.

In the 10-foot TV experience, a user relies on a remote control to navigate and to do so with ease. This hardware-driven reliance grows over time. Soon, the remote control becomes an extension of a user’s sense of internal control.

When paired with an unchanged remote control, layout-based changes may still produce some change aversion. However, the degree of change aversion will not be as high as when the remote control itself is changed (even when paired with an unchanged layout).

Users will not hesitate to abandon a changed remote control.

Consider what happened to Apple TV in 2019. Salt, a telecommunications company, was pushed to redesign Apple’s remote control. Unfortunately for Apple, many users abandoned the Siri remote control. The Salt remote redesign, however, was (and still is) well received by users, despite the fact that it lacks a microphone and a dedicated Siri button.

(If you’re feeling curious, search “Apple TV remote control alternatives”. You won’t just uncover numerous remote control options — you’ll realize there’s an entire sub-industry thriving upon the usability failings of mainstream remote controls, especially remotes paired with a set-top-box.)

I’ll now turn to discuss pain-points pertaining to a specific use case.

Part three. Consider the following scenario:

You’re currently watching a movie (via your set-top-box). You’ve never watched this movie before and you begin to wonder if it’s worth your time — you’re not sure if you want to watch the rest. To get a better sense of what the movie is about, you think it would be a good idea to read the movie description. To get to the description, you’ll have to select the “info” hard button on the remote control, which you’ve learned is faster than navigating to the on-screen “info” button.

At first glance, the task appears quite basic. A user simply has to find and select the “info” hard button. Let’s say the remote control, in the hand of a user (iteration #1), is designed such that the “info” hard button is placed to the very right of the D-Pad (see image to the left, top diagram). Would you say this location is optimal for frequently pressed buttons (like the “info” button)? What do you think would happen if the “info” hard button shifted to the top right of the remote control, next to the power button (see bottom diagram; iteration #2)? Consider what would happen to a user’s speed (i.e., reaction time), a user’s error rate (i.e., pressing the power button instead of the “info” button), and a user’s hand (i.e., physical-strain) if they had to use the second remote instead of the first (in a dimly lit living room).

In researching user behavior in this context, I’ve uncovered the following points:

  • A user will nearly always begin by positioning a remote control so that the D-Pad is placed underneath their thumb. Interestingly, I’ve observed this behavior across a variety of hand sizes and remote control heights. Users have learned the D-Pad is how they’ll immediately engage with user interfaces. It is where they expect to find “up”, “down”, “left” and “right” navigation capabilities — regardless if they’ve been hidden or made visible.
  • Next, there’s an optimal horizontal and vertical range for a user’s thumb (see Diagram 3.1 below). From what I’ve observed, horizontally, any movement between 0 and 50 degrees is optimal — any less or more may strain a user’s thumb (especially with frequent use) and/or might require a remote to be re-positioned (in order for a button to be reached with ease). Vertically, any movement between 0 and 50 degrees is also optimal. At this point, you might have made a connection. If not, look up the latest and most popular remote controls on the market. You’ll notice that remote control sizes have gotten smaller over the past few years. What might explain this trend? One reason pertains to a competitive desire to precisely uncover (and design for) this ‘optimal range’.
(Diagram 3.1)

Let’s return to the “info” button example.

In the original design, the “info” button is placed next to the D-Pad. In the redesign, the “info” button is no longer within immediate reach. A user will have to extend not only their thumb but their hand to reach the “info” button. Unlike before, the “info” hard button is no longer easily accessible.

Development teams that decide to re-position frequently pressed buttons (like the “info” button) ought to expect a high degree of change aversion (and once again, more so than if they re-position an “info” button on the user interface).

I’ll now turn to potential research approaches.

What’s the research approach?

Aim. You want to uncover potential concerns pertaining to a remote control redesign.

Once again, you’ll have to begin by addressing the following two overarching questions:

  1. What do we really want to measure?
  2. Which independent variables do we really want to look at?

(*To address both questions accurately, you’ll want to first communicate with your team, examine historical data and conduct a literature review; see the first section on language for an example on how these steps add to the research approach.)

Before I discuss potential dependent and independent variables, I’d like to begin by addressing the following three points:

  • Experimental accuracy. To reiterate, “experimental research” is defined as “studies that seek clues to cause-effect relationships by manipulating one or more factors (independent variables) while controlling others (holding them constant)” (Myers et al. 2012, p. 535). In the context of almost all remote-control redesigns, your aim is to investigate cause-effect relationships (i.e., “would shifting the ‘info’ button to the top right of the remote control increase reaction time and increase the error rate?”). In order to satisfy the requirements of an experiment, a fully functioning and UI-compatible remote control redesign is necessary.
  • Cost. The hardware development of a remote control iteration (for experimental testing) is precisely why remote control redesigns are one of the most expensive endeavors in design research. It’s more common for very large companies (e.g., Apple, Google, Amazon) to develop remote control redesigns for testing and retesting than it is for small, medium or medium-to-large companies. It’s not uncommon for upper management to reject research proposals that recommend the use of hardware-based redesigns.
  • Alternatives. When funding is not available, small-to-medium-sized companies might develop a 3D model of a remote control redesign. However, a 3D model will not meet the requirements of an experiment — without functionality and UI-compatibility, direct comparison to an existing remote control is not truly possible. Unfortunately, the research accuracy a team requires (before financially committing to a final redesign, i.e., for mass production) will not be present in these cases.

For the sake of experimental research accuracy, a functioning remote control redesign (instead of a 3D model) is used for the research portion of this section.

With that said, let’s move on.

What do we really want to measure?

  1. Reaction time? When remote control hard buttons are removed and/or re-positioned (like the “info” hard button) reaction time becomes an essential measure. Like the previous section, an experiment would be required. One group of participants would be assigned to a remote control redesign and a second group of different participants would be assigned to the current remote control. Participants would be tasked to navigate a layout using only the remote control they’ve been given. (To standardize the experiment, participants would be instructed not to navigate the layout via user interface buttons.)
  2. Implicit emotion? Like the previous two sections, tapping into implicit emotion will help researchers understand how participants are really feeling at specific moments during the study. Using an 11-point scale (where 0 = “Not at all” and 10 = “Extremely”), a researcher might answer the question “Overall, has a participant demonstrated any non-verbal indications of frustration?” If there are four task scenarios (to test four key use cases), this question would be administered four times.
  3. Written-analytical coding? Like the previous section, very direct and specific questions might be asked of participants, such as: “Which aspects of the remote control do you dislike and why?”
  4. Differential exposure? A unique challenge for any line of research examining remote control redesigns is differential exposure, that is, “How frequently do you use this remote control?” You may use a 4-point scale, for example, “once a week”, “about 2–3 times a week”, “about 4–5 times a week” and “more than 6 times a week”. Alternatively, you may use a 0–10 point scale where 10 equals “more than 10 times a week”. A user who uses a remote control several times a week is likely more experienced with it than a user who uses the remote once a week. As a response to this variation, researchers may be tempted to screen out participants who haven’t had moderate to high exposure. Doing so, however, will mean a significant portion of users are excluded. Inexperienced users provide unique novice-related insights that experienced users don’t. Whenever possible, aim for a representative sample (even if it means incorporating less experienced users). To take this in-group variation into account, researchers have the option to incorporate a ‘differential exposure’ score as a covariate (a minimal R sketch of this covariate approach follows this list). At a more basic, qualitatively driven level, researchers might also tie in observational notes to see how experience impacts participant performance and other dependent variables.
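As promised, here is a minimal R sketch of points 1 and 4 combined: a between-groups comparison of reaction time with ‘differential exposure’ entered as a covariate. All data, group labels and column names below are invented for illustration; only the design mirrors the description above.

```r
# Minimal sketch (simulated data): reaction time compared across two remote
# control groups, with self-reported exposure (0-10 scale) as a covariate.
set.seed(42)
n_per_group <- 30
study <- data.frame(
  remote   = rep(c("current", "redesign"), each = n_per_group),
  exposure = sample(0:10, 2 * n_per_group, replace = TRUE),  # 0-10 usage scale
  rt_ms    = c(rnorm(n_per_group, mean = 2100, sd = 300),    # current remote
               rnorm(n_per_group, mean = 2500, sd = 300))    # redesigned remote
)

# Covariate entered first, then the factor of interest (an ANCOVA-style model).
model <- lm(rt_ms ~ exposure + remote, data = study)
summary(model)  # is the redesign associated with slower responses, holding exposure constant?
anova(model)    # sequential sums of squares for the same model
```

The same structure works for the implicit-emotion ratings; swap the reaction-time column for the 11-point score.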

Additionally, to uncover how launched products are faring, you’ll want to measure change aversion over time.

Change aversion. To uncover change aversion trends, a design team must engage in acute social listening (more on this below). After a product launch, almost immediately, users may express their unfiltered thoughts on various media outlets (e.g., Twitter, Facebook, Instagram, etc.).

Which independent variables do we really want to look at?

In the industry, you might be required to research multiple factors simultaneously. Here are some examples:

  1. Button removal (i.e., permanent removal of hard buttons and their corresponding user interface buttons; “Do we want to permanently remove the “last” button?”)
  2. Hidden features (i.e., hidden vs. non-hidden features; “Since we’ve integrated voice command, do we want to make the “info” button a hidden feature?”)
  3. Positioning (i.e., re-positioned hard buttons; “Do we want to shift the “info” button to the top right of the remote control?”)
  4. Light (i.e., button back-light vs. no back-light; “Since some viewers watch in dimly lit environments, do we want to integrate sensory back-light, such that when the remote control is picked up key buttons are back-lit?”)*
  5. Color (e.g., white vs. grey vs. black)
  6. Weight (i.e., the overall weight of a remote control; e.g., light but fragile material vs. heavy but durable material)
  7. Texture (e.g., silicone rubber buttons vs. hard plastic)

These potential factors are not mutually exclusive.

*For example, back-lit buttons require a stronger battery than a remote control without this feature. If that stronger battery is an internal, rechargeable one, the remote control will be heavier, which can be problematic for users (i.e., “after holding this remote control for 10 minutes, my wrist is beginning to feel strained…”; this is especially true for senior users).

This list of potential factors is not an exhaustive one. There are various dimensions you might encounter not listed above (see resource material below for further information).

To ground our discussion, let’s consider “light” as our independent variable and let’s consider a total of three levels, which are:

  • no back-light vs.
  • back-light for all hard buttons vs.
  • back-light for only the navigation buttons (see Diagram 3.2 below).

As you might correctly conclude, these remote control changes are minimal when compared to industry-level redesigns.

I’ve purposefully simplified this example.

After walking through this example, I’ll return to change aversion—the key subject of this section.

(Diagram 3.2; Remote One vs. Remote Two vs. Remote Three)

What results might we uncover from such an experiment?

  • In a study where participants are situated in a dimly lit environment, we might uncover (task-related) reaction time (i.e., our primary dependent measure) to be faster for participants in Group 2 (where all the remote buttons are back-lit) than for participants in Group 1 (where none of the remote buttons are back-lit) or Group 3 (where only the remote navigation buttons are back-lit).
  • For the question, “Has a participant demonstrated any non-verbal indications of frustration?” we may uncover Group 1 to rank highest when compared to the other two groups.
  • Further, a thematic analysis of participant written responses might uncover a strong preference for the second remote control. However, we might encounter mixed responses; consider the following potential comments regarding the second remote control: “this remote feels a bit heavy, maybe make it lighter”, “the back-lights turn on as soon as I pick up the remote control and they turn off after a few seconds — is there any way I can have the back-lights stay turned on?”, “I can see how the back-lights might be helpful in some cases, but the light can be distracting in the same way a phone light is distracting in a movie theater.” Don’t water down or dismiss counter-comments, even if your overall research findings suggest back-lit buttons are more effective than non-back-lit buttons.

What’s the research approach in a nutshell?

The methods for this investigation include:

  • An experimental method. Our research question is the following: does back-lighting all remote control buttons decrease (rather than increase) reaction time when compared to partial back-lighting (only the navigation buttons) or no back-lighting at all? To investigate if light (the independent variable) influences reaction time (the primary dependent variable) an experimental method is used.
  • Quantitative methods. Reaction time (measured in milliseconds) is employed as our primary dependent variable. Implicit emotion and ‘differential exposure’ are both operationalized using 11-point scales. Statistical operations would be applied to these three quantitative measures (a minimal R sketch of this analysis follows this list).
  • Qualitative methods. To uncover patterns, written responses to open-ended questions undergo a thematic analysis.
  • Longitudinal method. It goes without saying that the example above involves minimal remote control changes. In the industry, hardware updates involve multiple factors. What’s important is the longitudinal component. To address potential change aversion, social listening assessments are required. If a second study reveals similar findings to the first (in favor of the redesign) and a social listening assessment simultaneously uncovers minimal negative feedback on the redesign, there is insufficient cause to worry. If a social listening assessment uncovers tremendous negative feedback that increases (rather than decreases) with time, redesign changes must be readdressed (even if a second or third study reveals positive findings in favor of the redesign).
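Here is the promised sketch. It runs the three-group comparison described in this section in R; the group means and sample size are invented purely for illustration, and only the design (three back-light conditions, reaction time in milliseconds) comes from the example.

```r
# Minimal sketch (simulated data): one-way between-groups design with 'light'
# as the independent variable and reaction time as the primary dependent variable.
set.seed(7)
n <- 25
backlight <- data.frame(
  condition = factor(rep(c("none", "all_buttons", "navigation_only"), each = n)),
  rt_ms     = c(rnorm(n, mean = 2600, sd = 350),   # no back-light
                rnorm(n, mean = 2200, sd = 350),   # all hard buttons back-lit
                rnorm(n, mean = 2400, sd = 350))   # navigation buttons only
)

fit <- aov(rt_ms ~ condition, data = backlight)
summary(fit)   # omnibus test: does lighting condition affect reaction time?
TukeyHSD(fit)  # pairwise comparisons between the three remotes
```

In practice you would extend this with the implicit-emotion ratings and the thematic coding described above; the sketch only covers the reaction-time piece.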

What are the key takeaways?

  • It’s going to take a bit of global and business-oriented perspective-taking. With respect to change aversion, keep things in perspective. Rather than considering whether the design has been a success, consider whether the product has been a success. If, overall, an assessment reveals that the design changes add value to the product, wait it out — resist the temptation to backpedal.
  • It’s going to take some empathy-driven perspective-taking. I’ve repeatedly observed a rather curious behavior: users will choose familiar options over optimal ones. Why? Here are some possible causes:
  1. Immediate goals. “I’m just here to lay back and watch a movie.”
  2. Ignorance. “I didn’t know I could do that.”
  3. Perceived time. “I simply don’t have time to learn these new features.”
  4. Perceived energy. “I simply don’t have the energy to learn these new features.”
  5. Perceived magnitude. “There is too much to learn.”
  6. Perceived worth. “I’m still getting from A to B, so, what difference does learning these new features really bring?”
  7. Expectations. “I know what I’ll get when I do what I’ve always done.”
  8. Ambiguity intolerance. “I can’t stand not knowing.”

Recall my discussion on cognitive load (from the previous section on language). To reiterate, the viewing experience requires low cognitive load. In the context of the viewing experience, “effort” is not expected by a user. A user encountering any degree of necessary effort may erroneously perceive that more effort is required than is actually the case. As a result, users are more susceptible to the familiarity bias in the context of the viewing experience than in other experiences (i.e., experiences centered on a greater degree of problem-solving, e.g., social networking and video gaming). In this context, change aversion may be evident almost immediately.

Furthermore, research psychologists have long understood the influence of the “primacy effect” on affect (i.e., how we feel), cognition (i.e., how we think) and behavior. The “primacy effect” is defined as “other things being equal, information presented first usually has the most influence” (Myers et al. 2012, p. 537). Interestingly, the primacy effect is possible for all types of information — be it visual or textual. In investigating change aversion, we must understand users are not immune to, and may be especially susceptible to, the primacy effect. They may compare any subsequent iteration to the first-ever iteration and (sometimes erroneously) conclude that the subsequent iteration is poorer in quality.

  • It’s going to take some time. Generally speaking, humans have a strong tendency to be poor emotional, cognitive and behavioral self-forecasters. We may underestimate, for example, our ability to overcome negative events. This phenomenon is sometimes referred to as “immune neglect”. “Immune neglect” is defined as “the human tendency to underestimate the speed and the strength of the ‘psychological immune system’, which enables emotional recovery and resilience after bad things happen” (Myers et al. 2012, p. 536). With respect to change aversion, users may underestimate how quickly and how fully they will recover after a change.

Simulated data

I’ve simulated data for any reader looking to practice.

You’ll find two datasets; both are located on my ‘Open Science Framework’ page (see here).

  1. Dataset #1 is for a remote control redesign experiment and
  2. Dataset #2 is for a post-product launch ‘change aversion’ assessment

Dataset #1 is entitled “familiarity_data_ONE.csv”.

You’ll find the following variables in the data file:

  1. “Condition” (where 1 = “all buttons are back-lit”, 2 = “only navigation buttons are back-lit” and 3 = “no back-lighting”).
  2. “ReactionTime” (in milliseconds; there are a total of 4 study tasks).
  3. “ImplicitEmotion” (there are four implicit emotion scores, i.e., one for each task; see description above on “Implicit Emotion” for more information).
  4. “DifferenetialExposure” (our potential covariate).

Dataset #2 is entitled “familiarity_data_TWO.csv”.

Change aversion assessments often involve a thematic analysis of user comments (a thematic analysis is pivotal in uncovering root causes of user resistance).

However, quantitative approaches are also possible.

How?

For this section, pretend you’ve extracted user comments from Twitter and have ‘mass’ processed them via a linguistic analysis tool (namely, LIWC) at three time-points: one, three and five months after the redesigned product launched.

  1. Time 1 (one month after product launch)
  2. Time 2 (three months after product launch)
  3. Time 3 (five months after product launch)

For our purposes, we’re only interested in the “Negative Emotion” LIWC dimension (i.e., “negemo”).

Each time-point captures “Negative Emotion” from Twitter comments.

Percentage of total words. “Most of the LIWC output variables are percentages of total words within a text. For example, imagine you have analyzed a blog and discover that the Positive Emotions (or posemo) number was 4.20. That means that 4.20 percent of all the words in the blog were positive emotion words.” (Source)
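The percentage logic is easy to reproduce by hand. The R sketch below scores one toy comment against a tiny, made-up list of negative-emotion words; the real LIWC dictionary is proprietary and far larger, so treat this purely as an illustration of the arithmetic.

```r
# Minimal sketch: a LIWC-style 'percentage of total words' score for one comment.
# The word list is a toy stand-in, not the actual LIWC negemo dictionary.
comment    <- "the new remote is awful and the buttons feel terrible"
toy_negemo <- c("awful", "terrible", "hate", "annoying")

words        <- strsplit(tolower(comment), "\\s+")[[1]]
negemo_score <- 100 * sum(words %in% toy_negemo) / length(words)
negemo_score  # 2 negative-emotion words out of 10 total words = 20 percent
```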

You’ll find a total of 1414 Twitter comments for each time-point.

Using SPSS and/or R, replicate the results below:

For Dataset #1, entitled “familiarity_data_ONE.csv” (a starter R sketch follows the question below).

(You can find the full output on my Open Science Framework page.)
  • A question to consider: Why do you think reaction time is decreasing over time (i.e., getting faster)?
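Since I can’t reproduce the full output here, the sketch below shows one way to get started on Dataset #1 in R. I’m assuming the variable names listed earlier (Condition, four reaction-time columns, ImplicitEmotion and DifferenetialExposure) and the condition coding given above; check the actual column names in the file and adjust before running.

```r
# Minimal starter sketch for familiarity_data_ONE.csv; column names are
# assumptions based on the dataset description and may need adjusting.
dat <- read.csv("familiarity_data_ONE.csv")

# Assumed layout: one reaction-time column per task (four tasks in total).
rt_cols <- grep("^ReactionTime", names(dat), value = TRUE)
dat$MeanRT <- rowMeans(dat[, rt_cols], na.rm = TRUE)

# Condition coding from the description: 1 = all back-lit, 2 = navigation only, 3 = none.
dat$Condition <- factor(dat$Condition,
                        labels = c("all_backlit", "navigation_only", "none"))

# Practice-effect check: mean reaction time per task, in task order.
colMeans(dat[, rt_cols], na.rm = TRUE)

# Group comparison on mean reaction time, with differential exposure as a covariate.
model <- lm(MeanRT ~ DifferenetialExposure + Condition, data = dat)
anova(model)
```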

For Dataset #2, entitled “familiarity_data_TWO.csv” (again, a starter R sketch follows the question below).

(You can find the full output on my Open Science Framework page.)
  • A question to consider: After reviewing the ‘change aversion’ trend line, which recommendations would you make to your team?
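For Dataset #2, the sketch below assumes one row per comment, a time-point column (here called Time, coded 1–3) and a negemo column, mirroring the description above. Those names are my assumptions; rename them to match the actual file.

```r
# Minimal starter sketch for familiarity_data_TWO.csv; the column names
# ("Time" and "negemo") are assumptions based on the description above.
aversion <- read.csv("familiarity_data_TWO.csv")

# Mean negative-emotion score at each time-point (1, 3 and 5 months post-launch).
trend <- aggregate(negemo ~ Time, data = aversion, FUN = mean)
print(trend)

# A quick look at the change-aversion trend line.
plot(trend$Time, trend$negemo, type = "b",
     xlab = "Time-point (1 = one month, 2 = three months, 3 = five months)",
     ylab = "Mean % negative-emotion words (negemo)",
     main = "Change-aversion trend from social listening")

# Does negative emotion differ across the three time-points?
summary(aov(negemo ~ factor(Time), data = aversion))
```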

A final note

I began this article by stating: “If you’re about to step into this industry as a researcher — you’re in for an adventure.”

I truly hope the information I’ve described here proves informative to any and all researchers entering this ever-changing field.

I’ve listed educational material (textbooks, articles, courses, etc.) and references below.

If you try out the datasets and run into any issues feel free to reach out.

About the author:

Hajer Al Homedawy received her Bachelor of Arts in Psychology (Honors) from the University of Waterloo, her Master of Arts in Social Psychology from Wilfrid Laurier University and her Master of Digital Experience Innovation (MDEI) from the University of Waterloo.

Educational material:

  • Andy Field — Discovering Statistics Using IBM SPSS Statistics (textbook)
  • James R. Lewis and Jeff Sauro — Quantifying the User Experience: Practical Statistics for User Research (textbook)

LinkedIn Learning:

  1. SPSS Statistics Essential Training — Barton Poulson (course)
  2. UX Foundations: Research — Amanda Stockwell (course)
  3. Empathy in UX Design — Cory Lebson (course)

Specific Topics:

  • Roxanne Abercrombie — Change Aversion And The Conflicted User (article)
  • Television Industry — Science Direct (see here; multiple chapters & articles)

References:

Alhojailan, M. I. (2012). Thematic analysis: A critical review of its process and evaluation. West East Journal of Social Sciences, 1(1), 39–47.

Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3(2), 77–101.

Harrigan, J., Rosenthal, R., & Scherer, K. R. (Eds.). (2008). New handbook of methods in nonverbal behavior research. Oxford University Press.

Kaplan, R. M., & Saccuzzo, D. P. (2013). Psychological testing: Principles, applications, and issues (8th ed.).

Menard, S. (2002). Longitudinal research (Vol. 76). Sage.

Meyers, H., & Gerstman, R. (2001). Interfacing with the Consumer. In branding@thedigitalage (pp. 116–129). Palgrave Macmillan, London.

Myers, D. G., Spencer, S. J., & Jordan, C. (2012). Social psychology: Canadian edition.

Stebbins, R. A. (2001). Exploratory research in the social sciences (Vol. 48). Sage.
