RS Labs Tech

Female, Male, or Neutral? Filtering Based on Gender


In predictive recommendation systems, results will sometimes seem counterintuitive. As a marketer, you may find some male users in your customer list receiving recommendations for women’s skincare products. Sometimes, this is simply because these male users were shopping for their mother, wife, girlfriend, daughter, and so on, and their browsing history indicated this. Other times, these male users may have purchased a health food product with some correlation to those female skincare products, and the models picked up on these similarities. This isn’t necessarily bad, as truly predictive systems will often pick up hidden signals that are impossible to uncover with generic marketing tools.

However, some businesses may have specific reasons not to show female items to male users, and vice versa. This is a good way for businesses to add their own domain knowledge to complement and improve a recommendation system. To solve this use case, we implemented a gender-match filter in Redicto. This blog post describes the approach and details a recent improvement to our gender-tagging process.

Female, Male, or Neutral?

Our gender-match filter requires that we first tag users, with our best guess at their gender preference, and items, with our best guess at their gender-specificity. For both, we assign either Male, Female, or Neutral.

Inferring User Genders

For some of our clients, users can provide their gender during account signup, but this information is often omitted. On the other hand, a user’s first name is usually a required field during account signup, and so is most often present. In these cases, we can infer their gender using population statistics. There are many public datasets which can inform us of the gender distribution for common names: for example, the US Social Security Administration maintains actuarial tables with population counts by name, birth year, and gender. (FiveThirtyEight has a nice explanation of how this dataset can be used to infer age.)
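As a minimal sketch of name-based inference, the snippet below computes the female share for each first name from a small table of counts and assigns Neutral to unknown or ambiguous names. The counts, the function names, and the 0.9 confidence cutoff are all assumptions made for illustration; they are not taken from the actual dataset or from our production code.

```python
from collections import defaultdict

# Hypothetical (name, gender, count) rows in the spirit of public name datasets.
# These counts are made up for illustration only.
NAME_COUNTS = [
    ("mary", "F", 9950), ("mary", "M", 50),
    ("james", "M", 9900), ("james", "F", 100),
    ("taylor", "F", 6000), ("taylor", "M", 4000),
]


def build_female_share(rows):
    """Return {name: fraction of people with that name who are female}."""
    totals = defaultdict(lambda: {"F": 0, "M": 0})
    for name, gender, count in rows:
        totals[name][gender] += count
    return {name: c["F"] / (c["F"] + c["M"]) for name, c in totals.items()}


def infer_user_gender(first_name, female_share, confidence=0.9):
    """Tag a user as Female, Male, or Neutral from their first name.

    `confidence` is an assumed cutoff: names below it in both
    directions, and names missing from the table, are Neutral.
    """
    share = female_share.get(first_name.strip().lower())
    if share is None:
        return "Neutral"
    if share >= confidence:
        return "Female"
    if share <= 1 - confidence:
        return "Male"
    return "Neutral"


FEMALE_SHARE = build_female_share(NAME_COUNTS)
```

With this toy table, `infer_user_gender("Mary", FEMALE_SHARE)` is tagged Female, while a mixed name such as "Taylor" stays Neutral.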

Inferring Item Genders

The second step is to infer the gender-specificity of the items. The first way we did this was with natural language processing (read: counting keywords). By looking at product names and other metadata from a variety of sources, we have built up a set of keywords (see Fig 1.1) which are highly indicative of male-specific and female-specific items. If a product contains one or more of these keywords, and they are not contradictory, then we can tag that product as for Males or Females. Otherwise, as with ambiguous users, we tag ambiguous products as Neutral.

Fig 1.1 Sample of top keywords for item gender tagging, determined from pre-tagged sources of gender-targeted product names
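The keyword rule can be sketched roughly as follows. The two keyword sets here are short hypothetical stand-ins rather than the real list from the figure, and `tag_item_by_keywords` is an illustrative name.

```python
import re

# Hypothetical keyword sets; the real list is larger and data-derived.
MALE_KEYWORDS = {"men", "mens", "him", "beard"}
FEMALE_KEYWORDS = {"women", "womens", "her", "dress"}


def tag_item_by_keywords(product_name):
    """Tag a product as Male, Female, or Neutral from keywords in its name."""
    # Drop straight and typographic apostrophes so "Men's" becomes "mens".
    text = product_name.lower().replace("'", "").replace("\u2019", "")
    # Match whole tokens only, so "women" never counts as "men".
    tokens = set(re.findall(r"[a-z]+", text))
    has_male = bool(tokens & MALE_KEYWORDS)
    has_female = bool(tokens & FEMALE_KEYWORDS)
    if has_male and not has_female:
        return "Male"
    if has_female and not has_male:
        return "Female"
    # No keywords, or contradictory keywords: leave the item Neutral.
    return "Neutral"
```

A name with no gendered keywords, such as "White V-Neck Tee", comes back Neutral under this rule, which is the limitation the rest of this post addresses.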

Going Further

Recently, we noticed an issue with a clothing e-commerce store where our keyword-based item gender tagging was not sufficient. They have both men’s and women’s clothing, but these categories were not given in our data, so we were inferring gender mostly from the item names. For some items (dresses, ties, etc.) this worked fine, but for others there were no gender-specific clues. In fact, for some items, the names were exactly the same, such as a generic “White V-Neck Tee” which came in both a men’s and a women’s variant.

To solve this problem, we looked at the genders of the users who purchased these items. True, in some cases users may purchase items which aren’t marketed for their gender: for example, they may be buying a gift, or be using a partner’s account, or simply like a product regardless of the manufacturer’s intended audience. However, on the whole, the numbers match our expectations. For the women’s v-neck tee, 91% of buyers with known (or inferred) gender were female. For the men’s v-neck tee, 90% were male.

Using this kind of data-driven approach results in much more accurate tagging compared to making assumptions based on metadata. The primary drawback of this method is handling new products, which won’t have generated enough purchases to determine whether a gender-specificity exists. A secondary issue is handling clients with imbalanced user bases (skewed toward female or male users). Our approach had to handle these cases in a general way across multiple e-commerce clients.

Creating a Gender Bias Score

Our implementation aims to generate a single gender bias score for each product which can be used by our gender-match recommendation filter.

A naive way to formulate a Gender Score S could be:

S = (M - F) / (M + F)
where
F is the number of female users who have affinity to the item, and
M is the number of male users who have affinity to the item

The score ranges from -1 to +1, where -1 means 100% of the buyers were female, and +1 means 100% were male. If an equal number of men and women bought the item, that implies a score of zero.

After placing items on this scale, we can choose a suitable threshold to obtain our gender-match filter.

For example, if a client wants a strong filter, we can set a threshold of 0.9: if S < -0.9, then the item is female-specific; if S > 0.9, then it’s male-specific; otherwise it’s neutral.

Fig 1.1. Examples of female, male and neutral items along with the gender score
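The naive score and the threshold rule are small enough to write out directly. This is only a sketch: the function names are illustrative, and returning 0.0 for an item with no buyers is an assumption, since the formula itself is undefined in that case.

```python
def naive_gender_score(male_buyers, female_buyers):
    """S = (M - F) / (M + F), ranging from -1 (all female) to +1 (all male)."""
    total = male_buyers + female_buyers
    if total == 0:
        # Assumption: with no buyers at all, treat the item as neutral.
        return 0.0
    return (male_buyers - female_buyers) / total


def label_from_score(score, threshold=0.9):
    """Turn a gender score into a Male / Female / Neutral tag."""
    if score > threshold:
        return "Male"
    if score < -threshold:
        return "Female"
    return "Neutral"
```

Note how strict a 0.9 threshold is: an item whose buyers are 91% female scores -0.82, so it is still tagged Neutral at 0.9 and only becomes Female with a looser threshold such as 0.8.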

Smoothing with a Pseudo-Count

Unfortunately, this score will be very noisy for items with very few purchases. If there is a new item with only one purchase by a female user, then it would get a score of -1, but we certainly don’t have enough evidence to believe it’s a female-specific item. This is a good use case for a pseudo-count. We can pretend that all items were bought, say, 10 times (by 5 male and 5 female users), which will give the scores some inertia that has to be overcome by evidence:

S = (M - F) / (M + F + Pseudo-count)

Increasing the pseudo-count drives scores toward zero, so choosing a value depends on the overall user population and what a “typical” product’s purchase count is:

The distribution of the Item Gender Score for different choices of pseudo-counts
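To make the effect of the pseudo-count concrete, here is a small sketch of the smoothed score. The function name and the default of 10 are illustrative; 10 is just the example value used in the text.

```python
def smoothed_gender_score(male_buyers, female_buyers, pseudo_count=10):
    """S = (M - F) / (M + F + pseudo_count).

    The pseudo-count acts like an equal number of imaginary male and
    female purchases, so it only enlarges the denominator.
    """
    denominator = male_buyers + female_buyers + pseudo_count
    if denominator == 0:
        # Assumption: no purchases and no pseudo-count means neutral.
        return 0.0
    return (male_buyers - female_buyers) / denominator
```

With a pseudo-count of 10, a single female purchase gives -1/11 (about -0.09) instead of -1, while an item bought by 1000 women and no men still scores about -0.99.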

Accounting for an Unbalanced User Base

This would work if a client has similar numbers of female and male users. But if a site has 10000 female and 50 male users, then the scores will all be heavily skewed toward female users. Put another way, if an item is purchased by 50 women and 50 men, that means it was purchased by 0.5% of the women but 100% of the men, and is very likely a male-specific item. In this case, we can add a scale factor to give each male more weight:

S = (A * M - F) / (A * M + F + Pseudo-count)

Where A is that scaling factor, calculated as the ratio of female users to male users in the entire user base. In our example above, A would be 10000 / 50 = 200.
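The balanced score can be sketched in the same way. The function and parameter names are illustrative, and raising an error when there are no male users is an assumption about how to handle a case the formula leaves undefined.

```python
def balanced_gender_score(male_buyers, female_buyers,
                          total_male_users, total_female_users,
                          pseudo_count=10):
    """S = (A*M - F) / (A*M + F + pseudo_count), with A = female users / male users."""
    if total_male_users <= 0:
        # Assumption: the scale factor is undefined without male users.
        raise ValueError("total_male_users must be positive")
    scale = total_female_users / total_male_users  # the factor A
    weighted_male = scale * male_buyers
    denominator = weighted_male + female_buyers + pseudo_count
    if denominator == 0:
        return 0.0
    return (weighted_male - female_buyers) / denominator
```

For the example above (50 male and 50 female buyers on a site with 50 male and 10000 female users), this gives 9950 / 10060, about 0.99, so the item is tagged male-specific even though the raw buyer counts are equal.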

In the case of our clothing retailer, the user base happened to be quite balanced. The scaling factor was A = 80% (about 4 women for every 5 men). This resulted in shifting the scores very slightly to the left:

For this client, adding the balance factor to give female and male users equal total weight shifted the scores to the left.

Why a Filter?

It would be reasonable to ask: why can’t the recommendation model just take care of the filtering itself? There are many algorithms that can incorporate attributes about a user, such as their gender, in making predictions. Ideally these should learn that male users tend not to buy female-use items, and vice versa. However, this doesn’t give us any control over how conservative or liberal the model is regarding gender matching. Moreover, we want the freedom to choose different models for different use cases. For example, do the recommendations need to be made as real-time responses to onsite actions, or can they wait for batch processing? Does the client want to emphasize item categorizations, or user-specific behavior? As such, many of the models we use don’t explicitly use the user attributes. Adding gender-based filtering as a post-processing step gets past this hurdle and gives us fine-grained control to ensure that any or all of our recommendations avoid surprising users with items that don’t match their gender.
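Because the filter runs after the model, it can be expressed as a plain function over any ranked list of recommendations. The sketch below is one possible reading of such a filter, not our actual implementation: the function name is hypothetical, and the choices to leave Neutral users unfiltered and to treat untagged items as Neutral are assumptions.

```python
def gender_match_filter(user_gender, recommended_items, item_genders):
    """Drop recommendations whose gender tag conflicts with the user's.

    user_gender: "Male", "Female", or "Neutral".
    recommended_items: ranked list of item ids from any upstream model.
    item_genders: {item id: "Male" | "Female" | "Neutral"}.
    """
    if user_gender == "Neutral":
        # Assumption: with no confident user tag, filter nothing.
        return list(recommended_items)
    kept = []
    for item in recommended_items:
        # Assumption: items without a tag are treated as Neutral.
        item_gender = item_genders.get(item, "Neutral")
        if item_gender in ("Neutral", user_gender):
            kept.append(item)
    return kept
```

The original ranking order is preserved, so lower-ranked items simply move up to fill the slots of the removed ones.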

Usage and Next Steps

This will only be useful for clients that have products which are intended for both female and male buyers. If a client’s products are targeted only at females, then any male users would of course be using the site to purchase those items and wouldn’t need their results filtered. But in that case, we should see very few items with gender skews anyway.

We run this gender-match filter by default for all of our clients. Typically, we see 2-10% of our product recommendations removed by this filter, allowing other, more relevant items to be recommended instead.

For the next step in this feature, we can use the same method to improve our inferences about the gender preferences of users. That is, we can look at whether users purchase male-specific or female-specific items, rather than just assuming their gender based on their name. This raises an interesting problem because the item gender tagging depends on the user gender tagging, and adding this step would introduce a cyclical dependency. An iterative approach like Expectation-Maximization should work well.


Making sure a user’s recommendations are appropriate for their gender requires knowing their gender preference as well as the gender skew of all products in the catalogue. In the absence of explicit metadata, we can infer them. Textual analysis is a convenient and reliable method for doing so, but it makes many assumptions that may not be accurate for individual users or products. Here we’ve shown some details of how we perform this inference for identifying gender-specific items, giving our models the flexibility to support business-specific use cases.

About the Authors

Eric Doi is a data scientist at Retention Science. His goal is to improve every day, just like gradient boosted learners. He studied Computer Science at UC San Diego and Harvey Mudd College.

Kai Wang is a data scientist at Retention Science.

Vedant Dhandhania is a Machine Learning Engineer at Retention Science. He helps predict customer behavior using advanced machine learning algorithms. His passion lies at the intersection of Signal Processing and Deep Learning.



