Introduction

Machine learning is tricky. No matter how many books you read, tutorials you finish or problems you solve, there will always be a data set that leaves you clueless, especially in your early days of machine learning. Isn’t that so?

In this blog post, you’ll learn some essential tips on building machine learning models which most people only learn with experience. These tips were shared by Marios Michailidis (a.k.a. Kazanova), Kaggle Grandmaster, currently ranked #3, in a webinar held on 5th March 2016. The webinar had three aspects:

Video – Watch Here.
Slides – The slides used in the video were shared by Marios. They are an enriching compilation of machine learning knowledge.
Q & As – This blog post lists all the questions asked by participants at the webinar.

The key to succeeding in competitions is perseverance. Marios said, ‘I won my first competition (Acquire Valued Shoppers Challenge) and entered Kaggle’s top 20 after a year of continued participation on a 4 GB RAM laptop (i3)’. Were you planning to give up?

While reading the Q & As, if you have any questions, please feel free to drop them in the comments!

Questions & Answers

1. What are the steps you follow for solving an ML problem? Please describe from scratch.

Following are the steps I undertake while solving any ML problem:

Understand the data – After you download the data, start exploring the features. Look at data types. Check variable classes. Create some univariate and bivariate plots to understand the nature of the variables.
Understand the metric to optimize – Every problem comes with a unique evaluation metric. It’s imperative for you to understand it, especially how it changes with the target variable.
Decide cross validation strategy – To avoid overfitting, make sure you’ve set up a cross validation strategy in the early stages. A good CV strategy will help you get a reliable score on the leaderboard.
Start hyper parameter tuning – Once CV is in place, try improving the model’s accuracy using hyper parameter tuning. It further includes the following steps:
- Data transformations: This involves steps like scaling, removing outliers, treating null values, transforming categorical variables, doing feature selection, creating interactions etc.
- Choosing algorithms and tuning their hyper parameters: Try multiple algorithms to understand how model performance changes.
- Saving results: From all the models trained above, make sure you save their predictions. They will be useful for ensembling.
- Combining models: At last, ensemble the models, possibly on multiple levels. Make sure the models are uncorrelated (diverse) for best results.
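
The "saving results" and "combining models" steps can be sketched roughly as follows. This is only an illustration, not Marios's actual pipeline: the synthetic data set, the two scikit-learn models and the plain average are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict

# Synthetic stand-in for a competition data set (assumption).
X, y = make_classification(n_samples=500, n_features=20, n_informative=5, random_state=0)

# One fixed CV split shared by every model, so saved predictions are comparable.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

models = {
    "linear": LogisticRegression(max_iter=1000),
    "gbm": GradientBoostingClassifier(random_state=0),
}

# Save the out-of-fold predictions of each model for later ensembling.
oof = {
    name: cross_val_predict(model, X, y, cv=cv, method="predict_proba")[:, 1]
    for name, model in models.items()
}

# Simplest possible ensemble: average the saved predictions.
ensemble = np.mean(list(oof.values()), axis=0)

scores = {name: roc_auc_score(y, pred) for name, pred in oof.items()}
scores["ensemble"] = roc_auc_score(y, ensemble)

# How correlated the two models' predictions are (lower means more diverse).
correlation = np.corrcoef(oof["linear"], oof["gbm"])[0, 1]
```

Comparing `scores` and `correlation` gives a rough feel for whether a model adds anything to the blend before moving to multi-level stacking.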

2. What are the model selection and data manipulation techniques you follow to solve a problem?

Generally, I try (almost) everything for most problems. In principle for:

Time series: I use GARCH, ARCH, regression, ARIMA models etc.
Image classification: I use deep learning (convolutional nets) in python.
Sound classification: I use common neural networks.
High cardinality categorical (like text data): I use linear models, FTRL, Vowpal Wabbit, LibFFM, libFM, SVD etc.

For everything else, I use gradient boosting machines (like XGBoost and LightGBM) and deep learning (like Keras, Lasagne, Caffe, Cxxnet). I decide which models to keep or drop in meta modelling with feature selection techniques. Some of the feature selection techniques I use include:

Forward (cv or not) – Start from null model. Add one feature at a time and check CV accuracy. If it improves keep the variable, else discard.
Backward (cv or not) – Start from the full model and remove variables one by one. If CV accuracy improves by removing a variable, discard it.
Mixed (or stepwise) – Use a mix of the above two techniques.
Permutations
Using feature importance – Use the feature importances reported by random forest, GBM or XGBoost.
Apply some statistical logic such as the chi-square test or ANOVA.
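
As a rough illustration of the forward technique described above, here is a minimal sketch using scikit-learn. The helper name `forward_selection`, the synthetic data and the choice of logistic regression are assumptions for the example, not part of the webinar.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score


def forward_selection(X, y, model, cv=5):
    """Single greedy pass: keep a feature only if mean CV accuracy improves."""
    selected = []
    best_score = -np.inf  # nothing selected yet, so any score is an improvement
    for feature in range(X.shape[1]):
        candidate = selected + [feature]
        score = cross_val_score(
            model, X[:, candidate], y, cv=cv, scoring="accuracy"
        ).mean()
        if score > best_score:
            selected, best_score = candidate, score  # keep the variable
        # otherwise the feature is discarded and we move on
    return selected, best_score


# Synthetic data (assumption): 10 features, only 3 of them informative.
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=3, n_redundant=0, random_state=0
)
selected, score = forward_selection(X, y, LogisticRegression(max_iter=1000))
```

Backward selection follows the same pattern in reverse: start with all columns and drop one whenever the CV score improves without it.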

Data manipulation can be different for every problem:

Time series: You can calculate moving averages and derivatives, and remove outliers.
Text: Useful techniques are TF-IDF, count vectorizers, word2vec, SVD (dimensionality reduction), stemming, spell checking, sparse matrices, likelihood encoding, one hot encoding (or dummies) and hashing.
Image classification: Here you can do scaling, resizing, removing noise (smoothing), annotating etc.
Sounds: Calculate Fourier transforms, MFCC (Mel frequency cepstral coefficients), low pass filters etc.
Everything else : Univariate feature transformations (like log +1 for numerical data), feature selections, treating null values, removing outliers, converting categorical variables to numeric.
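
A few of the manipulations listed above, sketched with pandas and NumPy. The column names and values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Toy tabular data (assumption).
df = pd.DataFrame({
    "price": [10.0, 200.0, 3000.0, 0.0],
    "colour": ["red", "blue", "red", "green"],
})

# Univariate transformation: log(1 + x) compresses a skewed numeric column.
df["log_price"] = np.log1p(df["price"])

# Categorical to numeric: one hot encoding (dummies).
encoded = pd.get_dummies(df, columns=["colour"])

# Time series: a 3-point moving average (the first two values are NaN).
series = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])
moving_avg = series.rolling(window=3).mean()
```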

3. Can you elaborate on your cross validation strategy?

Cross validation means that from my main set, I RANDOMLY create 2 sets. I build (train) my algorithm with the first one (let’s call it the training set) and score the other (let’s call it the validation set). I repeat this process multiple times and always check how my model performs on the validation set with respect to the metric I want to optimize.

The process may look like:

For 10 (you choose how many X) times
Split the set in training (50%-90% of the original data)
And validation (50%-10% of the original data)
Then fit the algorithm on the training set
Score the validation set.
Save the result of that scoring in respect to the chosen metric.
Calculate the average of these 10 (X) scores. That is roughly the score you can expect in real life, and it is generally a good estimate.
Remember to use a SEED to be able to replicate these X splits
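
The process above could look roughly like this with scikit-learn. It is a sketch under assumptions: the synthetic data, the logistic regression model, the 80/20 split and AUC as the metric are placeholders for whatever your problem uses.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import ShuffleSplit

# Synthetic stand-in for the main set (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# 10 random splits, 80% training / 20% validation.
# random_state is the SEED that makes the splits replicable.
splitter = ShuffleSplit(n_splits=10, test_size=0.2, random_state=42)

scores = []
for train_idx, valid_idx in splitter.split(X):
    model = LogisticRegression(max_iter=1000)
    model.fit(X[train_idx], y[train_idx])             # fit on the training set
    preds = model.predict_proba(X[valid_idx])[:, 1]   # score the validation set
    scores.append(roc_auc_score(y[valid_idx], preds)) # save the metric

# The average over the 10 splits is the estimate of real-life performance.
mean_score = float(np.mean(scores))
```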

Other things to consider are KFold and stratified KFold. Read here. For time-sensitive data, make certain you always follow the rule of having the past predict the future when testing.

4. Can you please explain some techniques used for cross validation?

Kfold
Stratified Kfold
Random X% split
Time based split
For large data, just one validation set could suffice (like 20% of the data – you don’t need to do multiple times).
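
These options map onto scikit-learn splitters fairly directly. The sketch below uses a tiny made-up data set and assumes the rows are already sorted by time for the time based split.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold, TimeSeriesSplit

X = np.arange(40).reshape(20, 2)   # 20 rows, assumed to be in time order
y = np.array([0] * 15 + [1] * 5)   # imbalanced target: 25% positives

kfold = KFold(n_splits=5, shuffle=True, random_state=0)
stratified = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
time_based = TimeSeriesSplit(n_splits=4)

# KFold: every row lands in exactly one validation fold.
fold_sizes = [len(valid) for _, valid in kfold.split(X)]

# Stratified KFold keeps the class ratio in every validation fold.
positives_per_fold = [int(y[valid].sum()) for _, valid in stratified.split(X, y)]

# Time based split: every training index precedes every validation index.
past_predicts_future = all(
    train.max() < valid.min() for train, valid in time_based.split(X)
)

# For large data, a single hold-out set (here the last 20%) may be enough.
cutoff = int(len(X) * 0.8)
train_idx, valid_idx = np.arange(cutoff), np.arange(cutoff, len(X))
```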

5. How did you improve your skills in machine learning? What training strategy did you use?

I did a mix of the stuff in 2, plus a lot of self-research. Alongside, programming and software (in Java) and A LOT of Kaggling ☺

6. Which are the most useful Python libraries for a data scientist?

Below are some libraries which I find most useful in solving problems:

Data Manipulation
- Numpy
- Scipy
- Pandas
Data Visualization
- Matplotlib
Machine Learning / Deep Learning
- Xgboost
- Keras
- Nolearn
- Gensim
- Scikit image
Natural Language Processing
- NLTK

7. What are useful ML techniques / strategies to impute missing values or predict a categorical label when all the variables are categorical in nature?

Imputing missing values is a critical step. Sometimes you may find a trend in missing values. Below are some techniques I use:

Use mean, mode, median for imputation
Use a value outside the range of the normal values for a variable, like -1 or -9999.
Replace with a likelihood – e.g. something that relates to the target variable.
Replace with something which makes sense. For example, sometimes null may mean zero.
Try to predict missing values based on subsets of known values.
You may consider removing rows with many null values.
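
Several of these imputation options are one-liners in pandas. The data frame below is invented for illustration, and the 50% threshold for dropping rows is an arbitrary assumption.

```python
import numpy as np
import pandas as pd

# Toy data with gaps (assumption).
df = pd.DataFrame({
    "age": [25.0, np.nan, 40.0, 35.0],
    "city": ["Athens", None, "Athens", "London"],
    "visits": [3.0, np.nan, np.nan, 1.0],
})

# Median (or mean) for numeric columns, mode for categorical ones.
age_median = df["age"].fillna(df["age"].median())
city_mode = df["city"].fillna(df["city"].mode()[0])

# A value outside the normal range, so "missing" becomes its own signal.
age_sentinel = df["age"].fillna(-1)

# Something that makes sense: here we assume a missing visit count means zero.
visits_zero = df["visits"].fillna(0)

# Remove rows where more than half of the columns are null.
kept = df[df.isna().mean(axis=1) <= 0.5]
```

Which option works best is an empirical question, so it is worth comparing them under the same cross validation setup.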

8. Can you elaborate what kind of hardware investment you have done i.e. your own PC/GPU setup for Deep learning related tasks? Or were you using more cloud based GPU services?

I won my first competition (Acquire Valued Shoppers Challenge) and entered Kaggle’s top 20 after a year of continued participation on a 4 GB RAM laptop (i3). I was using mostly self-made solutions up to this point (in Java). That competition had something like 300,000,000 rows of transaction data that you had to aggregate, so I had to parse the data and be smart to keep memory usage at a minimum.

However, since then I have made some good investments to become Rank #1. Now, I have access to Linux servers with 32 cores and 256 GB of RAM. I also have a GeForce 670 machine (for deep learning / GPU tasks). Also, I use mostly Python now. You can consider Amazon’s AWS too, but this is mostly worthwhile if you are really interested in getting to the top, because the cost may be high if you use it a lot.

9. Do you use a high performing machine, like one with a GPU? For example, when you do things like grid search for the parameters of a random forest, which takes a lot of time, which machine do you use?

I use GPUs (not very fast ones, like a GeForce 670) for training every deep learning model. I have to state that for deep learning a GPU is a MUST. Training neural nets on CPUs takes ages, while a mediocre GPU can make a simple NN (e.g. deep learning) 50-70 times faster. I don’t like grid search. I do this fairly manually. I think in the beginning it might be slow, but after a while you can get to decent solutions with the first set of parameters! That is because you can sort of learn which parameters are best for each problem and you get to know the algorithms better this way.

10. How do people build 80+ models? Is it by changing the hyper parameters?

It takes time. Some people do it differently. I have some sets of params that worked in the past, and I initialize with these values and then start adjusting them based on the problem at hand. Obviously you need to forcefully explore more areas (of hyper params, in order to know how they work) and enrich this bank of past successful hyper parameter combinations for each model. You should consider what others are doing too. There is NOT just 1 optimal set of hyper params. It is possible to get a similar score with a completely different set of params than the one you have.

11. How does one improve their kaggle rank? Sometimes I feel hopeless while working on any competition.

It’s not an overnight process. Improvement on Kaggle or anywhere else happens with time. There are no shortcuts. You need to just keep doing things. Below are some of my recommendations:

Learn better programming: Learn python if you know R.
Keep learning tools (listed below)
Read some books.
Play in ‘knowledge’ competitions
See what others are doing in kernels, or look for the ‘winning solution’ sections in past competitions.
Team up with more experienced users, but you need to improve your ranking slightly before this happens.
Create a code bank
Play … a lot!

12. Can you tell us about some useful tools used in machine learning?

Below is the list of my favourite tools:

Liblinear : For linear models
LibSvm for Support Vector machines
Scikit Learn for all machine learning models
Xgboost for fast scalable gradient boosting
LightGBM
Vowpal Wabbit for fast memory efficient linear models
Encog (http://www.heatonresearch.com/encog) for neural nets
H2O in R for many models
LibFm
LibFFM
Weka in Java (has everything)
Graphchi for factorizations
GraphLab for lots of stuff
Cxxnet: One of the best implementations of convolutional neural nets out there. Difficult to install and requires a GPU with an NVIDIA graphics card.
RankLib: The best library out there, made in Java, suited for ranking algorithms (e.g. ranking products for customers) that supports optimization functions like NDCG.
Keras and Lasagne for neural nets. This assumes you have Theano or TensorFlow.

13. How to start with machine learning?

I like these slides from the University of Utah in terms of understanding some basic algorithms and concepts about machine learning. This book for Python. I like this book too. Don’t forget to follow the wonderful scikit-learn documentation. Use Jupyter notebook from Anaconda.

You can find many good links that have helped me in Kaggle here. Look at ‘How Did you Get Better at Kaggle’.

In addition, you should do Andrew Ng’s machine learning course. Alongside, you can follow some good blogs such as mlwave, fastml, analyticsvidhya. But the best way is to get your hands dirty. Do some Kaggle! Tackle competitions that have the “knowledge” flag first and then start tackling some of the main ones. Try to tackle some older ones too.

14. What techniques perform best on large data sets on Kaggle and in general ? How to tackle memory issues ?

Big data sets with high cardinality can be tackled well with linear models. Consider sparse models. Tools like Vowpal Wabbit, FTRL, libFM, LibFFM and liblinear are good, as are sparse matrices in Python (things like CSR matrices). Consider ensembling (i.e. combining) models trained on smaller parts of the data.
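
To make the sparse matrix point concrete, here is a small sketch that one hot encodes a high cardinality column straight into a SciPy CSR matrix and fits a linear model on it. The `user_id` column, the toy target and the choice of `SGDClassifier` are assumptions for illustration.

```python
import numpy as np
from scipy import sparse
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
n_rows, n_categories = 10_000, 5_000

# One high cardinality categorical column, e.g. a hypothetical user id.
user_id = rng.integers(0, n_categories, size=n_rows)
y = (user_id % 2 == 0).astype(int)  # toy target tied to the category

# One hot encode straight into a CSR matrix: one non-zero value per row.
X = sparse.csr_matrix(
    (np.ones(n_rows), (np.arange(n_rows), user_id)),
    shape=(n_rows, n_categories),
)

# Memory needed by a dense float64 array versus the CSR representation.
dense_bytes = n_rows * n_categories * 8
sparse_bytes = X.data.nbytes + X.indices.nbytes + X.indptr.nbytes

# Linear models in scikit-learn accept the sparse matrix directly.
model = SGDClassifier(random_state=0).fit(X, y)
```

In this toy case the dense array would need about 400 MB, while the CSR matrix stays well under 1 MB.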

15. What is the SDLC (Software Development Life Cycle) of projects involving Machine Learning?

Give a walk-through on an industrial project and steps involved, so that we can get an idea how they are used. Basically, I am in learning phase and would expect to get an industry level exposure.
Business question: How to recommend products online to increase purchases.
Translate this into an ML problem. Try to predict what the customer will buy in the future, given the data available at the time the customer is likely to make the click/purchase and given some historical exposures to recommendations.
Establish a test/validation framework.
Find the best solution to predict what the customer chose.
Consider time/cost efficiency as well as performance
Export model parameters/pipeline settings
Apply these in an online environment. Expose some customers but NOT all. Keep test and control groups
Assess how well the algorithm is doing and make adjustments over time.

16. Which is your favorite machine learning algorithm?

It has to be Gradient Boosted Trees. All may be good in different tasks, though.

15. Which language is best for deep learning, R or Python?

I prefer Python. I think it is more program-ish. R is good too.

16. What would someone trying to switch careers in data science need to gain aside from technical skills? As I don’t have a developer background would personal projects be the best way to showcase my knowledge?

The ability to translate business problems into machine learning terms and transform them into solvable problems.

17. Do you agree with the statement that in general feature engineering (so exploring and recombining predictors) is more efficient than improving predictive models to increase accuracy?

In principle – Yes. I think model diversity is better than having a few really strong models. But it depends on the problem.

18. Are the skills required to get to the top of the leaderboard on Kaggle also those you need for your day-to-day job as a data scientist? Or do they intersect, or are they somewhat different? Can I form an idea of what a data scientist’s job is based on Kaggle competitions? And if a person does well on Kaggle, does it follow that she will be a successful data scientist in her career?

There is some percentage of overlap especially when it comes to making predictive models, working with data through python/R and creating reports and visualizations. What Kaggle does not offer (but you can get some idea) is:

How to translate a business question to a modelling (possibly supervised) problem
How to monitor models past their deployment
How to explain (many times) difficult concepts to stakeholders.
I think there is always room for a good kaggler in the industry world. It is just that data science can have many possible routes. It may be for example that not everyone tends to be entrepreneurial in their work or gets to be very client facing, but rather solving very particular (technical) tasks.

19. Which machine learning concepts are a must-have to perform well in a Kaggle competition?

Data interrogation/exploration
Data transformation – pre-processing
Hands on knowledge of tools
Familiarity with metrics and optimization
Cross Validation
Model Tuning
Ensembling

20. How do you see the future of data scientist job? Is automation going to kill this job?

No – I don’t think so. This is what they used to say about automation through computing, but it ended up requiring a lot of developers to get the job done! It may be that data scientists focus on softer tasks over time, like translating business questions to ML problems, and generally become ‘shepherds’ of the process – as in managers/supervisors of the modelling process.

21. How to use ensemble modelling in R and Python to increase the accuracy of prediction? Please quote some real life examples.

You can see my GitHub script, where I explain different machine learning methods based on a Kaggle competition. Also, check this ensembling guide.

22. What are the best Python deep learning libraries or frameworks for text analysis?

I like Keras (because it now supports sparse data) and Gensim (for word2vec).

23. How valuable is the knowledge gained through these competitions in real life? Most often I see competitions won by ensembling many #s of models … is this the case in real life production systems? Or are interpretable models more valuable than these monster ensembles in real production systems?

In some cases yes – being interpretable or fast (or memory efficient) is more important. But this is likely to change over time as people become less afraid of black box solutions and focus on accuracy.

24. Should I worry about learning the internals of the machine learning algorithms, or just go ahead and try to form an understanding of the algorithms and use them (in competitions and to solve real life business problems)?

You don’t need the internals. I don’t know all the internals. It is good if you do, but you don’t need to. Also, there is new stuff coming out every day – sometimes it is tough to keep track of it. That is why you should focus on the decent usage of any algorithm rather than over-investing in one.

25. Which are the best machine learning techniques for imbalanced data?

I don’t do any special treatment here. I know people find that strange. This comes down to optimizing the right metric (for me). It is tough to explain in a few lines. There are many techniques for sampling, but I never had to use them. Some people are using SMOTE. I don’t see value in trying to change the principal distribution of your target variable. You just end up with augmented or altered principal odds. If you really want a cut-off to decide on whether you should act or not, you may set it based on the principal odds.

I may not be the best person to answer this. I personally have never found it (significantly) useful to change the distribution of the target variable or the perception of the odds in the target variable. It may just be that some algorithms are better than others when dealing with this task (for example, tree-based ones should be able to handle this).

26. Typically, marketing research problems have been mostly handled using standard regression techniques – linear and logistic regression, clustering, factor analyses, etc. My question is how useful are machine learning and deep learning techniques/algorithms to marketing research or business problems? For example, how useful is, say, interpreting the output of a neural network to clients? Are there any resources you can refer to?

They are useful in the sense that you can most probably improve accuracy (in predicting let’s say marketing response) versus linear models (like regressions). Interpreting the output is hard and in my opinion it should not be necessary as we are generally moving towards more black box and complicated solutions.

As a data scientist you should put effort into making certain that you have a way to test how good your results are on some unobserved (test) data, rather than trying to understand why you get the type of predictions you are getting. I do think that decompressing information from complicated models is a nice topic (and valid for research), but I don’t see it as necessary.

On the other hand, companies, people, data scientists, statisticians and generally anybody who could be classified as a ‘data science player’ needs to get educated to accept black box solutions as perfectly normal. This may take a while, so it may be good to run some regressions along with any other modelling you are doing and generally try to provide explanatory graphs and summarized information to make a case for why your models perform as such.

27. How to build teams for collaboration on Kaggle?

You can ask in forums (i.e. on Kaggle). This may take a few competitions though, before ‘people can trust you’. The reason is that they are afraid of duplicate accounts (which violate competition rules), so people prefer somebody who is proven to play fair. Assuming some time has passed, you just need to think of people you would like to play with, people you think you can learn from and generally people who are likely to take different approaches than you, so you can leverage the benefits of diversity when combining methods.

28. I have gone through a basic (theoretical) machine learning course. Now I am starting my practical journey. You just recommended going through the scikit-learn docs, and now people are saying TENSORFLOW is the next scikit learn, so should I go through scikit, or is TF a good choice?

I don’t agree with the statement ‘people are saying TENSORFLOW is the next scikit learn’. TensorFlow is a framework for doing certain machine learning tasks well (like deep learning). I think you can learn both, but I would start with scikit. I personally don’t know TensorFlow, but I use tools that are based on TensorFlow (for example Keras). I am lazy I guess!

29. The main challenge that I face in any competition is cleaning the data and making it usable for prediction models. How do you overcome it?

Yeah. I join the club! After a while you will create pipelines that can handle this relatively quickly. However… you always need to spend time here.

30. How to compute big data without having a powerful machine?

You should consider tools like vowpal wabbit and online solutions, where you parse everything line by line. You need to invest more in programming though.
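
To show what "parsing everything line by line" can mean in practice, here is a small pure-Python sketch of an online logistic regression with the hashing trick. It is written in the spirit of tools like Vowpal Wabbit but is not how any of them are implemented; the line format (`label token token ...`), the function names and the hash function are all assumptions for the example.

```python
import io
import math


def hash_feature(token, n_buckets):
    """Deterministic hashing trick (Python's built-in hash() is salted per run)."""
    value = 0
    for char in token:
        value = (value * 31 + ord(char)) % n_buckets
    return value


def train_online(lines, n_buckets=2 ** 10, learning_rate=0.1):
    """One SGD update per line; only the weight vector is kept in memory."""
    weights = [0.0] * n_buckets
    for line in lines:
        label, *tokens = line.split()
        idx = [hash_feature(t, n_buckets) for t in tokens]
        p = 1.0 / (1.0 + math.exp(-sum(weights[i] for i in idx)))
        for i in idx:
            weights[i] -= learning_rate * (p - float(label))  # log-loss gradient
    return weights


def predict(weights, tokens):
    """Probability of the positive class for a list of tokens."""
    z = sum(weights[hash_feature(t, len(weights))] for t in tokens)
    return 1.0 / (1.0 + math.exp(-z))


# A file-like object stands in for a large file on disk (assumption).
data = io.StringIO("1 cheap offer\n0 meeting agenda\n" * 200)
weights = train_online(data)
```

Because the loop only ever holds one line and a fixed-size weight list, memory use does not grow with the number of rows, which is the point of the approach.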

31. What is Feature Engineering?

In short, feature engineering can be understood as:

Feature transformation (e.g. converting numerical or categorical variables to other types)
Feature selections
Exploiting feature interactions (like should I combine variable A with variable B?)
Treating null values
Treating outliers

32. Which maths skills are important in machine learning?

Some basic probability along with linear algebra (e.g. vectors). Then some stats help too, like averages, frequency, standard deviation etc.

33. Can you share your previous solutions?

See some with code and some without (just general approach).

34. How long should it take for you to build your first machine learning predictor?

It depends on the problem (size, complexity, number of features). You should not worry about the time. Generally, in the beginning you might spend a lot of time on things that will seem much easier later on. It may also be different for each person, given their programming background or other experience.

35. Are there any knowledge competitions that you can recommend where you are not necessarily competing on the level as Kaggle but building your skills?

From here, both Titanic and Digit Recognizer are good competitions to start with. Titanic is better because it assumes a flat file. Digit Recognizer is for image classification, so it might be more advanced.

36. What is your opinion about using Weka and/or R vs Python for learning machine learning?

I like Weka. It has good documentation – especially if you want to learn the algorithms. However, I have to admit that it is not as efficient as some of the R and Python implementations. It has good coverage though. Weka has some good visualizations too – especially for some tree-based algorithms. I would probably suggest you focus on R and Python at first, unless your background is strictly in Java.

Summary

In short, succeeding in machine learning competitions is all about learning new things and spending a lot of time training, feature engineering and validating models. Alongside, interact with the community on forums, read blogs and learn from the approaches of fellow competitors.

Success is imminent, provided you keep trying. Cheers!

How to design a take-home coding assignment that AI tools cannot complete for your candidate

Estimated read time: 8 minutes

Many take-home coding assignments written before 2023 are now solvable by a mid-tier LLM in under 10 minutes. If you want to know how to design a take-home coding assignment that AI tools cannot complete for your candidate, the honest answer is that you probably can't — not entirely. What you can do is design an AI-resistant take-home coding assignment where AI is a normal part of the work, and the signal comes from what the candidate does around the AI: the judgment, the context handling, the debugging, the trade-offs they can defend on a follow-up call.

This is a shift in what a take-home is for. It stops being a proof of coding ability in isolation. It becomes a proof of engineering judgment in an AI-assisted workflow — which is closer to the actual job anyway.

Why the classic format broke in the AI era

The classic take-home — "build a small CRUD app in the language of your choice, submit in five days" — assumed the candidate would be the primary author of the code. That assumption held until roughly late 2022. GitHub's 2024 Octoverse report notes that AI-assisted development has become increasingly common across active repositories, and Stack Overflow's 2024 Developer Survey reported that 76% of professional developers are either currently using or planning to use AI tools in their development process, up from 70% in the 2023 survey.

The result: a candidate who submits a clean, working CRUD app has proven very little about their own ability. They have proven they can prompt a model and paste the output. That is a real skill, but it is not the skill most hiring managers are actually trying to test with a take-home.

Two consequences follow. First, in our experience working with technical hiring teams, the false-positive rate on take-homes has climbed sharply — candidates ship work that looks strong and then cannot discuss it. Second, strong candidates are increasingly resentful of long take-homes, because they know the format is broken and they know reviewers half-suspect the work is AI-generated anyway.

Developer AI Tool Adoption Rate: 2023 vs 2024 — Source: Stack Overflow Developer Survey, 2024

The core design shift for an LLM-resistant technical assignment: from "did you write this" to "can you defend this"

The premise worth adopting is simple. Assume AI assistance. Design the take-home so that AI help is expected, and the evaluation focuses on the parts of the work AI can't fake for the candidate in the follow-up conversation.

This is the same shift many university programs made when calculators became ubiquitous. The problems changed. The evaluation changed. The skill being tested changed.

For an AI-proof coding assessment, four design principles produce assignments that AI tools cannot complete for the candidate in a way that survives scrutiny.

1. Anchor the assignment in a context only the candidate has

Generic prompts ("build a URL shortener") are the easiest for AI to complete end-to-end. Contextual prompts force the candidate to make choices AI can't make for them.

Concrete patterns that work:

Give the candidate a broken repository — an intentionally flawed 200–400 line codebase — and ask them to identify the top three issues, fix one, and write a short note on the trade-offs of their fix. AI helps with the fix; the diagnosis and the trade-off note reveal judgment.
Provide a partial system with an ambiguous spec. Ask the candidate to list the three questions they would ask a product manager before writing more code, then implement against their own resolved assumptions. The questions are the signal.
Ask them to extend an existing feature rather than build from scratch. Extension requires reading, which AI is still weaker at than generation, and it produces a smaller code delta that is easier to discuss line by line.

The pattern: the deliverable includes both code and a short written artifact (a decision log, a set of questions, a diagnosis note). The written artifact is where AI signal degrades fastest, because it requires the candidate to have actually read what they submitted.

2. Require a live walkthrough as part of the AI-era hiring exercise

The single most effective defense against AI-completed take-homes is a 30-minute follow-up where the candidate walks a reviewer through their code, is asked to modify one function live, and is asked to explain a trade-off they made.

This is not an interrogation. It is a working session. Candidates who did the work themselves — with or without AI — handle it easily. Candidates who did not, don't.

Two things to design for the walkthrough:

Pick one function in their submission and ask them to modify its behavior in a small, specific way. "What if the input format changed to include a timezone?" Watch how they navigate the file, whether they know where the change belongs, and how they reason about downstream effects.
Ask them why they didn't do something. "Why didn't you cache this?" or "Why did you pick this data structure over a hash map?" The negative-space questions catch people who followed AI suggestions without evaluating alternatives.

If your hiring process can't support a 30-minute follow-up on every take-home submission, the take-home is not doing what you need it to do. Cut it and use a shorter, live-coded exercise instead. You can run live coding interviews with HackerEarth's FaceCode for the live component; a scheduled Zoom with a hiring manager works too.

3. Time-box tightly and make the scope visible

Long take-homes (5+ days, 10+ hours of work) are the format most vulnerable to AI completion. They also disproportionately screen out candidates with caregiving responsibilities, current jobs, or anything approaching a life outside work.

A 90-minute to 3-hour take-home, with the scope stated explicitly, does more work than a five-day project. Candidates who spend 15 hours on a 3-hour assignment produce output that no longer represents their unaided ability, and the extra time doesn't produce better signal — it produces more polish, which is the exact thing AI adds cheaply.

State the scope in the assignment: "This should take a strong candidate roughly 2 hours. If you're spending significantly more, stop and submit what you have with a note on what you'd do next."

4. Evaluate against an explicit rubric, not against a "gut feel" ceiling

Rubric drift is the quiet killer of take-home evaluations. Two reviewers looking at the same submission reach different conclusions, and when AI is in the mix, "this feels AI-generated" becomes a stand-in for "I don't trust this." That is not a defensible evaluation.

An explicit rubric for a take-home coding assignment AI can't complete covers at least four dimensions:

Correctness against the stated requirements
Code quality relative to the seniority level being hired
Quality of the written artifact (decision log, questions, or trade-off note)
Performance in the walkthrough — specifically, ability to modify their own code and defend their choices

Score each dimension separately. Calibrate with two reviewers on the first five submissions of any new take-home before rolling it out broadly. Rubric-based evaluation is one of the areas where structured platforms help more than most people expect — for a deeper look at how to build rubrics that hold up across reviewers, see our guide to building a technical interview rubric.
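
If reviewers record their scores in a spreadsheet or a script, the "score each dimension separately" and "calibrate with two reviewers" steps can be made mechanical. The sketch below is one hypothetical way to do it: the dimension names, the 1-4 scale and the one-point tolerance are assumptions, not a standard.

```python
from dataclasses import dataclass
from statistics import mean

# The four rubric dimensions from the list above (names are illustrative).
DIMENSIONS = ("correctness", "code_quality", "written_artifact", "walkthrough")


@dataclass(frozen=True)
class Review:
    reviewer: str
    scores: dict  # dimension name -> score on a hypothetical 1-4 scale

    def __post_init__(self):
        # Refuse partial reviews so no dimension is silently skipped.
        missing = set(DIMENSIONS) - set(self.scores)
        if missing:
            raise ValueError(f"unscored dimensions: {sorted(missing)}")


def calibration_gaps(first, second, tolerance=1):
    """Dimensions where two reviewers differ by more than `tolerance` points."""
    return [
        d for d in DIMENSIONS
        if abs(first.scores[d] - second.scores[d]) > tolerance
    ]


def summary(reviews):
    """Average each dimension separately instead of collapsing to one number."""
    return {d: mean(r.scores[d] for r in reviews) for d in DIMENSIONS}


a = Review("reviewer_a", {"correctness": 4, "code_quality": 3,
                          "written_artifact": 2, "walkthrough": 4})
b = Review("reviewer_b", {"correctness": 4, "code_quality": 3,
                          "written_artifact": 4, "walkthrough": 3})
gaps = calibration_gaps(a, b)   # dimensions to discuss in calibration
averages = summary([a, b])
```

Here the two reviewers would need to talk through the written artifact, where their scores differ by two points, before the rubric is rolled out more widely.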

What not to do

A few defensive moves get suggested often and don't work as well as advertised.

Aggressive AI-detection tools. Tools that claim to detect AI-generated code have false-positive rates that practitioner reports suggest are high enough to hurt honest candidates. Vendors of AI-detection tools designed for prose, such as Turnitin, have publicly acknowledged that detection accuracy drops on edited or paraphrased content, and code is easier to lightly rewrite than prose. (See Turnitin's guidance on AI writing detection accuracy.) Using detection scores as an evaluation input creates unfair rejections and legal exposure. Don't.

Banning AI use. Telling candidates "do not use AI tools" produces two outcomes: honest candidates follow the rule and are handicapped relative to the job's actual conditions, and dishonest candidates use AI anyway. The rule punishes the wrong people.

Locking down the environment. Proctored, keylogger-monitored take-home environments produce a candidate experience that top candidates walk away from. They also don't work — a second laptop sits next to the first one. Proctoring belongs in high-stakes assessments, not take-homes.

Making the assignment harder. Practitioner experience suggests that increasing difficulty to "outpace" AI often produces problems that AI still solves and that human candidates now fail. The result is a smaller, more frustrated candidate pool with no better signal.

A worked example of an AI-resistant take-home coding assignment

For a mid-level backend engineer role, a take-home that works as of 2026:

Provide a repo with a small REST service (300 lines of Python or Go) that has three problems: one obvious bug, one performance issue that only shows up at scale, and one design flaw that will bite the next engineer to touch it. Ask the candidate to:

Identify all three issues in a written diagnosis (max 400 words).
Fix the bug and open a PR-style diff.
In their submission note, describe how they'd address the other two issues and what trade-offs each fix involves.
Come to a 30-minute walkthrough prepared to modify their fix live in response to a changed requirement.

Total candidate time: 2–3 hours. AI helps with the fix and possibly drafts the diagnosis, but the walkthrough — where they explain the two issues they didn't fix and defend the trade-offs — is where the actual signal appears.
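To make the three planted problems concrete, the sketch below shows what they might look like in a stripped-down order-lookup module. Everything here is hypothetical (the `ORDERS` store, `list_orders`, `orders_for_customer`, and the specific bug); a real exercise would embed issues like these in an actual service with routes and tests.

```python
# Hypothetical excerpt of the kind of code a candidate might receive.
# All names and the data layout are illustrative assumptions.

# Design flaw: module-level mutable state shared by every caller, with no
# locking and no way to inject a different store in tests.
ORDERS = [
    {"id": 1, "customer": "a", "total": 10},
    {"id": 2, "customer": "b", "total": 25},
    {"id": 3, "customer": "a", "total": 40},
]


def list_orders(page: int, page_size: int) -> list:
    """Return one page of orders; pages are numbered from 1."""
    # Obvious bug: the offset ignores that pages are 1-based, so page 1
    # skips the first page_size records.
    start = page * page_size
    return ORDERS[start:start + page_size]


def orders_for_customer(customer: str) -> list:
    """Return every order belonging to one customer."""
    # Performance issue: a full scan on every call. Fine for three rows,
    # slow once the store holds millions of orders; an index keyed by
    # customer would avoid it.
    return [o for o in ORDERS if o["customer"] == customer]


def list_orders_fixed(page: int, page_size: int) -> list:
    """The kind of fix a candidate would be expected to submit for the bug."""
    start = (page - 1) * page_size
    return ORDERS[start:start + page_size]


print(list_orders(1, 2))        # buggy: returns only order 3
print(list_orders_fixed(1, 2))  # fixed: returns orders 1 and 2
```

In the walkthrough, a changed requirement such as "pages are now zero-based" is a quick way to check whether the candidate understands the fix rather than just having pasted it.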

Frequently asked questions

Can I design a take-home coding assignment that AI tools cannot complete at all for the candidate?

Not reliably, and pursuing that goal leads to worse assignments. The workable version is to design a take-home where AI assistance is expected and the evaluation focuses on judgment, context, and defense of choices — which is what the job requires anyway.

How long should a take-home coding assignment be in 2026?

For most roles, 90 minutes to 3 hours of stated scope, with a 30-minute live follow-up. Practitioner experience suggests longer take-homes correlate with drop-out among strong candidates and with over-polished AI-assisted submissions that don't reflect the candidate's own ability.

Should we tell candidates they can use AI tools on the take-home?

Yes, explicitly. State that AI tools are permitted and expected, and that the follow-up walkthrough will focus on the candidate's ability to explain and modify their submission. This is more honest, produces less anxiety, and doesn't change the signal you get from the walkthrough.

What if a candidate refuses the live walkthrough?

Treat it the way you'd treat a candidate refusing any standard step in the process. The walkthrough is not optional in an AI-assisted world; it's where the take-home actually gets evaluated. If the process is designed so the walkthrough is 30 minutes and scheduled within a week of submission, refusal is rare.

Do AI-detection tools work for code?

Not well enough to use as an evaluation input. Research and practitioner reports suggest false-positive rates are high, honest candidates get flagged, and the tools don't survive an adversarial candidate who edits the AI output. Use structural design — walkthroughs, rubric-based evaluation, contextual prompts — rather than detection.

Key takeaways

Assume AI assistance in every take-home submission; design for it rather than against it.
Anchor assignments in context — broken repos, partial systems, extension tasks — that AI can help with but can't fully own.
Require a 30-minute live walkthrough as a non-negotiable part of the process; it is where the actual signal lives.
Keep scope tight (2–3 hours) and score against an explicit rubric with at least two calibrated reviewers.
Skip AI-detection tools, aggressive proctoring, and AI bans — they punish honest candidates and don't stop dishonest ones.

See it in action

The rubric-drift problem described in principle 4 — two reviewers reaching different conclusions on the same submission — is the specific gap HackerEarth Assessments is built to close. Structured rubric scoring across reviewers keeps evaluations calibrated on the diagnosis, code, and walkthrough dimensions separately, so "this feels AI-generated" stops standing in for a defensible score. To see how it maps to the diagnosis-and-extension format described above, book a walkthrough of HackerEarth Assessments.


AI Candidate Screening: A TA Leader's Guide

AI candidate screening: a practical guide for talent acquisition leaders


AI candidate screening — the use of machine learning and automation to parse, score, and prioritize applicants during early-stage hiring — is now a program-design decision for talent acquisition leaders, not just a recruiter productivity tool. LinkedIn's 2024 Future of Recruiting report found that recruiters spend roughly a third of their week on sourcing and screening tasks, and the volume side of the equation is only growing: LinkedIn has reported application volumes per job climbing sharply since generative AI writing tools became widely available.

That combination — more applications, similar-looking resumes, tighter timelines — is what pushes AI candidate screening from a "nice to have" into a funnel-conversion and pipeline-coverage question that shows up in executive reporting.

This guide covers how AI candidate screening works, where it underperforms, how to evaluate vendors against your ATS (Workday, Greenhouse, Lever, SmartRecruiters), and what compliance frameworks such as NYC Local Law 144 and the EU AI Act require before deployment.

Recruiter Time Allocation by Task — Source: LinkedIn Future of Recruiting Report, 2024; remaining categories illustrative based on article claims

Why resume-only screening breaks at scale

Resume screening was designed for a hiring environment that no longer exists. Recruiters reviewed education, work history, certifications, and keywords to determine whether an applicant should move forward.

The problem is that resumes were never designed to measure skills. A candidate may list Python, Java, or "cloud infrastructure" without being able to apply any of them; conversely, capable candidates get filtered out because their resumes don't hit keyword thresholds. Research summarized by SHRM and McKinsey consistently points to the weak predictive validity of unstructured resume review for job performance.

At high volume, this gets worse. When a recruiter has to clear 400 applications for one role in a week, decisions collapse toward surface signals — school name, employer brand, keyword density — rather than validated capability.

This is also why skills-based hiring frameworks such as O*NET and SFIA have gained traction: they give TA teams a structured vocabulary for what a role actually requires, which is a prerequisite for any AI screening system to score against.

Figure 1: Comparison of traditional resume screening and AI candidate screening workflows. Traditional screening centers on resume review; AI candidate screening incorporates additional candidate signals such as assessments and structured evaluations. Source: HackerEarth.

Dimension	Traditional screening	AI candidate screening
Primary input	Resume, cover letter	Resume + assessment data + structured interview signals
Evaluation basis	Keywords, credentials	Demonstrated skills, scored responses
Consistency	Varies by recruiter	Rubric-based, auditable
Scalability	Linear with headcount	Handles high-volume events (e.g., campus, RIF backfill)
Reporting	Manual funnel metrics	Funnel conversion, slate diversity, time-to-shortlist

Time-to-Shortlist: Manual vs. AI Screening at High Volume — Source: Illustrative based on article claims (days to shortlist)

What AI candidate screening actually is

AI candidate screening is the application of machine learning and rules-based automation to evaluate, prioritize, and organize candidates in the early stages of a hiring funnel.

Depending on the platform, an AI screening system may score resumes, application answers, assessment results, coding submissions, or recorded interview responses against a role-specific rubric. The output is typically a ranked shortlist plus explanations of why each candidate scored where they did.

The point is not to replace recruiter judgment. It is to reallocate recruiter time from administrative triage to candidate evaluation, and to make the triage step auditable enough that a Head of TA can defend the funnel to a CHRO or a regulator.

Modern AI screening tools generally integrate with an ATS such as Workday, Greenhouse, or Lever, and increasingly sit alongside skills assessments and structured interview platforms rather than replacing them.

How AI screening works in a technical hiring funnel

An AI candidate screening workflow begins when a candidate enters the funnel — application, referral, sourcing campaign, or talent community. From there:

Ingest. Application data and resume are parsed and normalized against role criteria.
Signal collection. For technical roles, the workflow adds skills assessments, coding challenges, or structured interview scores.
Scoring. Each candidate is scored against a rubric derived from the job's must-have and nice-to-have skills.
Ranking and explanation. Recruiters see a ranked slate with the reasoning behind each score, not just a number.
Human review. Recruiters and hiring managers make the shortlist decision using the AI output as one input among several.
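The scoring and ranking steps above can be sketched as follows. This is a minimal illustration, assuming a rubric of weighted must-have and nice-to-have skills and per-skill assessment scores between 0 and 1; the weights, field names, and the `score_candidate` function are hypothetical and do not describe how any particular vendor computes scores.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    must_have: dict      # skill -> weight
    nice_to_have: dict   # skill -> weight


@dataclass
class Scored:
    name: str
    score: float
    reasons: list = field(default_factory=list)


def score_candidate(name: str, skills: dict, rubric: Rubric) -> Scored:
    """Weighted sum of per-skill scores (0..1), with one reason per skill."""
    total, reasons = 0.0, []
    for skill, weight in {**rubric.must_have, **rubric.nice_to_have}.items():
        value = skills.get(skill, 0.0)  # a missing skill counts as zero
        total += weight * value
        kind = "must-have" if skill in rubric.must_have else "nice-to-have"
        reasons.append(f"{skill} ({kind}): {value:.2f} x weight {weight}")
    return Scored(name, round(total, 2), reasons)


def rank(candidates: dict, rubric: Rubric) -> list:
    """Return scored candidates sorted by score, highest first."""
    scored = [score_candidate(n, s, rubric) for n, s in candidates.items()]
    return sorted(scored, key=lambda c: c.score, reverse=True)


# Invented rubric and candidates for illustration.
rubric = Rubric(must_have={"python": 3, "sql": 2}, nice_to_have={"aws": 1})
candidates = {
    "cand_a": {"python": 0.9, "sql": 0.5, "aws": 0.0},
    "cand_b": {"python": 0.6, "sql": 0.8, "aws": 1.0},
}

for c in rank(candidates, rubric):
    print(c.name, c.score, c.reasons)
```

Keeping the `reasons` list next to the score is what lets a recruiter see why a candidate ranked where they did, instead of receiving only a number.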

For TA leaders managing high-volume or campus hiring, this structure is what turns AI screening from a black box into something you can report on: funnel conversion at each stage, slate diversity, recruiter productivity per requisition, and time-to-shortlist.
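Stage-by-stage funnel conversion is simple to compute once the candidate count at each stage is logged. A minimal sketch, with hypothetical stage names and invented counts:

```python
def funnel_conversion(stage_counts: list) -> list:
    """Conversion rate from each stage to the next.

    stage_counts is an ordered list of (stage_name, candidate_count) tuples.
    """
    rates = []
    for (prev_name, prev_n), (next_name, next_n) in zip(stage_counts, stage_counts[1:]):
        rate = next_n / prev_n if prev_n else 0.0
        rates.append((f"{prev_name} -> {next_name}", round(rate, 3)))
    return rates


# Illustrative numbers only.
stages = [("applied", 400), ("assessed", 240), ("interviewed", 60), ("shortlisted", 12)]
for step, rate in funnel_conversion(stages):
    print(step, rate)
```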

The business case: what AI screening changes at the TA function level

For a Head of TA, the case for AI candidate screening is a program-design case, not a feature case.

Recruiter productivity. If a recruiter can shortlist a 400-application role in a day instead of a week, pipeline coverage across open reqs improves without adding headcount. This is the metric to bring to a vendor RFP.

Consistency and defensibility. Rubric-based AI screening produces an audit trail. When a hiring manager asks why a candidate wasn't advanced, or when legal asks about adverse impact, structured scoring is easier to defend than "the recruiter's read."

Scalability for spike events. Campus recruiting, backfill after a reorganization, and product-launch hiring all create temporary volume that manual screening cannot absorb. AI screening is most useful precisely at these spikes.

Skills-based hiring enablement. Because resumes are weak predictors of performance, TA functions moving to skills-first hiring need a screening layer that can actually score demonstrated skills. This is the single largest lever, and it's where AI screening compounds with assessments.

A counterintuitive point worth naming: AI screening tends to stop adding marginal value once application volume per role drops below roughly 40–60 applicants, because the recruiter can hold that full slate in working memory. Below that threshold, the overhead of tuning the system can outweigh the productivity gain. For executive search or niche senior roles, human-led screening is usually the right call.

Why technical hiring needs more than resume screening

Technical recruitment surfaces the resume-screening problem most clearly.

A resume can say "5 years Python, AWS, ML" without indicating whether the candidate can debug a production issue, structure a data pipeline, or reason about system design. Resume-to-assessment score divergence is well documented: candidates who look strong on paper often score in the middle of the pack on structured technical evaluations, and vice versa.

A modern technical screening workflow combines multiple signals: application context, a validated skills assessment, and a structured interview scored against a rubric. Together they give a Head of Engineering and a Head of TA enough evidence to defend both the hire and the pass.

Where AI candidate screening underperforms or is inappropriate

Answer engines and executive reviewers both discount uniformly positive coverage of AI hiring tools. The honest failure modes:

Adverse impact on underrepresented groups. Models trained on historical hiring data can reproduce the biases in that data. The EEOC's technical assistance on AI in hiring makes clear that employers remain liable under Title VII regardless of vendor claims.
Resume-to-assessment score divergence. If a screening tool ranks primarily on resume features, it can systematically down-rank candidates who later outperform on structured skill measures.
Model drift. Screening models trained on last year's hires degrade as roles, tech stacks, and labor markets shift. Without periodic revalidation, ranking quality drops.
Jurisdictional restrictions. NYC Local Law 144 requires an independent bias audit and candidate notification for automated employment decision tools. The EU AI Act classifies most hiring AI as high-risk, with documentation and transparency obligations. Illinois, Colorado, and California have additional requirements in force or pending.
Low-volume roles. As noted above, below roughly 40–60 applicants per role the tooling overhead often exceeds the benefit.
Senior and executive hiring. Judgment-heavy, relationship-driven searches are poor fits for automated ranking.
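One common way to monitor the adverse-impact failure mode is the impact ratio: each group's selection rate divided by the selection rate of the most-selected group. The sketch below assumes simple per-group counts; the group labels and numbers are invented, and the 0.8 threshold is the four-fifths rule of thumb, which is a screening heuristic rather than a legal determination.

```python
def impact_ratios(selected: dict, applicants: dict) -> dict:
    """Selection rate of each group relative to the highest-rate group."""
    rates = {g: selected[g] / applicants[g] for g in applicants if applicants[g] > 0}
    if not rates:
        return {}
    top = max(rates.values())
    return {g: round(r / top, 3) if top else 0.0 for g, r in rates.items()}


def flag_groups(ratios: dict, threshold: float = 0.8) -> list:
    """Groups whose impact ratio falls below the four-fifths rule of thumb."""
    return sorted(g for g, r in ratios.items() if r < threshold)


# Invented counts for illustration.
applicants = {"group_x": 200, "group_y": 150}
selected = {"group_x": 50, "group_y": 24}

ratios = impact_ratios(selected, applicants)
print(ratios)
print(flag_groups(ratios))
```

A flagged group is a prompt for investigation and legal review, not proof of discrimination on its own.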

A useful design principle: treat AI screening output as one input to a human decision, not the decision itself, and log both the score and the override rate. Override rate is a leading indicator of model quality.
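Logging the score alongside the human decision makes the override rate a one-line calculation. A minimal sketch, assuming each logged decision records whether the AI recommended advancing the candidate and whether the recruiter actually did; the field names are hypothetical.

```python
def override_rate(decisions: list) -> float:
    """Share of decisions where the recruiter went against the AI recommendation.

    Each decision is a dict with boolean keys 'ai_advance' and 'human_advance'.
    """
    if not decisions:
        return 0.0
    overrides = sum(1 for d in decisions if d["ai_advance"] != d["human_advance"])
    return overrides / len(decisions)


# Invented decision log for illustration.
log = [
    {"candidate": "c1", "ai_score": 0.91, "ai_advance": True, "human_advance": True},
    {"candidate": "c2", "ai_score": 0.42, "ai_advance": False, "human_advance": True},
    {"candidate": "c3", "ai_score": 0.77, "ai_advance": True, "human_advance": False},
    {"candidate": "c4", "ai_score": 0.30, "ai_advance": False, "human_advance": False},
]
print(override_rate(log))  # 0.5
```

Tracking this rate per role family over time, rather than as a single global number, makes it easier to spot where the model has drifted.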

Common implementation challenges

Over-reliance on resume parsing. Some tools mostly do keyword matching under an AI label. Ask vendors what signals actually drive the score.

Candidate experience. Long assessment stacks and opaque scoring increase drop-off. Measure completion rate as a first-class metric.

Transparency to hiring managers. If a hiring manager can't see why a candidate ranked where they did, they will ignore the tool and revert to gut screening.

Compliance and governance. Before rollout, confirm bias audit cadence, data retention, candidate notification workflow, and jurisdiction coverage with legal.

Evaluating AI candidate screening tools: an RFP checklist

Rather than a feature list, use these questions in a vendor RFP:

What specific signals drive the candidate score, and can you show a sample explanation for a real ranking?
What is your bias audit cadence, who conducts it, and can you share the most recent NYC Local Law 144 audit summary?
How does the system handle model drift, and how often is the model revalidated against outcome data?
What is your integration depth with our ATS (Workday, Greenhouse, Lever, SmartRecruiters), and does data flow both ways?
What funnel and slate-diversity metrics are exposed for executive reporting?
What is the assessment completion rate benchmark for candidates in our role families?
For technical roles, can the platform administer and score coding evaluations at scale, and what is the largest single event you have supported?

How HackerEarth fits into an AI candidate screening program

HackerEarth's assessment and interview stack is built for technical hiring at scale, and slots into an AI screening program as the skills-signal layer that resume-based tools can't produce on their own.

HackerEarth Assessments covers 1,000+ skills across 40+ programming languages, with role-specific tests, coding challenges, and project-based evaluations that give recruiters a validated signal beyond the resume. Discover Dollar, for example, used HackerEarth to run assessments for 2,000 candidates in a single weekend — the kind of scale that manual screening cannot absorb.

FaceCode provides structured, rubric-scored technical interviews with live coding, so the interview stage produces the same auditable signal as the assessment stage.

OnScreen (launched April 14, 2026, currently available to enterprise customers with pilot access at hackerearth.com/ai/onscreen) is an AI interview tool that conducts structured technical interviews 24/7 using video-avatar interviewers with built-in identity verification. It is designed for high-volume top-of-funnel technical screening where scheduling human interviewers is the bottleneck.

Across these products, HackerEarth serves 500+ global enterprises and a 10M+ developer community, which is the dataset behind the skills taxonomy and role benchmarks.

Figure 2: HackerEarth Assessments, FaceCode, and OnScreen mapped to stages of a technical hiring funnel. Source: HackerEarth.

Frequently asked questions

How does AI candidate screening work?

AI candidate screening ingests applications and additional signals (assessments, structured interview scores), scores each candidate against a role-specific rubric, and returns a ranked, explainable shortlist to the recruiter. A human still makes the shortlist decision.

Is AI candidate screening biased?

It can be. Models trained on historical hiring data can reproduce historical bias, and the EEOC has clarified that employers remain liable under Title VII regardless of vendor claims. Regular independent bias audits (required under NYC Local Law 144 for tools used on NYC candidates) and monitoring adverse impact ratios are the standard mitigations.

Is AI candidate screening legal?

It is legal in most jurisdictions but increasingly regulated. NYC Local Law 144 requires bias audits and candidate notification. The EU AI Act treats most hiring AI as high-risk. Illinois, Colorado, and California have additional obligations. Confirm coverage with legal before deployment.

What is the best AI screening software for technical hiring?

The right tool depends on volume, role mix, and ATS. For technical hiring specifically, look for validated skills assessments, coding evaluation at scale, structured interview scoring, and native integration with your ATS. HackerEarth Assessments, FaceCode, and OnScreen are built for this use case.

When does AI candidate screening stop adding value?

Below roughly 40–60 applicants per role, or for senior and executive searches, the overhead of tuning and monitoring the system often outweighs the productivity gain. Reserve AI screening for high-volume and repeatable role families.

How do I measure whether AI candidate screening is working?

Track time-to-shortlist, recruiter productivity per requisition, funnel conversion by stage, slate diversity, assessment completion rate, override rate (how often recruiters overrule the AI ranking), and quality-of-hire at 6 and 12 months.

Next steps

If you're evaluating AI candidate screening for a technical hiring program, the fastest way to pressure-test whether it fits your funnel is to run a scoped pilot against one high-volume role family.

Request a HackerEarth demo to see Assessments, FaceCode, and OnScreen against your own role requirements, or explore OnScreen pilot access if 24/7 structured technical interviews are your current bottleneck.


How AI-Generated CVs Are Breaking Technical Hiring (and What Actually Works Now)

AI-generated CVs are breaking technical hiring by flooding the top of the funnel with resumes that look qualified, read as tailored, and often fail to reflect actual technical ability. The problem isn't simply more applications; it's lower-quality hiring signals at much higher volume.

Many hiring teams responded by tightening resume filters. Unfortunately, that only delays the problem. If resumes are already an unreliable signal, adding more resume-based screening simply pushes poor matches further into recruiter screens, technical interviews, and engineering calendars.

What "AI-Generated CVs" Means in 2026

Not every AI-assisted resume represents the same challenge.

Tailored writing refers to candidates using AI tools to rewrite an accurate resume for a specific job description. The experience is genuine; AI simply improves presentation.

Inflated writing is more problematic. Candidates exaggerate projects, technical depth, or ownership using AI, creating resumes that appear impressive but don't hold up during interviews.

Fully synthetic applications involve fake identities, automated submissions, or proxy candidates attempting to move through the hiring process. While less common, they create significant hiring risk.

According to LinkedIn's Future of Recruiting report, AI is rapidly changing how candidates apply for jobs. As application volumes rise, many organizations are seeing resume quality decline rather than improve.

Why Resume Screening Isn't Working Anymore

Resume screening has always been an imperfect predictor of technical ability. What has changed is how easy it has become to create an optimized resume.

Today, candidates can generate resumes that closely match job descriptions within minutes. Keyword-based ATS filters often rank these resumes highly, even when the underlying skills don't match the role. As a result, recruiters spend more time reviewing candidates who appear qualified on paper but struggle during technical evaluations.

What Actually Works

Organizations seeing the best hiring outcomes are shifting their focus from resumes to stronger evaluation signals.

Start with Skills

Instead of reviewing resumes first, many teams now begin with a role-specific technical assessment. The assessment becomes the primary hiring signal, while the resume provides supporting context rather than acting as the initial filter.

Design AI-Friendly Take-Home Assignments

Rather than trying to prevent AI use, successful teams design assignments that assume candidates will use AI. Evaluation focuses on decision-making, technical reasoning, and the candidate's ability to explain trade-offs instead of whether AI helped write the code.

Standardize Technical Interviews

Structured interviews improve consistency by ensuring every candidate is evaluated using the same questions, scoring criteria, and rubrics. For remote hiring, identity verification also helps reduce proxy interview risks.

Review Every Signal Together

Strong hiring decisions rarely come from a single assessment. Teams that review technical assessments, interviews, take-home assignments, and recruiter feedback together are better able to distinguish genuine talent from polished resumes.

Where the Impact Is Greatest

The effects of AI-generated resumes vary across hiring scenarios. High-volume campus hiring often struggles with resume inflation, making skills assessments especially valuable. Remote senior engineering hiring faces greater risks from proxy candidates, while regulated industries require structured, well-documented hiring processes that can withstand audits.

What to Avoid

Adding more resume filters rarely improves hiring quality. AI detection tools continue to produce unreliable results, and requiring cover letters simply encourages candidates to generate more AI-written content. Likewise, "AI-proof" assessment questions often frustrate genuine candidates without preventing misuse.

Key Takeaways

AI-generated resumes have fundamentally changed technical hiring by reducing the reliability of resume-based screening. Organizations that shift toward skills-first assessments, structured interviews, and evidence-based hiring decisions are better equipped to identify genuine technical talent while delivering a fairer candidate experience.



