Winning Tips on Machine Learning Competitions by Kazanova, Current Kaggle #3

Author
Team Machine Learning
March 9, 2017

Introduction

Machine learning is tricky. No matter how many books you read, tutorials you finish or problems you solve, there will always be a data set that leaves you clueless, especially in your early days of machine learning. Isn’t that so?

In this blog post, you’ll learn some essential tips on building machine learning models, which most people only learn with experience. These tips were shared by Marios Michailidis (a.k.a. Kazanova), Kaggle Grandmaster, current rank #3, in a webinar held on 5th March 2016. The webinar had three aspects:

  1. Video – Watch here.
  2. Slides – The slides used in the video were shared by Marios. They are an enriching compilation of machine learning knowledge.
  3. Q & As – This post lists all the questions asked by participants at the webinar.

The key to succeeding in competitions is perseverance. Marios said, ‘I won my first competition (Acquire Valued Shoppers Challenge) and entered Kaggle’s top 20 after a year of continued participation on a 4 GB RAM laptop (i3)’. Were you planning to give up?

While reading Q & As, if you have any questions, please feel free to drop them in comments!

Questions & Answers

1. What are the steps you follow for solving a ML problem? Please describe from scratch.

Following are the steps I undertake while solving any ML problem:

  1. Understand the data – After you download the data, start exploring the features. Look at data types. Check variable classes. Create some univariate and bivariate plots to understand the nature of the variables.
  2. Understand the metric to optimize – Every problem comes with a unique evaluation metric. It’s imperative that you understand it, especially how it changes with the target variable.
  3. Decide on a cross validation strategy – To avoid overfitting, make sure you’ve set up a cross validation strategy in the early stages. A good CV strategy will help you get a reliable score on the leaderboard.
  4. Start hyper parameter tuning – Once CV is in place, try improving the model’s accuracy using hyper parameter tuning. It further includes the following steps:
    • Data transformations: These involve steps like scaling, removing outliers, treating null values, transforming categorical variables, doing feature selection, creating interactions etc.
    • Choosing algorithms and tuning their hyper parameters: Try multiple algorithms to understand how model performance changes.
    • Saving results: From all the models trained above, make sure you save their predictions. They will be useful for ensembling.
    • Combining models: At last, ensemble the models, possibly on multiple levels. Make sure the models are not too highly correlated with each other (that is, they are diverse) for best results.
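
As a rough illustration of the last two steps, the sketch below averages saved predictions from two models and measures how correlated they are. The prediction lists and the helper names `pearson` and `blend` are made up for this example; they are not taken from the webinar.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation between two equal-length lists of predictions."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def blend(predictions, weights=None):
    """Weighted average of several models' predictions, row by row."""
    if weights is None:
        weights = [1.0 / len(predictions)] * len(predictions)
    return [sum(w * p for w, p in zip(weights, row)) for row in zip(*predictions)]

# Hypothetical saved predictions from two models on the same validation rows.
model_a = [0.9, 0.2, 0.7, 0.4]
model_b = [0.8, 0.3, 0.5, 0.6]

print(pearson(model_a, model_b))   # how similar the two models are
print(blend([model_a, model_b]))   # simple average used as the ensemble
```

A plain average is only the simplest option; the same saved predictions could also feed a second-level model.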


2. What are the model selection and data manipulation techniques you follow to solve a problem?

Generally, I try (almost) everything for most problems. In principle for:

  • Time series: I use GARCH, ARCH, regression, ARIMA models etc.
  • Image classification: I use deep learning (convolutional nets) in Python.
  • Sound classification: Common neural networks.
  • High cardinality categorical (like text data): I use linear models, FTRL, Vowpal Wabbit, LibFFM, libFM, SVD etc.

For everything else, I use gradient boosting machines (like XGBoost and LightGBM) and deep learning (like Keras, Lasagne, Caffe, cxxnet). I decide which models to keep or drop in meta modelling with feature selection techniques. Some of the feature selection techniques I use include:

  • Forward (CV or not) – Start from the null model. Add one feature at a time and check CV accuracy. If it improves, keep the variable; else discard it.
  • Backward (CV or not) – Start from the full model and remove variables one by one. If CV accuracy improves after removing a variable, discard it.
  • Mixed (or stepwise) – Use a mix of the above two techniques.
  • Permutations
  • Using feature importance – Use the feature importance output of random forest, GBM or XGBoost.
  • Apply some statistical logic, such as the chi-square test or ANOVA.
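
The forward approach above can be sketched in a few lines. This is a minimal illustration, assuming you already have a function that returns a CV score for a given list of features; `forward_selection` and `toy_score` are hypothetical names, and the toy scorer only stands in for a real cross validation run.

```python
def forward_selection(features, score_fn):
    """Greedy forward selection.

    features: list of candidate feature names.
    score_fn: callable taking a list of features and returning a CV score
              (higher is better). It is a stand-in for your own CV routine.
    """
    selected = []
    best_score = score_fn(selected)      # score of the null model
    for feature in features:
        candidate = selected + [feature]
        score = score_fn(candidate)
        if score > best_score:           # keep the feature only if CV improves
            selected, best_score = candidate, score
    return selected, best_score

# Toy scoring function (an assumption for the demo): "age" and "income" help,
# and every extra feature costs a small penalty.
def toy_score(feats):
    useful = {"age": 0.10, "income": 0.05}
    return 0.5 + sum(useful.get(f, 0.0) for f in feats) - 0.01 * len(feats)

print(forward_selection(["age", "noise", "income"], toy_score))
```

Backward selection follows the same pattern, starting from the full list and dropping a feature whenever the score improves without it.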

Data manipulation could be different for every problem:

  • Time series: You can calculate moving averages and derivatives, and remove outliers.
  • Text: Useful techniques are TF-IDF, count vectorizers, word2vec, SVD (dimensionality reduction), stemming, spell checking, sparse matrices, likelihood encoding, one hot encoding (or dummies) and hashing.
  • Image classification: Here you can do scaling, resizing, removing noise (smoothing), annotating etc.
  • Sounds: Calculate Fourier transforms, MFCC (Mel frequency cepstral coefficients), low pass filters etc.
  • Everything else: Univariate feature transformations (like log + 1 for numerical data), feature selection, treating null values, removing outliers, converting categorical variables to numeric.
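
To make a few of these transformations concrete, here is a small standard-library sketch of the log + 1 transform, a trailing moving average and one hot encoding. The function names and sample values are illustrative only; in practice you would likely reach for NumPy or Pandas.

```python
from math import log1p

def log_plus_one(values):
    """log(1 + x) transform for non-negative numeric data."""
    return [log1p(v) for v in values]

def moving_average(series, window):
    """Trailing moving average; the first window-1 points have no value."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)
        else:
            out.append(sum(series[i + 1 - window:i + 1]) / window)
    return out

def one_hot(values):
    """One hot (dummy) encode a categorical column; categories sorted by name."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

print(log_plus_one([0, 9, 99]))
print(moving_average([1, 2, 3, 4, 5], window=3))   # [None, None, 2.0, 3.0, 4.0]
print(one_hot(["red", "blue", "red"]))             # [[0, 1], [1, 0], [0, 1]]
```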

3. Can you elaborate cross validation strategy?

Cross validation means that from my main set, I RANDOMLY create 2 sets. I build (train) my algorithm with the first one (let’s call it the training set) and score the other (let’s call it the validation set). I repeat this process multiple times and always check how my model performs on the validation set with respect to the metric I want to optimize.

The process may look like:

  • For 10 (you choose how many, X) times:
    • Split the set into training (50%-90% of the original data)
    • and validation (50%-10% of the original data).
    • Then fit the algorithm on the training set.
    • Score the validation set.
    • Save the result of that scoring with respect to the chosen metric.
  • Calculate the average of these 10 (X) results. That is how much you can expect this score to be in real life, and it is generally a good estimate.
  • Remember to use a SEED to be able to replicate these X splits.

Other things to consider are KFold and stratified KFold. Read here. For time sensitive data, make certain you always follow the rule of having the past predict the future when testing.
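
The repeated random split described above might look like the following. It is a sketch under simple assumptions: `repeated_holdout` is a hypothetical helper, and the "algorithm" is a toy that always predicts the training mean, so the focus stays on the splitting, scoring and seeding.

```python
import random

def repeated_holdout(rows, fit, score, n_repeats=10, train_frac=0.8, seed=42):
    """Repeat a random train/validation split and average the metric.

    rows:  list of (features, target) pairs.
    fit:   callable(train_rows) -> model.
    score: callable(model, valid_rows) -> metric value.
    """
    rng = random.Random(seed)          # fixed seed makes the splits reproducible
    results = []
    for _ in range(n_repeats):
        shuffled = rows[:]
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * train_frac)
        train, valid = shuffled[:cut], shuffled[cut:]
        model = fit(train)
        results.append(score(model, valid))
    return sum(results) / len(results), results

# Toy "algorithm" (an assumption for the demo): always predict the training mean.
def fit_mean(train):
    return sum(y for _, y in train) / len(train)

def mean_abs_error(model, valid):
    return sum(abs(y - model) for _, y in valid) / len(valid)

data = [((i,), float(i % 5)) for i in range(100)]
avg, per_split = repeated_holdout(data, fit_mean, mean_abs_error)
print(avg)
```

Running it twice with the same seed gives the same splits and therefore the same average, which is the point of fixing the seed.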

4. Can you please explain some techniques used for cross validation?

  • Kfold
  • Stratified Kfold
  • Random X% split
  • Time based split
  • For large data, just one validation set could suffice (like 20% of the data – you don’t need to do multiple times).
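
For illustration, the first, second and fourth options can be written as plain index generators. This is a simplified sketch using only the standard library; the function names are hypothetical, and libraries such as scikit-learn provide more complete versions.

```python
import random
from collections import defaultdict

def kfold_indices(n, k, seed=0):
    """Yield (train_idx, valid_idx) for k shuffled folds over n rows."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def stratified_kfold_indices(labels, k, seed=0):
    """Like k-fold, but each fold keeps roughly the same class proportions."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for members in by_class.values():
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % k].append(i)      # deal each class out round-robin
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def time_split_indices(timestamps, cutoff):
    """Train on rows strictly before the cutoff, validate on the rest."""
    train = [i for i, t in enumerate(timestamps) if t < cutoff]
    valid = [i for i, t in enumerate(timestamps) if t >= cutoff]
    return train, valid

labels = [0] * 8 + [1] * 4
for train, valid in stratified_kfold_indices(labels, k=4):
    print(sorted(valid), [labels[i] for i in sorted(valid)])

print(time_split_indices([1, 2, 3, 4, 5], cutoff=4))   # ([0, 1, 2], [3, 4])
```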

5. How did you improve your skills in machine learning? What training strategy did you use?

I did a mix of the stuff in 2, plus a lot of self-research, alongside programming and software (in Java) and A LOT of Kaggling ☺

6. Which are the most useful Python libraries for a data scientist?

Below are some libraries which I find most useful in solving problems:

  • Data Manipulation
    • Numpy
    • Scipy
    • Pandas
  • Data Visualization
    • Matplotlib
  • Machine Learning / Deep Learning
    • Xgboost
    • Keras
    • Nolearn
    • Gensim
    • Scikit image
  • Natural Language Processing
    • NLTK

7. What are useful ML techniques / strategies to impute missing values or predict a categorical label when all the variables are categorical in nature?

Imputing missing values is a critical step. Sometimes you may find a trend in missing values. Below are some techniques I use:

  • Use mean, mode, median for imputation
  • Use a value outside the range of the normal values for a variable, like -1 or -9999.
  • Replace with a likelihood – e.g. something that relates to the target variable.
  • Replace with something which makes sense. For example, sometimes null may mean zero.
  • Try to predict missing values based on subsets of known values.
  • You may consider removing rows with many null values.
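
Three of these options can be sketched as follows: median imputation, an out-of-range sentinel, and a likelihood-style replacement that uses the mean target per category. The function names and the tiny data sets are invented for the example.

```python
from statistics import median

def impute_median(values):
    """Replace None with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]

def impute_sentinel(values, sentinel=-9999):
    """Replace None with a value outside the normal range of the variable."""
    return [sentinel if v is None else v for v in values]

def likelihood_encode(categories, targets, missing="MISSING"):
    """Replace each category (None included) with the mean target seen for it."""
    keys = [missing if c is None else c for c in categories]
    sums, counts = {}, {}
    for key, y in zip(keys, targets):
        sums[key] = sums.get(key, 0.0) + y
        counts[key] = counts.get(key, 0) + 1
    return [sums[key] / counts[key] for key in keys]

ages = [20, None, 40, 30]
print(impute_median(ages))        # [20, 30, 40, 30]
print(impute_sentinel(ages))      # [20, -9999, 40, 30]
print(likelihood_encode(["a", None, "a", None], [1, 0, 0, 0]))  # [0.5, 0.0, 0.5, 0.0]
```

Because the likelihood version looks at the target, it is safer to compute it inside each cross validation fold rather than on the full training set, otherwise the validation score can look better than it should.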

8. Can you elaborate what kind of hardware investment you have done i.e. your own PC/GPU setup for Deep learning related tasks? Or were you using more cloud based GPU services?

I won my first competition (Acquire Valued Shoppers Challenge) and entered Kaggle’s top 20 after a year of continued participation on a 4 GB RAM laptop (i3). I was using mostly self-made solutions up to this point (in Java). That competition had something like 300,000,000 rows of transaction data that you had to aggregate, so I had to parse the data and be smart to keep memory usage at a minimum.

However, since then I have made some good investments to become Rank #1. Now, I have access to Linux servers with 32 cores and 256 GB of RAM. I also have a GeForce 670 machine (for deep learning / GPU tasks). Also, I use mostly Python now. You can consider Amazon’s AWS too; however, this is mostly if you are really interested in getting to the top, because the cost may be high if you use it a lot.

9. Do you use a high performance machine, like one with a GPU? For example, when you do something like a grid search over random forest parameters, which takes a lot of time, which machine do you use?

I use GPUs (not very fast ones, like a GeForce 670) for every deep learning model I train. I have to state that for deep learning a GPU is a MUST. Training neural nets on CPUs takes ages, while a mediocre GPU can make a simple NN (e.g. deep learning) 50-70 times faster. I don’t like grid search. I do this fairly manually. I think in the beginning it might be slow, but after a while you can get to decent solutions with the first set of parameters! That is because you can sort of learn which parameters are best for each problem and you get to know the algorithms better this way.

10. How do people build around 80+ models? Is it by changing the hyper parameters?

It takes time. Some people do it differently. I have some sets of params that worked in the past; I initialize with these values and then start adjusting them based on the problem at hand. Obviously you need to forcefully explore more areas of hyper params (in order to know how they work) and enrich this bank of past successful hyper parameter combinations for each model. You should consider what others are doing too. There is NOT just one optimal set of hyper params. It is possible to get a similar score with a completely different set of params than the one you have.

11. How does one improve their kaggle rank? Sometimes I feel hopeless while working on any competition.

It’s not an overnight process. Improvement on Kaggle, or anywhere, happens with time. There are no shortcuts. You just need to keep doing things. Below are some of my recommendations:

  • Learn better programming: Learn Python if you know R.
  • Keep learning tools (listed below).
  • Read some books.
  • Play in ‘knowledge’ competitions.
  • See what the others are doing in kernels, or in past competitions look for the ‘winning solution’ sections.
  • Team up with more experienced users, but you need to improve your ranking slightly before this happens.
  • Create a code bank
  • Play … a lot!

12. Can you tell us about some useful tools used in machine learning?

Below is the list of my favourite tools:

13. How to start with machine learning?

I like these slides from the University of Utah in terms of understanding some basic algorithms and concepts about machine learning. This book for Python. I like this book too. Don’t forget to follow the wonderful scikit-learn documentation. Use Jupyter Notebook from Anaconda.

You can find many good links that have helped me in Kaggle here. Look at ‘How Did you Get Better at Kaggle’.

In addition, you should do Andrew Ng’s machine learning course. Alongside, you can follow some good blogs such as mlwave, fastml and analyticsvidhya. But the best way is to get your hands dirty. Do some Kaggle! Tackle competitions that have the “knowledge” flag first and then start tackling some of the main ones. Try to tackle some older ones too.

14. What techniques perform best on large data sets on Kaggle and in general ? How to tackle memory issues ?

Big data sets with high cardinality can be tackled well with linear models. Consider sparse models. Tools like Vowpal Wabbit, FTRL, libFM, LibFFM and liblinear are good, as are sparse matrices in Python (things like CSR matrices). Consider ensembling (that is, combining) models trained on smaller parts of the data.

15. What is the SDLC (Software Development Life Cycle) of projects involving machine learning? Give a walk-through of an industrial project and the steps involved, so that we can get an idea of how they are used. Basically, I am in the learning phase and would like to get industry level exposure.

  • Business question: How to recommend products online to increase purchases.
  • Translate this into an ML problem: Try to predict what the customer will buy in the future, given some data available at the time the customer is likely to make the click/purchase and some historical exposures to recommendations.
  • Establish a test/validation framework.
  • Find the best solutions to predict what the customer chose.
  • Consider time/cost efficiency as well as performance.
  • Export model parameters/pipeline settings.
  • Apply these in an online environment. Expose some customers but NOT all. Keep test and control groups.
  • Assess how well the algorithm is doing and make adjustments over time.

16. Which is your favorite machine learning algorithm?

It has to be Gradient Boosted Trees. All may be good though in different tasks.

15. Which language is best for deep learning, R or Python?

I prefer Python. I think it is more program-ish. R is good too.

16. What would someone trying to switch careers in data science need to gain aside from technical skills? As I don’t have a developer background would personal projects be the best way to showcase my knowledge?

The ability to translate business problems to machine learning, and to transform them into solvable problems.

17. Do you agree with the statement that in general feature engineering (so exploring and recombining predictors) is more efficient than improving predictive models to increase accuracy?

In principle – Yes. I think model diversity is better than having a few really strong models. But it depends on the problem.

18. Are the skills required to get to the top of the leaderboard on Kaggle also those you need for your day-to-day job as a data scientist? Or do they intersect, or are they somewhat different? Can I form an idea of what a data scientist’s job is based on Kaggle competitions? And if a person does well on Kaggle, does it follow that she will be a successful data scientist in her career?

There is some percentage of overlap especially when it comes to making predictive models, working with data through python/R and creating reports and visualizations. What Kaggle does not offer (but you can get some idea) is:

  • How to translate a business question to a modelling (possibly supervised) problem
  • How to monitor models past their deployment
  • How to explain (many times) difficult concepts to stakeholders.

I think there is always room for a good Kaggler in the industry world. It is just that data science can have many possible routes. It may be, for example, that not everyone tends to be entrepreneurial in their work or gets to be very client facing; some rather solve very particular (technical) tasks.

19. Which machine learning concepts are must-haves to perform well in a Kaggle competition?

  • Data interrogation/exploration
  • Data transformation – pre-processing
  • Hands on knowledge of tools
  • Familiarity with metrics and optimization
  • Cross Validation
  • Model Tuning
  • Ensembling

20. How do you see the future of data scientist job? Is automation going to kill this job?

No – I don’t think so. This is what they used to say about automation through computing, but it ended up requiring a lot of developers to get the job done! It may be that data scientists focus on softer tasks over time, like translating business questions to ML problems, and generally become shepherds of the process – as in managers/supervisors of the modelling process.

21. How can ensemble modelling be used in R and Python to increase the accuracy of predictions? Please quote some real life examples.

You can see my GitHub script, where I explain different machine learning methods based on a Kaggle competition. Also, check this ensembling guide.

22. What are the best Python deep learning libraries or frameworks for text analysis?

I like Keras (because it now supports sparse data) and Gensim (for word2vec).

23. How valuable is the knowledge gained through these competitions in real life? Most often I see competitions won by ensembling large numbers of models … is this the case in real life production systems? Or are interpretable models more valuable than these monster ensembles in real production systems?

In some cases yes – being interpretable or fast (or memory efficient) is more important. But this is likely to change over time as people become less afraid of black box solutions and focus on accuracy.

24. Should I worry about learning about the internals about the machine learning algorithms or just go ahead and try to form an understanding of the algorithms and use them (in competitions and to solve real life business problems) ?

You don’t need the internals. I don’t know all the internals. It is good if you do, but you don’t need to. Also, there is new stuff coming out every day – sometimes it is tough to keep track of it. That is why you should focus on the decent usage of any algorithm rather than over-investing in one.

25. Which are the best machine learning techniques for imbalanced data?

I don’t do any special treatment here. I know people find that strange. This comes down to optimizing the right metric (for me). It is tough to explain in a few lines. There are many techniques for sampling, but I never had to use them. Some people are using SMOTE. I don’t see value in trying to change the principal distribution of your target variable. You just end up with augmented or altered principal odds. If you really want a cut-off to decide on whether you should act or not, you may set it based on the principal odds.

I may not be the best person to answer this. I personally have never found it (significantly) useful to change the distribution of the target variable or the perception of the odds in the target variable. It may just be that some algorithms are better than others when dealing with this task (for example, tree-based ones should be able to handle this).
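
One possible reading of setting the cut-off from the prior odds is to use the share of positives in the training target as the decision threshold, instead of resampling the data. The sketch below assumes that reading; the data and the names `prior_rate` and `decide` are made up for illustration.

```python
def prior_rate(targets):
    """Share of positives in the training target (the prior probability)."""
    return sum(targets) / len(targets)

def decide(probabilities, cutoff):
    """Act (1) when the predicted probability exceeds the cutoff, else 0."""
    return [1 if p > cutoff else 0 for p in probabilities]

train_targets = [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]   # 10% positives (made-up data)
cutoff = prior_rate(train_targets)               # 0.1
print(decide([0.05, 0.12, 0.40], cutoff))        # [0, 1, 1]
```

With a rare positive class, a 0.5 cut-off would flag almost nothing, while the prior-based cut-off flags rows that are more likely than average to be positive.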

26. Typically, marketing research problems have been mostly handled using standard regression techniques – linear and logistic regression, clustering, factor analyses, etc. My question is: how useful are machine learning and deep learning techniques/algorithms for marketing research or business problems? For example, how useful is, say, interpreting the output of a neural network to clients? Are there any resources you can refer to?

They are useful in the sense that you can most probably improve accuracy (in predicting let’s say marketing response) versus linear models (like regressions). Interpreting the output is hard and in my opinion it should not be necessary as we are generally moving towards more black box and complicated solutions.

As a data scientist, you should put effort into making certain that you have a way to test how good your results are on some unobserved (test) data, rather than trying to understand why you get the type of predictions you are getting. I do think that decompressing information from complicated models is a nice topic (and valid for research), but I don’t see it as necessary.

On the other hand, companies, people, data scientists, statisticians and generally anybody who could be classified as a ‘data science player’ need to get educated to accept black box solutions as perfectly normal. This may take a while, so it may be good to run some regressions along with any other modelling you are doing, and generally try to provide explanatory graphs and summarized information to make a case for why your models perform as they do.

27. How to build teams for collaboration on Kaggle ?

You can ask in forums (i.e. on Kaggle). This may take a few competitions, though, before people can trust you. The reason is that they are afraid of duplicate accounts (which violate competition rules), so people prefer somebody who is proven to play fair. Assuming some time has passed, you just need to think of people you would like to play with, people you think you can learn from, and generally people who are likely to take different approaches than you, so you can leverage the benefits of diversity when combining methods.

28. I have gone through a basic machine learning course (theoretical). Now I am starting my practical journey. You just recommended going through the scikit-learn docs, and now people are saying TENSORFLOW is the next scikit-learn. So should I go through scikit-learn, or is TF a good choice?

I don’t agree with the statement ‘people are saying TENSORFLOW is the next scikit learn’. TensorFlow is a framework for doing certain machine learning tasks well (like deep learning). I think you can learn both, but I would start with scikit-learn. I personally don’t know TensorFlow, but I use tools that are based on TensorFlow (for example Keras). I am lazy I guess!

29. The main challenge that I face in any competition is cleaning the data and making it usable for prediction models. How do you overcome it ?

Yeah, join the club! After a while you will create pipelines that can handle this relatively quickly. However… you always need to spend time here.

30. How to compute big data without having a powerful machine?

You should consider tools like vowpal wabbit and online solutions, where you parse everything line by line. You need to invest more in programming though.
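
To show what parsing line by line can mean in practice, here is a minimal sketch of online logistic regression with hashed features, written with the standard library only. It is not how Vowpal Wabbit is implemented; the line format (`label token token ...`), the bucket count and all function names are assumptions made for this example.

```python
import io
import zlib
from math import exp

N_BUCKETS = 2 ** 10   # size of the hashed feature space (assumed for the demo)

def hash_tokens(tokens):
    """Hashing trick: map each token to a bucket index, giving a sparse row."""
    return [zlib.crc32(tok.encode("utf-8")) % N_BUCKETS for tok in tokens]

def predict(weights, active):
    """Logistic prediction using only the active (non-zero) features."""
    z = sum(weights[i] for i in active)
    z = max(min(z, 35.0), -35.0)      # clip to keep exp() well behaved
    return 1.0 / (1.0 + exp(-z))

def train_pass(stream, weights, lr=0.1):
    """One SGD pass over a stream of 'label token token ...' lines."""
    for line in stream:               # only one line is held in memory at a time
        parts = line.split()
        if not parts:
            continue
        label, active = int(parts[0]), hash_tokens(parts[1:])
        error = predict(weights, active) - label
        for i in active:
            weights[i] -= lr * error  # gradient of log loss for a binary feature
    return weights

# io.StringIO stands in for a large file opened with open(path).
data = io.StringIO("1 red big\n0 blue small\n1 red small\n0 blue big\n")
weights = [0.0] * N_BUCKETS
for _ in range(20):
    data.seek(0)
    train_pass(data, weights)

print(predict(weights, hash_tokens(["red", "big"])))
print(predict(weights, hash_tokens(["blue", "small"])))
```

Memory use depends on the number of buckets rather than on the number of rows, which is what makes this style workable on a small machine.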

31. What is Feature Engineering?

In short, feature engineering can be understood as:

  • Feature transformation (e.g. converting numerical or categorical variables to other types)
  • Feature selections
  • Exploiting feature interactions (like should I combine variable A with variable B?)
  • Treating null values
  • Treating outliers

32. Which maths skills are important in machine learning?

Some basic probabilities along with linear algebra (e.g. vectors). Then some stats help too. Like averages, frequency, standard deviation etc.

33. Can you share your previous solutions?

See some with code and some without (just general approach).

34. How long should it take for you to build your first machine learning predictor ?

Depends on the problem (size, complexity, number of features). Generally, in the beginning you might spend a lot of time on things that will seem much easier later on. You should not worry about the time, as it may be different for each person given their programming background or other experience.

35. Are there any knowledge competitions that you can recommend where you are not necessarily competing on the level as Kaggle but building your skills?

From here, both Titanic and Digit Recognizer are good competitions to start with. Titanic is better because it assumes a flat file. Digit Recognizer is for image classification, so it might be more advanced.

36. What is your opinion about using Weka and/or R vs Python for learning machine learning?

I like Weka. It has good documentation, especially if you want to learn the algorithms. However, I have to admit that it is not as efficient as some of the R and Python implementations. It has good coverage though. Weka has some good visualizations too, especially for some tree-based algorithms. I would probably suggest that you focus on R and Python at first, unless your background is strictly in Java.

Summary

In short, succeeding in machine learning competitions is all about learning new things and spending a lot of time training, feature engineering and validating models. Alongside, interact with the community on forums, read blogs and learn from the approaches of fellow competitors.

Success is imminent, provided you keep trying. Cheers!


FAQ

Is skills-based hiring more effective than resume screening? Skills-based hiring tends to predict on-the-job performance more reliably than resume screening for roles where the work can be assessed directly, such as engineering, data, sales, and marketing execution. According to LinkedIn's Future of Recruiting 2024 report, 73% of recruiters now prioritize skills-based approaches. Effectiveness depends heavily on assessment design and on whether downstream ATS filters still gate candidates by degree.

What HR processes is AI changing first? AI is changing sourcing, resume parsing, candidate matching, and initial assessment scoring first, because these are high-volume, rules-based tasks. Structured interviewing, offer negotiation, and onboarding remain primarily human-led, though AI-assisted note-taking and scorecard analysis are growing.

Will AI replace recruiters? AI is unlikely to replace recruiters, but it is changing the skill profile. Recruiters who can interpret assessment data, align hiring managers, and design candidate experience will be more valuable; recruiters whose role is primarily resume scanning are most exposed.

How do I evaluate an AI hiring tool for bias? Ask the vendor for a bias audit report (required under NYC Local Law 144 for automated employment decision tools), the demographic composition of the training data, the validation methodology against job performance, and the appeal process for candidates. Avoid tools that cannot answer all four.
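To make the bias audit report easier to read, here is a minimal sketch of one figure such audits commonly report: the impact ratio, meaning each group's selection rate divided by the highest group's selection rate. The group names and counts are hypothetical. The 0.8 cutoff is the familiar four-fifths rule of thumb, used here as an assumption rather than as a legal threshold for any specific jurisdiction.

```python
from typing import Dict, Tuple


def impact_ratios(outcomes: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """Return each group's selection rate relative to the highest rate.

    outcomes maps a group label to (candidates selected, candidates assessed).
    Groups with zero assessed candidates are skipped.
    """
    rates = {
        group: selected / assessed
        for group, (selected, assessed) in outcomes.items()
        if assessed > 0
    }
    best = max(rates.values(), default=0.0)
    if best == 0.0:
        # Nobody was selected in any group, so no ratio is meaningful.
        return {group: 0.0 for group in rates}
    return {group: rate / best for group, rate in rates.items()}


# Hypothetical counts, for illustration only.
sample = {"group_a": (50, 100), "group_b": (30, 100)}
ratios = impact_ratios(sample)

# Four-fifths rule of thumb (an assumption, not a compliance determination).
flagged = [group for group, ratio in ratios.items() if ratio < 0.8]
print(ratios, flagged)
```

A flagged group is a prompt for questions to the vendor, not a verdict. Confirm the applicable methodology with your legal or HR compliance partner.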

Is resume-based hiring going away? Resume-based hiring is under pressure but not disappearing. Most organizations are moving toward hybrid models where resumes provide context and assessments provide the capability signal. A full move away from resumes is unlikely in the next hiring cycle for most enterprises.

What is the biggest risk of switching to skills-based hiring? The biggest risk is poorly designed assessments that introduce new forms of bias or damage candidate experience. A skills-based process built on a long, unproctored, untested assessment battery will perform worse than a structured resume screen.

Next steps: See it in action

If you are a recruiter or talent leader evaluating how to move from resume-led to skills-led screening, book a demo of HackerEarth Assessments to see how role-specific evaluations, proctoring, and benchmarked scoring fit into an existing ATS pipeline. For background reading, see our developer assessment platform overview and the HackerEarth recruiter blog.

Recruiters who pair structured assessment data with strong human judgment build better pipelines than either resumes or AI alone can produce.

Must-Know Recruitment Questions for HR and Talent Acquisition Teams (2025)

Recruitment questions every HR professional should know in 2025

Estimated read time: 7 minutes

Many "tell me about yourself" answers are now drafted with ChatGPT the night before the interview. That shift, with candidates arriving armed with rehearsed, AI-polished narratives, has broken the standard interview script and pushed recruiters to redesign their question sets from the ground up. This guide outlines the categories of recruitment questions every HR professional should know in 2025, why each matters, and example questions you can adapt to your hiring rubric or scorecard today.

LinkedIn's 2024 Global Talent Trends report notes that skills-based hiring and behavioral assessment have moved from optional to expected in most talent acquisition workflows. Yet many hiring conversations still rely on outdated prompts that produce polished answers and unclear signals. The recruiter persona — the one running req intake, pipeline reviews, and screen calls — needs a tighter toolkit.

Who this is for: This article is written for recruiters and talent acquisition partners running structured interviews. Hiring managers building a scorecard alongside the recruiter will also find the question categories useful.

Figure: Adoption of Structured Hiring Practices Among HR Teams (2020–2025). Source: LinkedIn Global Talent Trends claims cited in this article.

Why modern recruitment questions fail when they stay outdated

Industry observers at SHRM have noted that candidates are better prepared, interviews are more structured, and expectations on both sides have risen (SHRM research). With generative AI tools widely available, many candidates now enter screens with refined, rehearsed narratives.

The result is predictable — polished answers, unclear signals, and decisions made on incomplete understanding. The quality of the recruitment questions you bring into the room directly defines the quality of the signal you capture on the scorecard.

A contestable position worth stating plainly: behavioral interview frameworks like STAR are now overused to the point where candidates have memorized the structure, which reduces signal quality unless interviewers probe past the rehearsed answer with follow-ups.

What this article won't claim

Structured behavioral interviewing is not a silver bullet. Over-indexing on adaptability can screen out deep specialists whose value is stability and depth. Ownership-mindset framing, if applied rigidly, can disadvantage neurodivergent candidates or those from cultures where collective credit is the norm. Use the questions below as part of a balanced rubric — not as a single filter.

From "tell me about yourself" to understanding real intent

Traditional opening questions rarely reveal a candidate's intent or direction. A stronger opening probes why a candidate is moving at this specific point and what kind of work keeps them engaged beyond compensation.

Evidence from Gallup's 2023 State of the Global Workplace report suggests today's workforce is increasingly motivated by alignment, learning, and perceived growth — not stability alone. If this layer is missed early in the interview, the rest of the evaluation becomes less reliable.

Example intent and motivation questions

  • "Walk me through the last time you decided to leave a role. What specifically triggered the decision?"
  • "What kind of work has made you lose track of time in the last 12 months?"
  • "If this role didn't exist, what would your second-choice next move be — and why?"
  • "What would need to be true 18 months from now for you to consider this move a success?"

What to listen for

  • Specific triggers and trade-offs, not generic phrases like "growth" or "new challenges."
  • Consistency between the stated motivation and the candidate's actual career pattern.

Red flags

  • Answers that match the job description back to you almost verbatim.
  • Vague language about "culture" or "growth" with no concrete example.

Behavioral and competency-based recruitment questions: getting past scripted answers

One of the biggest challenges recruiters face today is not lack of talent, but over-prepared talent. Hiring practitioners increasingly find that well-structured, confident answers do not always reflect real capability, especially when responses are influenced by preparation tools or rehearsed narratives.

This is why competency-based questions — which explore decision-making logic, trade-offs, and real-time reasoning — produce higher signal than story-based prompts alone. For technical roles, pairing these with a practical assessment helps confirm what the interview surfaces. HackerEarth's skill assessments use role-specific question libraries and rubric-based scoring so the recruiter can compare candidate outputs against a defined standard, rather than relying on the candidate's own narrative of their capability.

Example behavioral and competency-based questions

  1. "Tell me about a decision you made in the last six months that you would make differently today. What changed your thinking?"
  2. "Describe a time you disagreed with your manager on a priority. How did you handle it?"
  3. "Walk me through a project where the scope changed mid-execution. What did you cut, and why?"
  4. "Give me an example of feedback you initially rejected but later acted on."

How to probe past the rehearsed answer

If a candidate delivers a clean STAR-format response, follow up with: "What's one detail you usually leave out of that story?" or "Who would tell that story differently?" These prompts disrupt the rehearsed structure and surface the actual reasoning.

Situational judgment and adaptability questions

Workplaces are shaped by continuous change — shifting priorities, evolving tools, and hybrid collaboration. Many hiring teams now treat adaptability as a core hiring parameter rather than a soft skill, particularly for roles where ambiguity is the default state.

Situational judgment questions present a realistic scenario and ask the candidate how they would navigate it. They are harder to rehearse than story-based prompts because the scenario is novel.

Example situational judgment questions

  • "You join the team and discover the project you were hired to lead has already slipped two months. What are your first three actions in week one?"
  • "Two stakeholders give you conflicting priorities on the same Friday. Both are senior to you. How do you handle it?"
  • "A teammate is consistently delivering work that is technically correct but late. You are not their manager. What do you do?"
  • "You realize halfway through a quarter that the metric you committed to is no longer the right one. How do you raise it?"
  • "Your top-performing team member tells you in a 1:1 they're considering leaving. They haven't told their manager. What do you do in the next 24 hours?"
  • "A vendor misses a critical deadline that puts your launch at risk. Walk me through how you decide whether to escalate, switch vendors, or absorb the delay."

What to listen for

  • Sequencing — do they ask clarifying questions before acting?
  • Trade-off awareness — do they acknowledge what they would not do?
  • Stakeholder reasoning — who do they involve, and when?

Culture and values-alignment questions

Cultural fit is often misunderstood as shared interests or personality alignment. A more useful frame is behavioral consistency with the team's working norms.

A second contestable position: generic "culture fit" questions should be retired in favor of values-alignment scenarios that name a specific behavior the company expects. "Culture fit" as a phrase invites bias; a scenario tied to a stated company value forces a more concrete answer.

Example values-alignment questions

  • "Our team gives feedback in writing before live discussion. Describe the last time you gave hard feedback. What did you write down first?"
  • "We prioritize shipping over perfection. Tell me about a time you shipped something you weren't fully proud of. What happened next?"
  • "Describe the last time you changed your mind because of data, not opinion."

For a deeper look at how culture signals show up in technical interviews, see our guide on how to design a structured technical interview.

Identifying ownership mindset over task execution

Task completion alone is no longer a strong hiring indicator for most knowledge roles. What recruiters and hiring managers increasingly screen for is the ownership mindset — how a candidate behaves when outcomes are unclear, accountability is shared, or success metrics evolve mid-execution.

A concrete scenario

Consider a Series B SaaS company hiring its first sales operations manager. The pipeline is messy, the CRM is half-implemented, and the founder is the de-facto rev-ops owner. Standard task-execution questions ("walk me through how you'd clean a pipeline") produce textbook answers. Ownership-mindset questions — "What would you stop doing in your first 30 days, and how would you tell the founder?" — surface whether the candidate can hold the seat. A strong answer names a specific thing they'd stop (e.g., "weekly pipeline reviews in their current form"), the trade-off they're willing to accept, and how they'd frame the conversation with the founder. A weak answer lists everything they'd add — new dashboards, new processes, new tooling — without naming a single thing they'd remove or a single conversation they'd own.

Example ownership questions

  • "Tell me about something you fixed that wasn't your job to fix."
  • "Describe a time the goalposts moved on you. What did you do in the first 48 hours?"
  • "What's a process you killed, and what replaced it?"

Red flags

  • Answers that always credit "the team" with no individual decision named.
  • Stories where the candidate is consistently the rescuer or always the victim.

Questions to avoid: legal and compliance boundaries

A structured question set is only as strong as its weakest prompt. In most jurisdictions, certain questions are either illegal or carry significant legal risk because they touch protected characteristics or regulated information.

Common categories to avoid in initial screens:

  • Age, date of birth, or graduation year as a proxy for age.
  • Marital status, family planning, or childcare arrangements ("Do you plan to have kids?" "Who watches your children?").
  • Citizenship or national origin beyond the legally permitted "Are you authorized to work in [country]?"
  • Religion, religious holidays, or observance schedules.
  • Disability or medical history, including questions about prior workers' compensation claims.
  • Salary history — now restricted or banned in many US states and several other jurisdictions. Ask about salary expectations instead.

For a deeper treatment of pre-employment screening practices and compliance, see our overview of pre-employment assessment design. Always confirm specifics with your legal or HR compliance partner — local law varies.

Rethinking what "good answers" actually mean

In traditional interviews, clarity and confidence were often equated with strong performance. Modern hiring increasingly challenges this assumption.

The signal you want is depth, consistency, and reasoning quality — even when responses are less polished. A candidate who says "I don't know, but here's how I'd find out" is often a stronger hire than one who delivers a fluent answer with no underlying logic.

To codify this on the scorecard, score reasoning and presentation as separate rubric lines. A candidate can score 4/5 on reasoning and 2/5 on presentation and still be a strong hire — but you will only see that if the rubric separates them.
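As a rough illustration of why the rubric lines should stay separate, the sketch below compares a blended score with a two-line scorecard. The `Scorecard` class, the candidate labels, and the reasoning bar of 4 are all hypothetical choices made for this example, not a recommended standard.

```python
from dataclasses import dataclass


@dataclass
class Scorecard:
    candidate: str
    reasoning: int     # 1-5, quality of the underlying logic
    presentation: int  # 1-5, polish and fluency of delivery


def blended_score(card: Scorecard) -> float:
    """Single averaged score: hides which rubric line drove the result."""
    return (card.reasoning + card.presentation) / 2


def meets_reasoning_bar(card: Scorecard, bar: int = 4) -> bool:
    """Decision based on the reasoning line alone (bar is an assumed value)."""
    return card.reasoning >= bar


# Two hypothetical candidates with mirrored scores.
deep_thinker = Scorecard("Candidate A", reasoning=4, presentation=2)
smooth_talker = Scorecard("Candidate B", reasoning=2, presentation=4)

for card in (deep_thinker, smooth_talker):
    print(card.candidate, blended_score(card), meets_reasoning_bar(card))
```

Both candidates get the same blended score, so an averaged scorecard cannot tell them apart. Only the separate reasoning line shows that Candidate A clears the bar and Candidate B does not.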

FAQ: structured hiring questions

Which recruitment question category is most often skipped — and why does it matter?

In practice, ownership-mindset questions are the category recruiters most often skip, because they're the hardest to score consistently and the answers don't fit neatly into STAR. The cost of skipping them is high: ownership signal is what separates strong individual contributors from people who execute well only when the path is clear. If you only have time to add one new category to your interview guide, this is the one with the largest marginal lift.

What is the STAR method, and is it still useful?

STAR stands for Situation, Task, Action, Result. It is a candidate-response framework that helps structure answers to behavioral questions. It remains useful as a default structure, but because most candidates now prepare STAR-formatted stories, interviewers should probe past the rehearsed answer with follow-up questions about trade-offs, omitted details, and alternative perspectives.

How many questions should a structured interview include?

Practitioners commonly recommend 5–8 core questions per 45-minute round, with planned follow-up probes. This is a rule of thumb rather than a sourced standard. Fewer questions with deeper probes typically produce more signal than many surface-level questions.

What is the difference between behavioral and situational judgment questions?

Behavioral questions ask about past actions ("Tell me about a time you…"). Situational judgment questions ask about hypothetical scenarios ("What would you do if…"). Behavioral questions draw on the candidate's actual history; situational questions test reasoning on novel problems. Strong interview loops use both.

How do you reduce bias in recruitment questions?

Use a structured interview where every candidate is asked the same core questions, score answers on a defined rubric, and have at least two interviewers calibrate independently before discussing. Avoid "culture fit" as a freeform judgment; replace it with values-alignment scenarios tied to documented company behaviors.

Can skill assessments replace interview questions?

No. Assessments and interview questions answer different things. Assessments produce structured skill evaluation against a defined rubric; interview questions surface reasoning, motivation, and judgment. The strongest hiring loops pair both — skill assessments for verified capability, structured behavioral interviews for everything assessments can't measure.

Final thoughts and next steps

The recruitment questions every HR professional should know in 2025 are not a fixed list — they are a working toolkit you adapt to the role, the level, and the rubric. The categories above (intent, behavioral, situational, values-alignment, ownership) give you a structure; the example questions give you a starting point.

Next steps

  • Audit your current interview guide. Map every question to one of the five categories above. If a category is empty, add two questions.
  • Separate reasoning from presentation on your scorecard. Score them as distinct rubric lines.
  • Pair interviews with skill verification. Schedule a demo of HackerEarth Assessments to see how rubric-based skill scores integrate with your interview scorecard, so your hiring decision isn't relying on candidate self-report alone.
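The audit step in the list above can be sketched in a few lines. This is a minimal example that assumes your interview guide is available as a mapping from question text to category; the `audit_guide` function and the sample guide are hypothetical, and the questions are taken from the examples earlier in this article.

```python
from collections import Counter
from typing import Dict

# The five categories named in this article.
CATEGORIES = ("intent", "behavioral", "situational", "values-alignment", "ownership")


def audit_guide(guide: Dict[str, str]) -> Dict[str, int]:
    """Return {category: questions to add} for every empty category.

    guide maps each interview question to one of CATEGORIES.
    Following the rule above, an empty category needs two new questions.
    """
    counts = Counter(guide.values())
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"Unmapped categories in guide: {sorted(unknown)}")
    return {category: 2 for category in CATEGORIES if counts[category] == 0}


# Hypothetical interview guide, for illustration only.
current_guide = {
    "What kind of work has made you lose track of time in the last 12 months?": "intent",
    "Describe a time you disagreed with your manager on a priority. How did you handle it?": "behavioral",
    "Tell me about something you fixed that wasn't your job to fix.": "ownership",
}

gaps = audit_guide(current_guide)
print(gaps)
```

For this sample guide, the audit reports that the situational and values-alignment categories are empty and each needs two questions.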

Sources referenced: LinkedIn Global Talent Trends (2024), SHRM research, Gallup State of the Global Workplace (2023).

Why Empathy Could Be Your Biggest Hiring Advantage

Why Human-Centered Hiring Matters More Than Ever

Hiring has never been more optimized than it is today.

From AI-powered recruitment tools to automated screening systems and structured interview workflows, HR and talent acquisition teams now have more ways than ever to improve hiring speed, consistency, and scalability.

But in the middle of this efficiency-driven approach, one critical element is slowly disappearing: empathy for the candidate.

Empathy in hiring is not about slowing down recruitment or making decisions less objective. It is about ensuring candidates are treated like people navigating important career decisions, not just profiles moving through a hiring pipeline.

As recruitment becomes increasingly system-driven, preserving the human side of hiring is becoming both more difficult and more important.

For HR leaders and talent acquisition professionals, this is no longer just a workplace culture discussion. It directly impacts candidate experience, employer branding, hiring quality, and long-term employee retention.

When Hiring Feels Like a Process Instead of an Experience

Most modern recruitment systems are designed around efficiency.

Applications are filtered automatically, interviews are scheduled faster, and candidates move through hiring stages with minimal manual effort. Operationally, this creates speed and structure.

But from a candidate’s perspective, the experience can often feel distant and impersonal.

Many candidates go through multiple interview rounds without clear communication, feedback, or transparency about timelines and expectations. Even when the hiring process is fair, it may still feel mechanical.

This creates a growing challenge for HR and TA teams:

How do you maintain hiring efficiency without removing the human connection from recruitment?

That is where empathy becomes essential.

The Hidden Cost of Low-Empathy Hiring

The impact of low-empathy hiring is not always immediate, but it compounds over time.

Candidates remember how organizations made them feel during the recruitment process, especially during rejection or delayed communication. Those experiences shape employer perception long before someone becomes an employee.

Over time, this directly affects employer brand and candidate trust.

There is also another hidden cost.

When hiring becomes too rigid or overly process-driven, recruiters may overlook candidates with strong long-term potential simply because they do not perfectly match predefined criteria.

Without empathy, context disappears.

And when context disappears, opportunities are often missed.

For HR leaders, empathy is no longer just a soft skill. It is becoming a competitive hiring advantage.

Why Empathy Is Becoming a Competitive Hiring Skill

Today’s workforce is far more dynamic than it was a decade ago.

Professionals switch industries, build careers through unconventional paths, and learn skills outside traditional education systems. As a result, resumes and structured evaluations only tell part of the story.

Empathy helps recruiters understand what exists beyond the surface.

It allows hiring teams to better understand:

  • Career transitions
  • Employment gaps
  • Nontraditional experience
  • Personal growth journeys

This shift changes the entire hiring mindset.

Instead of asking:

“Does this candidate perfectly match the role?”

Recruiters are increasingly asking:

“What could this candidate become in the right environment?”

That perspective creates stronger and more future-focused hiring decisions.

Where Empathy Fits in Modern Recruitment

Empathy does not replace structured hiring systems.

In fact, it becomes most effective when built into them.

Simple improvements in communication can significantly improve candidate experience. Clear updates, transparent timelines, respectful rejection emails, and honest feedback all contribute to a more human-centered recruitment process.

These small changes often have a lasting impact on how candidates perceive an organization.

For HR teams, the goal is not to remove structure from hiring.

The goal is to ensure structure does not remove humanity.

Better Hiring Decisions Start With Better Human Understanding

Empathy also improves the quality of hiring decisions themselves.

When recruiters take time to understand a candidate’s context, they often uncover strengths that are not immediately visible on resumes or scorecards.

A candidate who appears average on paper may demonstrate exceptional adaptability, resilience, or problem-solving ability in real-world situations.

Without empathy, those signals are easy to miss.

For talent acquisition leaders, this means recognizing that hiring is not just about selecting the strongest profile.

It is about identifying the strongest long-term fit within a real human context.

Final Thoughts

As recruitment continues evolving through automation, AI hiring tools, and structured decision-making, the biggest risk is not losing efficiency.

It is losing humanity.

Empathy keeps hiring people-focused, even as processes become more technology-driven.

It does not slow recruitment down. Instead, it helps organizations create better candidate experiences, stronger employer brands, and more thoughtful hiring decisions.

Because candidates may forget interview questions or assessment scores.

But they will always remember how they were treated during the hiring process.

And in today’s competitive talent market, that experience often determines whether top talent chooses to join or walk away.
