Post by account_disabled on Mar 11, 2024 5:03:58 GMT
Precision is the number of true positives divided by the number of positives the model predicts. It is also called positive predictive value. Recall is the number of true positives divided by the number of actual positives in the data. It is also known as the true positive rate.
31. Explain the balance between variance and bias. Variance error occurs when the learning algorithm is too complex, leading the model to overfit the training data. Bias error occurs when the learning algorithm makes oversimplified assumptions, which creates the opposite problem: the model underfits the data and generalizes poorly from the training set to the test set. Either way, the model cannot reach high predictive accuracy. Your candidate must demonstrate that they understand that a model with high variance or high bias is never a good idea; there has to be a balance between the two.
32. What are some of your favorite APIs to explore? This question assesses whether your candidate has worked with external data sources. If they have, they probably have some preferred APIs.
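To make the precision and recall definitions given earlier concrete, here is a minimal sketch in plain Python. The function name `precision_recall` and the toy labels are made up for illustration; in practice a candidate would more likely reach for a library routine.

```python
def precision_recall(y_true, y_pred, positive=1):
    """Return (precision, recall) for one positive class."""
    pairs = list(zip(y_true, y_pred))
    # True positives: predicted positive and actually positive.
    tp = sum(1 for t, p in pairs if p == positive and t == positive)
    # False positives: predicted positive but actually negative.
    fp = sum(1 for t, p in pairs if p == positive and t != positive)
    # False negatives: actual positives the model missed.
    fn = sum(1 for t, p in pairs if p != positive and t == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Toy data (hypothetical): 4 actual positives, 3 predicted positives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall = precision_recall(y_true, y_pred)
print(f"precision={precision:.2f} recall={recall:.2f}")
# precision=0.67 recall=0.50
```

Here the model claims three positives and two are correct, so precision is 2/3. It finds two of the four actual positives, so recall is 1/2.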
The best candidates will tell you what they think about certain APIs and give details of the workflows and experiments they have performed.
33. Explain how XML compares to CSV files in terms of size. This question tests whether your candidate is able to handle data processing in messy formats. XML typically takes up much more space than CSV for the same data. XML uses tags to lay out key-value pairs in a tree form, and those tags are repeated for every record. CSVs use separators to split each record into fields and organize them into columns. Typically, an engineer will want to process the XML data into a usable CSV.
34. If you were given an unbalanced data set, how would you handle it? This question tests your candidate's understanding of the harm that unbalanced data sets can cause. Candidates must show how they would mitigate this harm. They can use various tactics, such as resampling the data set, collecting more data, or trying a different algorithm.
35. What do you think about the GPT-3 model? This is another question that assesses whether your candidate follows the latest trends and developments in machine learning.
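For question 34, one of the resampling tactics a candidate might describe is random oversampling of the minority class. The sketch below is only an illustration using the standard library; `random_oversample` and the toy data are hypothetical names and values, not part of any particular toolkit.

```python
import random
from collections import Counter, defaultdict


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority rows until all classes are equal in size."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible

    # Group the rows by class label.
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    # Every class is grown to the size of the largest class.
    target = max(len(members) for members in by_class.values())

    out_rows, out_labels = [], []
    for label, members in by_class.items():
        extra = [rng.choice(members) for _ in range(target - len(members))]
        for row in members + extra:
            out_rows.append(row)
            out_labels.append(label)
    return out_rows, out_labels


# Toy data (hypothetical): 8 rows of class 0, 2 rows of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [0.7], [0.8], [5.0], [5.5]]
labels = [0] * 8 + [1] * 2

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(labels), Counter(balanced_labels))
```

After the call both classes have eight rows. Oversampling like this is normally applied to the training split only, so that duplicated rows do not leak into the evaluation data.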
Developed by OpenAI, GPT-3 is a language generation model that can produce text that reads as if a human wrote it, from conversational pieces up to novel-length works, and can also generate code from natural language. If your candidates are passionate about machine learning, chances are they'll have a lot to say about GPT-3.
36. What are your opinions on how Google gathers training data for self-driving cars? Here, you are testing your candidate's understanding of different machine learning methods. Google currently uses reCAPTCHA to collect labeled data on road signs and storefronts.
37. How would you build a data pipeline? This should be common knowledge for machine learning engineers. Your candidate should be familiar with data pipeline building tools, such as Apache Airflow. They should also have a deep understanding of where to host models and pipelines, such as AWS, Azure, or Google Cloud. You want your candidate to walk you through their hands-on experience building and scaling a functional data pipeline.
38. List some data visualization libraries you have used.