Data Scientist

Work Role ID: 423  |  Workforce Element: AI / Data

What does this work role do? Uncovers and explains actionable insights from data by combining scientific method, math and statistics, specialized programming, advanced analytics, AI, and storytelling.

CORE KSATs
KSAT ID Description KSAT Type
22 * Knowledge of computer networking concepts and protocols, and network security methodologies. Knowledge
108 * Knowledge of risk management processes (e.g., methods for assessing and mitigating risk). Knowledge
1157 * Knowledge of national and international laws, regulations, policies, and ethics as they relate to cybersecurity. Knowledge
1158 * Knowledge of cybersecurity principles. Knowledge
1159 * Knowledge of cyber threats and vulnerabilities. Knowledge
6900 * Knowledge of specific operational impacts of cybersecurity lapses. Knowledge
6935 * Knowledge of cloud computing service models: Software as a Service (SaaS), Infrastructure as a Service (IaaS), and Platform as a Service (PaaS). Knowledge
6938 * Knowledge of cloud computing deployment models in private, public, and hybrid environments and the difference between on-premises and off-premises environments. Knowledge
ADDITIONAL KSATs
KSAT ID Description KSAT Type
21A Knowledge of statistical/machine learning algorithms. Knowledge
35 Knowledge of digital rights management. Knowledge
75A Knowledge of mathematics, including logarithms, trigonometry, linear algebra, calculus, statistics, and operational analysis. Knowledge
102 Knowledge of programming language structures and logic. Knowledge
166 Skill in conducting queries and developing algorithms to analyze data structures. Skill
172 Skill in creating and utilizing mathematical or statistical models. Skill
506 Design, develop, and modify software systems, using scientific analysis and mathematical models to predict and measure the outcomes and consequences of design. Task
942 Knowledge of the organization’s core business/mission processes. Knowledge
1034A Knowledge of Personally Identifiable Information (PII) data security standards. Knowledge
1034C Knowledge of Personal Health Information (PHI) data security standards. Knowledge
1120 Ability to interpret and incorporate data from multiple tool sources. Ability
3080 Ability to use and understand complex mathematical concepts (e.g., discrete math). Ability
3756 Skill in developing or recommending analytic approaches or solutions to problems and situations for which information is incomplete or for which no precedent exists. Skill
5030 Analyze data sources to provide actionable recommendations. Task
5120 Conduct hypothesis testing using statistical processes. Task
5550 Program custom algorithms. Task
5640 Utilize technical documentation or resources to implement a new mathematical, data science, or computer science method. Task
5853 Build predictive, prescriptive, or descriptive models in collaboration with stakeholders. Task
5854 Collaborate with appropriate personnel to address Personal Health Information (PHI), Personally Identifiable Information (PII), and other data privacy and data reusability concerns for AI solutions. Task
5884 Evaluate energy implications (graphics processing unit, tensor processing unit, etc.) when designing AI solutions. Task
5906 Plan and conduct complex analytical, mathematical, and statistical research that informs operational requirements. Task
5907 Plan, coordinate, and execute complex studies using advanced data modeling techniques and procedures, data trend analysis, and data algorithms. Task
5924 Train and evaluate machine learning models. Task
5927 Write and document reproducible code. Task
6050 Ability to build complex data structures and high-level programming languages. Ability
6060 Ability to collect, verify, and validate test data. Ability
6120 Ability to dissect a problem and examine the interrelationships between data that may appear unrelated. Ability
6290 Knowledge of how to leverage government research and development centers, think tanks, academic research, and industry systems. Knowledge
6490 Skill in assessing the predictive power and subsequent generalizability of a model. Skill
6570 Skill in identifying hidden patterns or relationships. Skill
6651 Skill in Regression Analysis (e.g., Hierarchical Stepwise, Generalized Linear Model, Ordinary Least Squares, Tree-Based Methods, Logistic). Skill
6750 Skill in using outlier identification and removal techniques. Skill
6760 Skill in writing scripts using R, Python, PIG, HIVE, SQL, etc. Skill
6790A Utilize open source languages, as appropriate, and apply quantitative techniques (e.g., descriptive and inferential statistics, sampling, experimental design, parametric and non-parametric tests of difference, ordinary least squares regression, general line). Task
7002 Assist integrated project teams in identifying, curating, and managing test data. Task
7029 Knowledge of how to collect, store, and monitor data. Knowledge
7036 Knowledge of laws, regulations, and policies related to AI, data security/privacy, and use of publicly procured data for government. Knowledge
7071 Skill in labeling data to make it more discoverable and understandable. Skill
7078 Skill in using deep learning approaches to build machine learning models. Skill