For an initial implementation of machine learning, roughly 60 to 80 percent of the time is spent managing data. Data is a vital input to the algorithm, but it can be fraught with problems. Project environments face the same challenge: project data needs to be structured and accessible. Does your organization have a data strategy for project data in preparation for using AI-based software?
Data Wrangling
Structured data is maintained in a standard format with clear data definitions and is easily accessible. The process of getting data into this state is known as data wrangling. Practitioners differ on the exact steps, but the process starts with identifying the available data. Project data resides in various documents, such as the scope document, project schedule, budget, and risk management plan. Because each of these uses a different format, accessing the data is a challenge. The next step is to clean the data. Anyone who has worked on a data migration project knows how many ways there are for data to become messy.
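Identifying data spread across documents with different formats usually means mapping each source's field names onto one common structure. The following is a minimal sketch of that idea. The `ProjectRecord` structure, the source names, and the field mappings are all hypothetical and stand in for whatever your schedule, budget, and risk register actually contain. Values are kept as raw text because cleaning comes next.

```python
from dataclasses import dataclass

# Hypothetical common record for project data drawn from different documents.
@dataclass
class ProjectRecord:
    source: str     # which document the value came from
    item: str       # task, cost line, or risk identifier
    attribute: str  # what the value describes
    value: str      # raw value, kept as text until the data is cleaned

# Hypothetical mappings: each source names the same ideas differently.
FIELD_MAPS = {
    "schedule": {"Task": "item", "Finish": "value"},
    "budget": {"Cost Item": "item", "Amount": "value"},
    "risk_register": {"Risk ID": "item", "Status": "value"},
}
ATTRIBUTES = {"schedule": "finish_date", "budget": "cost", "risk_register": "risk_status"}

def to_common_format(source, rows):
    """Convert rows from one source document into ProjectRecord objects."""
    mapping = FIELD_MAPS[source]
    attribute = ATTRIBUTES[source]
    records = []
    for row in rows:
        # Keep only the fields the mapping knows about; others (e.g. "Owner") are ignored here.
        fields = {mapping[key]: value for key, value in row.items() if key in mapping}
        records.append(ProjectRecord(source, fields["item"], attribute, fields["value"]))
    return records

schedule_rows = [{"Task": "Design review", "Finish": "03/06/24"}]
budget_rows = [{"Cost Item": "Licenses", "Amount": "12000"}]
risk_rows = [{"Risk ID": "R1", "Status": "Open", "Owner": "Ramesh"}]

records = (to_common_format("schedule", schedule_rows)
           + to_common_format("budget", budget_rows)
           + to_common_format("risk_register", risk_rows))
for record in records:
    print(record)
```

The mapping tables are where project knowledge matters: deciding that "Finish" in the schedule and some other column elsewhere mean the same thing is a judgment about the data, not something the code can infer.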
Table. Examples of Problems in Data Fields
Problem | Example
Data entry errors | Product A; Product a
Data meaning | Location
Raw data and derived data | Raw: 4, 10, 22; Average: 12
Common format | dd/mm/yy, mm/dd/yy, mm/yy
Blank data fields | 3, 0, 5, , 6, 8, 12
Data elements per field | Owner Name: Ramesh; Owner Name: Marie, Sanjay, Alex
Duplicate data | Product A certified June 3; Product A certified June 3
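Several problems in the table can be handled mechanically once a person has decided what the correct form is. The following is a minimal sketch, assuming hypothetical field names (`product`, `owner`, `certified`, `defects`) and assuming the source is known to use dd/mm/yy dates. A date such as 03/06/24 cannot be disambiguated automatically, so the format has to come from knowledge of the source. Problems such as data meaning still require human judgment.

```python
from datetime import datetime

# Hypothetical raw rows illustrating several problems from the table.
raw_rows = [
    {"product": "Product A", "owner": "Ramesh", "certified": "03/06/24", "defects": "3"},
    {"product": "Product a", "owner": "Marie, Sanjay, Alex", "certified": "03/06/24", "defects": ""},
    {"product": "Product A", "owner": "Ramesh", "certified": "03/06/24", "defects": "3"},
]

def clean(rows, date_format="%d/%m/%y"):
    """Return cleaned copies of rows; the source date format is an assumption."""
    canonical = {}  # casefolded product name -> first spelling seen
    seen = set()    # keys of rows already kept, used to drop duplicates
    cleaned = []
    for row in rows:
        # Data entry errors: treat names differing only in case as one product.
        name = row["product"].strip()
        product = canonical.setdefault(name.casefold(), name)
        # Multiple elements per field: split the owner list into separate names.
        owners = tuple(part.strip() for part in row["owner"].split(","))
        # Common format: convert to unambiguous ISO 8601 (yyyy-mm-dd).
        certified = datetime.strptime(row["certified"], date_format).date().isoformat()
        # Blank fields: keep as missing (None) instead of silently treating as 0.
        defects = int(row["defects"]) if row["defects"].strip() else None
        key = (product, owners, certified, defects)
        if key in seen:
            continue  # duplicate data: exact repeat of a row already kept
        seen.add(key)
        cleaned.append({"product": product, "owners": list(owners),
                        "certified": certified, "defects": defects})
    return cleaned

for row in clean(raw_rows):
    print(row)
```

Keeping a blank field as missing rather than zero matters. In the table's blank-field example, reading the gap as 0 would lower any average computed from that column.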
Once the data is clean, judge whether additional data is required. In project management, for example, a status report might not record whether an issue resolution was successful or a risk response was effective. Project managers resolve issues but may not document the results in a format that can be captured as data.
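One way to judge whether additional data is needed is to count how often an outcome field is missing. The following is a minimal sketch, assuming a hypothetical risk register where `response_effective` records whether each risk response worked. A recorded `False` counts as data, while an absent or blank value does not.

```python
# Hypothetical risk register entries; "response_effective" records the outcome.
risks = [
    {"id": "R1", "response": "Add a second vendor", "response_effective": True},
    {"id": "R2", "response": "Extend testing", "response_effective": None},
    {"id": "R3", "response": "Accept the risk"},  # outcome never recorded
]

def missing_outcomes(records, outcome_field):
    """Return the IDs of records whose outcome field is absent or blank."""
    return [r["id"] for r in records if r.get(outcome_field) in (None, "")]

gaps = missing_outcomes(risks, "response_effective")
print(f"{len(gaps)} of {len(risks)} risk responses lack a recorded outcome: {gaps}")
```

If most outcomes are missing, the fix is usually a process change, such as adding an outcome field to the status report, rather than anything done to the existing data.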
Feature Engineering
Simply accessing data and feeding it to a machine learning algorithm is usually insufficient; the data needs to be modified. For example, two data fields might have the same meaning, so only one is selected. Three data fields might contain related data, and averaging them into a single value for each entry is a reasonable solution: keeping all three would overinfluence the result simply because the measure appears three times. Feature engineering also identifies missing data that is crucial to include and eliminates data fields that have no causal relationship to the outcome.
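The following is a minimal sketch of the feature engineering steps described above. All field names are hypothetical: `end_date` is assumed to duplicate `finish_date`, three people are assumed to have each estimated task duration in days, and `entered_by` is assumed to be judged irrelevant to the outcome. The code only applies those decisions. Making them is the part that requires project knowledge.

```python
from statistics import mean

# Hypothetical task records. "end_date" duplicates "finish_date", and three
# people each estimated the task duration in days.
tasks = [
    {"task": "Design", "finish_date": "2024-06-03", "end_date": "2024-06-03",
     "est_pm": 10, "est_lead": 12, "est_sponsor": 8, "entered_by": "Ramesh"},
    {"task": "Build", "finish_date": "2024-07-15", "end_date": "2024-07-15",
     "est_pm": 20, "est_lead": 25, "est_sponsor": 18, "entered_by": "Marie"},
]

REDUNDANT = {"end_date"}     # same meaning as finish_date, so keep only one
IRRELEVANT = {"entered_by"}  # judged to have no causal link to the outcome
ESTIMATES = ("est_pm", "est_lead", "est_sponsor")
DROP = REDUNDANT | IRRELEVANT | set(ESTIMATES)

def engineer(record):
    """Return a feature dictionary with redundant and irrelevant fields removed."""
    features = {key: value for key, value in record.items() if key not in DROP}
    # Replace the three estimates with one average so they are not weighted three times.
    features["est_avg"] = mean(record[key] for key in ESTIMATES)
    return features

for task in tasks:
    print(engineer(task))
```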
The good news for project managers is that they are usually better placed than data scientists to understand project data. As project managers, we know project processes and terminology, so our data decisions are likely to be more appropriate, provided we have basic training in how to manage data. Data scientists are in high demand and command high compensation. By performing some of the functions of a data scientist, project managers can dramatically increase their value to the organization.