Cleaning and Profiling Code
Use only Hadoop MapReduce in this part of your project.
Do not use anything else.
You must write and submit 2 separate MapReduce jobs:
MR Job 1. Data profiling – to explore your data
– Name the files: CountRecs.java, CountRecsMapper.java, CountRecsReducer.java
(Please use these exact names for your classes)
– This MR job counts the number of records in a dataset
– Run it on the original dataset, before cleaning, and output the number of records
– Run it on the cleaned dataset (the result of MR Job 2, described below) and output the number of records
– If the two record counts don’t match, figure out why
– Re-submit a schema if it has changed.
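The counting logic for this job can be sketched in plain Java before it is moved into Hadoop classes. The sketch below is only an illustration under assumptions: it uses no Hadoop types so it can run locally, the class name CountRecsSketch, the key "Total records", and the sample lines are all hypothetical, and every input line (including any header line) is counted as one record. In the actual submission, the map step would go in CountRecsMapper, the reduce step in CountRecsReducer, and the job setup in CountRecs.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Local, Hadoop-free sketch of the record-count logic (hypothetical class name).
class CountRecsSketch {
    // Hypothetical constant key: every record is emitted under the same key
    // so that a single reduce call sees all of the 1s.
    static final String KEY = "Total records";

    // Map step: each input line becomes one (KEY, 1) pair.
    static Map.Entry<String, Integer> map(String line) {
        return new SimpleEntry<>(KEY, 1);
    }

    // Reduce step: sum every value that arrived for the key.
    static long reduce(String key, List<Integer> values) {
        long sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Stand-in for the framework: run map on every line, then reduce once.
    static long countRecords(List<String> lines) {
        List<Integer> ones = new ArrayList<>();
        for (String line : lines) {
            ones.add(map(line).getValue());
        }
        return reduce(KEY, ones);
    }

    public static void main(String[] args) {
        // Made-up sample data; the header line is counted like any other line.
        List<String> sample = Arrays.asList("id,name,age,city", "1,Ann,30,NY", "2,Bob,,LA");
        System.out.println(KEY + "\t" + countRecords(sample));
    }
}
```

Emitting one constant key keeps the reducer trivial, at the cost of sending every pair to a single reducer. For a plain record count that is usually acceptable.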
MR Job 2. Data cleaning – to avoid nasty exceptions later on in your analytic
– Name the files: Clean.java, CleanMapper.java, CleanReducer.java
(Please use these exact names for your classes)
– This MR job cleans the data – for example, by dropping columns you don’t need.
– It should write out a new file with only the columns you will use in your analytic.
– These are the columns selected for your data schema
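The column-dropping step described above can be sketched the same way. This is a minimal plain-Java illustration, not a Hadoop job. The class name CleanSketch, the comma delimiter, the expected field count of 4, and the kept column indices 0 and 2 are all assumptions chosen for the example. Dropping rows with the wrong number of fields or an empty kept field is also an assumption, not a requirement of the assignment; it is one reason the record counts before and after cleaning might differ. In the actual submission, this per-line logic would go in CleanMapper, and CleanReducer could simply pass each cleaned line through.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local, Hadoop-free sketch of the cleaning logic (hypothetical class name).
class CleanSketch {
    // Hypothetical schema: 4 comma-separated fields, of which we keep 0 and 2.
    static final int EXPECTED_FIELDS = 4;
    static final int[] KEEP = {0, 2};

    // Returns the cleaned line, or null if the record should be dropped.
    static String cleanLine(String line) {
        // limit -1 keeps trailing empty fields, so "a,b,c," still has 4 fields.
        String[] fields = line.split(",", -1);
        if (fields.length != EXPECTED_FIELDS) {
            return null; // malformed row (assumed policy: drop it)
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < KEEP.length; i++) {
            String value = fields[KEEP[i]].trim();
            if (value.isEmpty()) {
                return null; // missing value in a kept column (assumed policy: drop it)
            }
            if (i > 0) {
                out.append(',');
            }
            out.append(value);
        }
        return out.toString();
    }

    // Stand-in for the framework: clean every line and keep the survivors.
    static List<String> clean(List<String> lines) {
        List<String> cleaned = new ArrayList<>();
        for (String line : lines) {
            String result = cleanLine(line);
            if (result != null) {
                cleaned.add(result);
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        // Made-up sample data: one good row, one with an empty kept field, one too short.
        List<String> sample = Arrays.asList("1,Ann,30,NY", "2,Bob,,LA", "3,Cy");
        for (String line : clean(sample)) {
            System.out.println(line);
        }
    }
}
```

A plain split on commas does not handle quoted fields that contain commas, so a dataset with such fields would need a proper CSV parser instead.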
FOR FULL CREDIT, PROVIDE THE CLASSES FOR EACH JOB