Cambridge College Challenges of Data Preparation Discussion
Identify and discuss what you believe, given what you learned from the assigned readings and your general knowledge and experience, to be the three (3) most commonly seen data preparation challenges. Please use this material to write an answer.
September 2014
TDWI E-Book
Data Quality Challenges and Priorities
Q&A: Addressing Today's Top Data Quality Issues
Top 10 Priorities for Data Quality Solutions
Engaging and Empowering Business Users to Improve Data Quality
About SAS
Sponsored by:
tdwi.org
Expert Q&A
Top 10 Priorities for DQ
Engaging Business Users
About SAS
Addressing Today's Top Data Quality Issues
Maintaining data quality has always been a top issue for
enterprises, but with changing data needs and business
environments (including big data, unstructured data, and
data governance), it's never been more challenging. We
look at the top issues that enterprises are asking about
data quality with Anne Buff, business solutions manager and
thought leader for SAS Best Practices.
TDWI: How are industry leaders using data quality to advance
business strategy?
Anne Buff: Organizations that design their data management
strategy within the context of overarching corporate initiatives
are leading their industries, often with large gaps. While there are
many great data quality best practices we can learn from these
companies, they often share three common elements in their
approach:
Designed process. Data quality does not have a one-size-fits-all
template, not even within an organization. Designing data quality
rules, policies, and procedures around the needs and culture of the
business is essential for buy-in and long-term support from the
organization.
Business metrics. Metrics-based measurement is an understood
management success factor. When it comes to successful data
management, though, it is imperative that metrics are business
based, not technology based. Data management metrics should have
specific, measurable business outcomes and articulate value in at
least one of the following areas: increased productivity/efficiency,
regulatory compliance, reduced cost/complexity, and decreased risk.
Simply put, executives listen when programs make money, save
money, or keep them out of jail.
Enterprise view. Although the scope of management matters when
governing data, organizations that maintain or are working toward
a holistic view of enterprise data rather than maintaining individual
data silos are making far greater strides in advancing business
strategy. The streamlined, cross-functional capabilities gained from
the comprehensive view are fundamental for faster innovation,
growth, and development.
Does data quality require data stewardship and data
governance?
Data quality initiatives can be successful without data stewardship
or data governance, but when completed as ad hoc tasks or projects,
they often consume significant resources and time. Data quality
programs are most efficient and effective when implemented
in a structured, governed environment. Data governance is the
business-driven policy making and oversight of corporate data;
data management, which includes data stewardship, is the tactical
execution of such policies (Dyché, 2010).
Clearly defining roles and outlining the authority, accountability, and
responsibility for decisions regarding enterprise data assets provides
the necessary framework for resolving conflicts and driving the
business forward as the data-driven organization matures.
Consider defining such roles as data stewards, data custodians,
subject matter experts, business stakeholders, the data governance
council, and executive sponsors/advisors.
As organizations begin to bring big data into their environments,
a common question is: "What do we need to add to our data
governance program now that we have big data?" The answer is:
nothing. Big data is still data; the rules of the game don't change.
Big data projects will operate just fine under your existing data
governance framework. Not all of the components of the framework
will apply to all big data projects. That's okay, just as long as the
projects don't run outside the established framework.
When considering data access and availability, is real time
realistic?
The need for and definition of real time varies across industries
and organization size. Although having access to the most current
and accurate data is a reasonable, justifiable expectation (that
can require heroic efforts in and of itself in some organizations),
real-time access is generally not necessary. There are, of course,
use cases in some industries that have little to no tolerance for
data latency, such as sensors in life-saving medical devices, data
feeds in stock trading, or air traffic control data. Because of the
significant investments required to provide and support real-time
data, many organizations have weighed business needs against the
costs and determined that just-in-time is fast enough.
This will not remain the prevailing answer for long. With the
evolution, maturity, and broader adoption of cloud and big data
technologies, the expectation of real-time access and availability is
increasing rapidly. Realistic or not, organizations must consider new
tools and technology solutions to meet these expectations with a
very limited budget and resources.
Although business needs and definitions of real time vary across
industries, the technology solutions and capabilities to provide and
support real time are the same regardless of business or industry.
Technologies to explore include event stream processing, data
virtualization, in-database embedded processing, cloud computing,
and open source big data technologies.
What is the greatest impact big data will have on the enterprise
data environment?
Whether organizations have big data or not, the attention that big
data is receiving in mainstream media and across all industries has
a powerful direct impact on how they approach and manage data.
Executives have tuned in to the big data story and are ready to
support enterprise data initiatives and drive organizational change
to become data driven. Based on what they have seen and heard,
more data means more opportunity, more innovation, more revenue,
and better customer experiences; the list of magic that more data
brings to the business is ever-growing.
The newfound excitement and support for data is the good news and
the bad news. You can't do big data for the sake of the coolness of
big data. Although the emerging big data technologies are without
a doubt exciting and attractive because of all the possibilities they
generate, implementing solutions without a business purpose is
doomed to failure. Harnessing the technical eager beavers will be
a difficult but necessary challenge. Remember, the organizational
strategy for managing data, regardless of size, is a business
issue. Successful organizations design, manage, and govern their
enterprise data programs based on business needs and initiatives.
How will data quality initiatives evolve as organizations add big
data to their enterprise environments?
Many early adopters sought to redefine data quality initiatives
based on the size or type of data (structured, unstructured, etc.)
as they introduced big data to their environments. This approach
did not prove to be successful because the business needs had not
changed. In the end, big data was still data. The business rules and
requirements were still necessary and applicable.
The evolution organizations will see for data quality initiatives as
they integrate big data will not be based on the size of the data but
rather on context of use. Business rules and quality requirements
differ based on the intended use of data.
A data management trend that big data brings to the table is the
concept of data lakes (or other large data containment bodies) to
hold enormous amounts of unmanaged data. The store-everything
approach is not the unique piece of the trend but rather the
concept of "manage at consumption" that it brings. Organizations
want to take advantage of the significantly lower data storage
costs of big data technologies, but applying the requisite policies,
standardizations, and transformations to support all business needs
to such large data volumes becomes implausible.
To meet the needs of the business and capitalize on the significant
data storage cost savings, organizations are starting to employ
late-binding processes that apply the data management rules, processes,
and policies at the time data is requested within the context of the
request.
Should organizations manage and govern all data equally?
The type of data does not determine whether all data should be
governed and managed equally (scope does), and the answer
is no: organizations should not manage and govern data equally.
Management and governance needs will vary as the scope changes.
All defined processes, policies, and procedures should comply and
adhere to the overarching enterprise data governance program.
As the scope narrows from the enterprise level to the business
unit, department, and even down to a specific project, the rules
and requirements will become more specific. It is critical to apply
governance with appropriate scope because the degree to which an
organization can use data strategically is the degree to which data
is effectively governed.
What is the major differentiator between leaders and laggards
in regard to data quality and management?
Leaders consistently treat data as a corporate asset to drive
business value. They are keenly aware of the costs and risks
that low-quality, incomplete, and inaccurate data present. They
understand the implications of not delivering timely, relevant data to
the business. In these organizations, executives make available all of
the dedicated resources, funding, and technology needed to support
a successful enterprise data environment.
These organizations have developed their data management
strategies by understanding the needs of the business. Although the
business drives how they manage data, they do not get bogged down
in whether the business or IT owns data. Instead, business and IT
are strategically aligned to support data initiatives as a united front
across the enterprise.
References
Dyché, Jill [2010]. Data Governance Next Practices:
The 5 + 2 Model, BeyeNETWORK, December 9.
http://www.b-eye-network.com/view/14782
Top 10 Priorities for Data Quality Solutions
By Philip Russom, TDWI Research
The 10 priorities listed here provide an inventory of techniques,
team structures, tool types, methods, mindsets, and other
characteristics that are desirable for a fully modern,
next-generation data quality (DQ) solution. Few organizations
will need or want to embrace all 10 priorities; you should
pick and choose according to your organization's business
and technology requirements. My intent is to help user
organizations prioritize and plan their next-generation data
quality program or solution.
Priority #1: Broader Scope for Data Quality
We say "data quality" as if it's a single, solid monolith. In reality, DQ
is a family of eight or more related techniques. Data standardization
is the most commonly used technique, followed by verification,
validation, monitoring, profiling, matching, and so on. TDWI regularly
encounters user organizations that apply just one technique,
sometimes to just one data set or one data domain. Most DQ
solutions need to expand into more DQ techniques, data sets, and
data domains.
Priority #2: Real-Time Data Quality
According to a TDWI survey, real-time data quality (RTDQ) is the
second-fastest-growing data management discipline, after master
data management (MDM) and just before real-time data integration.
Make RTDQ a high priority so data can be cleansed and standardized
as it's created or updated.
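As a rough illustration of cleansing data at write time rather than in a later batch job, the sketch below wraps a plain in-memory store so that every create or update passes through a standardization step first. The `CustomerStore` class, the field names, and the specific rules (whitespace-collapsed title-case names, lowercase email, digits-only phone) are assumptions made for this example; they are not taken from the TDWI survey or from any particular product.

```python
import re


def standardize(record):
    """Return a cleaned copy of a customer record (hypothetical rules)."""
    cleaned = dict(record)
    # Collapse runs of whitespace and use title case for names.
    cleaned["name"] = " ".join(record.get("name", "").split()).title()
    # Store emails trimmed and lowercased so they compare consistently.
    cleaned["email"] = record.get("email", "").strip().lower()
    # Keep only the digits of a phone number.
    cleaned["phone"] = re.sub(r"\D", "", record.get("phone", ""))
    return cleaned


class CustomerStore:
    """In-memory stand-in for an operational system (illustrative only)."""

    def __init__(self):
        self._rows = {}

    def upsert(self, customer_id, record):
        # Standardization runs on every create or update, so the stored
        # row is already clean when downstream consumers read it.
        row = standardize(record)
        self._rows[customer_id] = row
        return row

    def get(self, customer_id):
        return self._rows[customer_id]


store = CustomerStore()
store.upsert("c1", {"name": "  ada   LOVELACE ",
                    "email": " Ada@Example.COM ",
                    "phone": "(555) 010-9999"})
print(store.get("c1"))
```

Putting the rule inside `upsert` is the design point: callers cannot write a record that bypasses the cleansing step.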
Priority #3: Data Quality Services
DQ techniques need to be generalized so they are available as
services that can be called from a wide range of tools, applications,
databases, and business processes. Data quality services enable
greater interoperability among tools and modern application
architectures as well as reuse and consistency in DQ solutions.
Priority #4: Coordination with Other Data Management
Disciplines
DQ functions are beneficial to related data management disciplines.
For example, DQ functions should be applied to the reference data
managed by an MDM solution, and data integration solutions
invariably uncover DQ problems and opportunities.
Priority #5: Data Stewardship and Governance
Instead of re-inventing the wheel, user organizations can borrow
some of the organizational structures and processes of DQ's
stewardship and apply them to data governance. This minimizes the
risks and decreases the time-to-use of data governance. Likewise,
there are stewardship capabilities built into many DQ tools that can
help document, automate, and scale up data governance processes.
Priority #6: Non-traditional Data Types
New types and sources of data are coming from many directions,
and all need a DQ strategy. As data is deduced and extracted from
Web data, multi-structured data, and social media, it should be
subject to DQ functions and quality metrics, as with all data.
Priority #7: Internationalization
This is a second-, third-, or later-generation priority for most DQ
solutions. Prepare for it by selecting vendor tools that support
internationalization functions for national postal standards, Unicode
pages, and DQ tool GUI localization.
Priority #8: Value-Add Process
Techniques such as standardization and data append add value by
repurposing and augmenting data, respectively. Deduplication adds
value to data by reducing its redundancies. Data profiling reveals
opportunities for more value-adding actions by DQ techniques. Focus
on the value-add process to ensure the continuous improvement
expected of a DQ program.
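A minimal sketch of the value-add sequence described above: standardize, deduplicate, then profile to see what is still worth fixing. The sample records, the field names, and the matching rule (exact match on a standardized email) are assumptions chosen for illustration; matching in real DQ tools is usually fuzzier than this.

```python
from collections import Counter

# Hypothetical sample records; the first two rows are near-duplicates.
records = [
    {"name": "Ann Lee", "email": "Ann.Lee@Example.com ", "state": "ma"},
    {"name": "ann lee", "email": "ann.lee@example.com", "state": "MA"},
    {"name": "Bo Chen", "email": "bo@example.com", "state": ""},
]


def standardize(rec):
    """Standardization: put each field into one canonical form."""
    return {
        "name": rec["name"].strip().title(),
        "email": rec["email"].strip().lower(),
        "state": rec["state"].strip().upper(),
    }


def deduplicate(recs, key="email"):
    """Deduplication: keep the first record seen for each key value."""
    seen = {}
    for rec in recs:
        seen.setdefault(rec[key], rec)
    return list(seen.values())


def profile(recs):
    """Profiling: count empty values per column to find the next fix."""
    empties = Counter()
    for rec in recs:
        for column, value in rec.items():
            if not value:
                empties[column] += 1
    return dict(empties)


clean = deduplicate([standardize(r) for r in records])
print(len(records), "->", len(clean), "records")
print("empty values per column:", profile(clean))
```

Without the standardization step the two Ann Lee rows would not match on email, which is why the order of the steps matters.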
Priority #9: Deeper Profiling
Data profiling is too often shallow, just generating simple statistics
for values found in a single database, table, or column. It should be
broadened to enable more profound discoveries within data. Profile
data repeatedly as a kind of monitoring that tests whether data's
quality is truly improving.
Priority #10: Vendor Tools
Many first-generation DQ solutions are homegrown and hand-coded.
For example, standardization is the most commonly used
DQ technique, and (at the low end) standardization can be hand-coded
in SQL or developed using a tool for extract, transform, and
load (ETL). Hand-coded DQ solutions can prove the usefulness of
software automation for DQ, but you should anticipate life cycle
stages that demand functionality that very few organizations can
build themselves, such as identity resolution, probabilistic matching,
internationalization, real-time operation, DQ services, and hub-based
architecture.
For a more detailed discussion, read the article "Ten Goals for
Next-Generation Data Quality" in TDWI's What Works: Case Studies
and Solutions, Volume 33. TDWI members can access the magazine
at tdwi.org/whitepapers/2012/05/what-works-volume-33/
Philip Russom is director of TDWI Research for data management
and oversees many of TDWI's research-oriented publications,
services, and events. He is a well-known figure in data warehousing
and business intelligence, having published over 500 research
reports, magazine articles, opinion columns, speeches, Webinars,
and more. Before joining TDWI in 2005, Russom was an industry
analyst covering BI at Forrester Research and Giga Information
Group. He also ran his own business as an independent industry
analyst and BI consultant and was a contributing editor with
leading IT magazines. Before that, Russom worked in technical and
marketing positions for various database vendors. You can reach
him at prussom@tdwi.org, @prussom on Twitter, and on LinkedIn at
linkedin.com/in/philiprussom.
Engaging and Empowering Business Users to Improve Data Quality
Who owns the data has much to do with who is responsible
for its quality. Here's how IT and business users can share
responsibility.
Who owns the data, really: business or IT? It's a question that's
provoked no end of discussion and dissension between the line
of business and its IT custodians. Thanks to a combination of
technological, economic, and cultural factors, it's also a mostly moot
question.
The simple fact of the matter is that both business and IT own the
data; the reach, rights, and responsibilities of both groups can and
should be neatly demarcated; and, going forward, wrangling
about ownership will prove to be divisive, distractive, and ultimately
destructive.
This isn't weak-tea pragmatism, insists Matthew Magne, global
product marketing manager for data management (DM) with SAS.
A concept of what might be called "shared ownership," based on
the insight that IT's data management policies (to say nothing of its
portfolio of DM tools and services) can and should be aligned with
the needs of the business, is the new normal.
"It's actually very important that we align the creation of data
and [the] management of data across its life cycle with business
drivers," Magne acknowledges.
"Although IT owns the tactical execution of how [a company]
manage[s] data ([e.g.,] what tools do we use to manage the data
and what architectural strategy do we use to manage data?), it is
critical that IT's priorities are aligned with the business drivers of
the organization, too."
It's in this sense, Magne suggests, that the business can be said
to own the data. Put differently, data must be managed in a
way that's transparent or intelligible to the business. The business
owns the data to the extent that it sets priorities, provides a
reference for alignment, and (in the form of data stewards and
other IT-to-business liaisons) works with IT to see that this is
the case.
"It's no longer a question of IT implementing these business
rules the way it sees fit, on its own terms, [albeit] in a way that's
consistent with policy or regulatory requirements," Magne continues.
"It's now [a question of] proactively tracking business rules in
order to try to get ahead of challenges. Before marketing launches
a massive direct-mail campaign and spends 20 percent more than
it should because its address data is riddled with data quality
problems, we're able to measure and detect those [issues] so we can
alert the IT team that's responsible for fixing [the data]."
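The kind of check Magne describes can be sketched as a business rule that measures address quality before a mailing goes out and raises an alert when too many records fail. Everything specific here is an assumption for illustration: the required fields, the five-digit ZIP rule, the 5 percent threshold, and the `alert` stub stand in for whatever rules and notification channel an organization actually uses.

```python
import re

REQUIRED_FIELDS = ("street", "city", "zip")   # assumed rule
ZIP_PATTERN = re.compile(r"^\d{5}$")          # assumed: US five-digit ZIP
MAX_BAD_RATIO = 0.05                          # assumed business threshold


def is_mailable(address):
    """Pass only if every required field is filled and the ZIP is valid."""
    if any(not address.get(field, "").strip() for field in REQUIRED_FIELDS):
        return False
    return bool(ZIP_PATTERN.match(address["zip"].strip()))


def bad_address_ratio(addresses):
    """Share of addresses that fail the rule (0.0 for an empty list)."""
    if not addresses:
        return 0.0
    bad = sum(1 for a in addresses if not is_mailable(a))
    return bad / len(addresses)


def alert(message):
    """Placeholder for notifying the team responsible for the data."""
    print("ALERT:", message)


# Hypothetical campaign list: one empty street, one malformed ZIP.
mailing_list = [
    {"street": "1 Main St", "city": "Boston", "zip": "02108"},
    {"street": "", "city": "Salem", "zip": "01970"},
    {"street": "9 Elm Rd", "city": "Lowell", "zip": "1852"},
    {"street": "4 Oak Ave", "city": "Quincy", "zip": "02169"},
]

ratio = bad_address_ratio(mailing_list)
if ratio > MAX_BAD_RATIO:
    alert(f"{ratio:.0%} of campaign addresses failed validation")
```

The threshold is expressed as a business figure (the share of the mailing that would be wasted), which keeps the rule readable to the marketing team as well as to IT.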
This is a specific example, but it gets at the kind of co-ownership
experience Magne has in mind. In the old model, data quality was
treated as something that somehow belonged to data, that is, as
an inherent characteristic or property of that data, irrespective of
how that data was used or what it was used for. In the new model,
quality is contextual: it's a function of how data gets used in the
context of particular business processes or by different business
domains. This last is actually aligned with the process by which data
quality problems typically get redressed, at least in practice. In an
ideal world, all data would be consistent and standardized; in the
…