#
SAPCompliance
#
SAPsecurity
Clean master data is the bedrock of every SAP powered process. When the same customer, supplier or product is stored two (or two hundred) times under slightly different keys, it ripples through finance, logistics, analytics and compliance. Companies report that de duplication projects routinely uncover 5 – 15 % redundant master records, affecting everything from on time delivery KPIs to credit exposure and ESG reporting. SAP therefore ships “duplicate check” services in S/4HANA and SAP Business ByDesign, yet many organisations still carry legacy duplicates created years before those safeguards were activated.
Below are the objects most frequently hit in audits and brown field S/4HANA migrations:
Here is a summary and examples of the problems you might encounter when you have duplicates in your master data:
Detection strategies depend on which attributes you can safely compare:
Key takeaway: use the most discriminative field available, then enrich missing pieces (e.g. call company registers or address validation APIs) before applying fuzzy logic.
Legal affixes balloon edit distance. “ACME Shipping Inc.” (length = 17) vs “ACME Shipping LLC” differs in the last three letters but Levenshtein = 3 (> 2) ⇒ missed under a tight threshold. Solution:
This yields high recall without blowing up the candidate set.
Tip: if you need more control (e.g. ISO alpha transliteration, country specific rules), export to SQL Server Integration Services and use the built in Fuzzy Lookup Transformation, which supports token & phonetic matching.
So far, we discussed the following points:
Keeping your master data duplicate free is not a one off cleansing sprint but an ongoing governance process—yet the techniques above can deliver a quick win and a cleaner starting point for your next S/4HANA project.
To illustrate these points, we look at duplicate vendors, and we will show how those duplicates can be easily identified using a simple Excel analysis:
Before we start with the identification of the duplicates in excel, first we need to extract the data from the source system.
In our case we will use SAP, and we will extract the data from the LFA1 table for ECC, or BUT000 from S/4HANA systems. Alternatively, use the SQVI transaction if you do not have access to SE16 or want to apply some logic when extracting names.
Now, after extracting the data for the analysis we will proceed with three different approaches.
remQ – Business Inspector for SAP Software offers Business Transaction Monitoring and auditing software with built-in expert know-how.
Organize your vendor list in a single column (e.g., Column A).
Example:
A (Vendor Name)
Microsoft Corp
Microsft Inc
Google LLC
Gooogle Ltd
Amaz0n Inc
Amazon Services
Option 1: Check the First Few Characters
This catches typos at the end of names (e.g., "Microsft" vs. "Microsoft").
Insert a helper column (e.g., Column B) titled "First 5 Chars".
In cell B2, use:
excel
=LEFT(C2, 5)
Drag this formula down to apply to all rows.
Result: Extracts the first 5 characters (e.g., "Micro" from "Microsoft Corp").
=IF(COUNTIF(D:D, D2)>1, "Check - First 5 Chars", "")
Use Excel’s Filter to show only rows flagged as " Check - First 5 Chars ".
Manually review the results (e.g., "Microsft Inc" vs. "Microsoft Corp" or "Gooogle Ltd" vs. "Google LLC").
Example Output
A (Vendor Name)ㅤB (First 5 Chars)ㅤE (Possible Duplicate) Microsoft Corp
Micro Microsft Inc
Micro Check - First 5 Chars
Google LLCㅤGoogl
Gooogle LtdㅤGooogㅤCheck - First 5 Chars
Amaz0n IncㅤAmaz0
Amazon ServicesㅤAmazo
Not truly fuzzy: This method flags partial matches but may miss complex typos (e.g., "Starbucks" vs. "Starbuks").
False positives: Names like "Apple Inc" and "Pineapple Corp" may be flagged due to shared segments.
Case-insensitive: Excel’s COUNTIF ignores case (e.g., "apple" vs. "APPLE").
In this scenario, we will use the built-in Excel function Power Query.
You’ll see a table like this, where vendor names with duplicates are listed alongside their different vendor numbers:
Let’s chat and find the best strategy for yourbusiness! It’s about individual expert advice tailored to your business needs. Tools are only as good as their application. We don’t leave you alone with your solutions, we help you get the most out of them.
Rename the query Vendors.
Duplicate the query (so we can self-merge)
Power Query can only calculate similarity scores during a merge, so we’ll merge the table with itself.
At this stage every row is paired with itself (score = 1) plus any near-duplicates. We need to throw away the trivial self-match.
powerquery
CopyEdit
IsSelf = [VendorName] = [VendorName.1] // adjust names if PQ adds suffixes
(You can skip this step if you lowered thesimilarity threshold below 1; Power Query rarely returns a perfect 1 for typotolerance.)
Power Query returns a compact table:
Any row whose Count > 1 means:
“VendorNo appears at least twice, and those duplicate rows all have the samevendor-name similarity score.”
Hast Du Fragen oder möchtest Du etwas hinzufügen? Hinterlasse uns bitte eine Nachricht! Deine Nachricht wird per E-Mail an uns übermittelt und nicht veröffentlicht.