header background image

Do-It-Yourself Check for Duplicate Business Partner

May 14, 2025

von

#

SAPCompliance

#

SAPsecurity

1  Introduction: why duplicate free master data matters

Clean master data is the bedrock of every SAP powered process. When the same customer, supplier or product is stored two (or two hundred) times under slightly different keys, it ripples through finance, logistics, analytics and compliance. Companies report that de duplication projects routinely uncover 5 – 15 % redundant master records, affecting everything from on time delivery KPIs to credit exposure and ESG reporting. SAP therefore ships “duplicate check” services in S/4HANA and SAP Business ByDesign, yet many organisations still carry legacy duplicates created years before those safeguards were activated.

2  Which SAP master data objects are prone to duplicates?

Below are the objects most frequently hit in audits and brown field S/4HANA migrations:

3  Why duplicates are a real problem

Here is a summary and examples of the problems you might encounter when you have duplicates in your master data:

4  Principles of detecting duplicates in business partner data

Detection strategies depend on which attributes you can safely compare:

Key takeaway: use the most discriminative field available, then enrich missing pieces (e.g. call company registers or address validation APIs) before applying fuzzy logic.

5  Name matching techniques

  1. Exact match – fastest, but fails on “Ltd.” vs “Limited”.
  2. Standardisation rules – strip legal affixes (Inc., GmbH, S.A.), punctuation, case folding.
  3. Phonetic algorithms – Soundex, Cologne phonetics (good for DE names), Metaphone; catches “Schmidt” vs “Schmid”.
  4. Edit distance (Levenshtein / Damerau) – counts insertions, deletions, transpositions; tunable threshold.
  5. Token based/fuzzy joins – Jaccard, TF IDF cosine similarity on word tokens; robust to word order.
  6. AI/ML embeddings – transformer models fine tuned on company name pairs; superior for cross language transliterations.

6  Why “first N characters + fuzzy” helps in B2B lists

Legal affixes balloon edit distance. “ACME Shipping Inc.” (length = 17) vs “ACME Shipping LLC” differs in the last three letters but Levenshtein = 3 (> 2) ⇒ missed under a tight threshold. Solution:

  1. Cut the name to the first N significant characters (6 10 usually hit the core brand).
  2. Normalise: uppercase, strip spaces/punctuation.
  3. Run a fuzzy distance ≤ 2 on that substring.
    o “ACMESHI” vs “ACMESHI” → 0 (match)
    o “SIEMEN” vs “SIEMEN” → 0 despite “Siemens AG” vs “Siemens Ltd.”

This yields high recall without blowing up the candidate set.

7  Hands on: extracting names from SAP & matching in Excel

Tip: if you need more control (e.g. ISO alpha transliteration, country specific rules), export to SQL Server Integration Services and use the built in Fuzzy Lookup Transformation, which supports token & phonetic matching.

8  Summary

So far, we discussed the following points:

  • Duplicate master records inflate costs, distort analytics and create compliance landmines.
  • Business Partner, Material, Vendor, Customer and several other SAP objects are especially vulnerable.
  • Start with unique identifiers (VAT, IBAN, DUNS). Where they’re missing, combine substr based preprocessing with fuzzy algorithms.
  • In B2B data a simple “first 8 characters + Levenshtein” heuristic zaps many Inc./Ltd. variants without raising the threshold.
  • SAP gives you raw access via SE16 and SQVI; Microsoft’s free Fuzzy Lookup plug in lets business users prototype matching logic in minutes before hardening the rules in SAP MDG, Data Services or third party MDM tools.

Keeping your master data duplicate free is not a one off cleansing sprint but an ongoing governance process—yet the techniques above can deliver a quick win and a cleaner starting point for your next S/4HANA project.

9  Do-It-Yourself

To illustrate these points, we look at duplicate vendors, and we will show how those duplicates can be easily identified using a simple Excel analysis:

Before we start with the identification of the duplicates in excel, first we need to extract the data from the source system.

In our case we will use SAP, and we will extract the data from the LFA1 table for ECC, or BUT000 from S/4HANA systems. Alternatively, use the SQVI transaction if you do not have access to SE16 or want to apply some logic when extracting names.

Now, after extracting the data for the analysis we will proceed with three different approaches.

BROSCHÜRE - die Vorteile unserer Produkte!

remQ - Business Inspector for SAP® Software

remQ – Business Inspector for SAP Software offers Business Transaction Monitoring and auditing software with built-in expert know-how.

Tablet mit dem Deckblatt des Dokuments

Scenario 1: Identify duplicates by comparing the first 5 characters of the vendor's name

Step 1: Prepare Your Data

Organize your vendor list in a single column (e.g., Column A).

Example:

A (Vendor Name)

Microsoft Corp

Microsft Inc

Google LLC

Gooogle Ltd

Amaz0n Inc

Amazon Services

Step 2: Add Helper Columns for Partial Matching

Option 1: Check the First Few Characters

This catches typos at the end of names (e.g., "Microsft" vs. "Microsoft").

Insert a helper column (e.g., Column B) titled "First 5 Chars".

In cell B2, use:

excel

=LEFT(C2, 5)

Drag this formula down to apply to all rows.

Result: Extracts the first 5 characters (e.g., "Micro" from "Microsoft Corp").

Step 3: Flag Potential Duplicates

  1. Insert a final helper column (e.g., Column G) titled "Possible Duplicate".
  2. In cell G2, use a formula to check for partial matches across your helper columns. Example 1: Check if the first 5 characters appear elsewhere in the list:

=IF(COUNTIF(D:D, D2)>1, "Check - First 5 Chars", "")

Step 4: Filter and Review

Use Excel’s Filter to show only rows flagged as " Check - First 5 Chars ".

Manually review the results (e.g., "Microsft Inc" vs. "Microsoft Corp" or "Gooogle Ltd" vs. "Google LLC").

Example Output

A (Vendor Name)ㅤB (First 5 Chars)ㅤE (Possible Duplicate) Microsoft Corp

Micro Microsft Inc

Micro Check - First 5 Chars

Google LLCㅤGoogl

Gooogle LtdㅤGooogㅤCheck - First 5 Chars

Amaz0n IncㅤAmaz0

Amazon ServicesㅤAmazo

Limitations

Not truly fuzzy: This method flags partial matches but may miss complex typos (e.g., "Starbucks" vs. "Starbuks").

False positives: Names like "Apple Inc" and "Pineapple Corp" may be flagged due to shared segments.

Case-insensitive: Excel’s COUNTIF ignores case (e.g., "apple" vs. "APPLE").

Scenario 2: Identify duplicates by comparing the full vendor's name

In this scenario, we will use the built-in Excel function Power Query.

Step 1: Load Your Data into Power Query

  1. Open Power Query:
    • Go to the Data tab → Click From Table/Range.
    • This opens the Power Query Editor.

Step 2: Group Vendor Names and Aggregate Vendor Number

  1. Group by Vendor Name:
    • In Power Query, select the Vendor Name column.
    • Go to Transform → Group By.
    • Configure the grouping:
      • Group By: Vendor Name
      • New Column Name: Count → Operation: Count Rows
      • Click Add Aggregation → New Column Name: AllRows → Operation: All Rows.
    • Click OK.

  1. Filter groups with duplicates:
    • Select the Count column → Click the filter icon → Choose Number Filters → Greater Than... → Enter 1.
    • This keeps only vendor names that appear more than once.

Step 3: Expand the Vendor Numbers

  1. Expand the "AllRows" column:
    • Click the expand icon (↗) in the AllRows column header.
    • Check Vendor Number → Uncheck "Use original column name as prefix" → Click OK.

  1. Clean up the output
    • Remove the Count and AllRows columns (right-click → Remove).
    • Your final output will show vendor names with their associated vendor numbers in separate rows.

Step 4: Load the Results to Excel

  1. Export the results
  • Go to Home → Close & Load → Close & Load To....

  • Choose Existing Worksheet and select a cell      for the output → Click OK.

Final Output

You’ll see a table like this, where vendor names with duplicates are listed alongside their different vendor numbers:

TALK TO US – book a free meeting

WIR SIND FÜR DICH DA!

Let’s chat and find the best strategy for yourbusiness! It’s about individual expert advice tailored to your business needs. Tools are only as good as their application. We don’t leave you alone with your solutions, we help you get the most out of them.

Tablet mit dem Deckblatt des Dokuments
Keine Artikel gefunden.

Scenario 3: Identify duplicates by fuzzy search on the vendor's name

Step 1: Load Data into Power Query

  1. Open the file and convert the range to an Excel Table (Ctrl + T). Assume its headers are VendorNo and VendorName.
  2. Give the table a meaningful name, e.g. tblVendors.
  3. With any cell inside tblVendors selected, go to Data ➜ Get & Transform ➜ From Table/Range.
    Power Query opens.
  4. Check the Data Type icons: make VendorNo Text (even if it’s numeric) and VendorName Text.

Rename the query Vendors.

  1. Open Power Query:
    • Go to Data → Get Data → From Table/Range.

Step 2: Merge the Table with Itself (Fuzzy Join)

Duplicate the query (so we can self-merge)

Power Query can only calculate similarity scores during a merge, so we’ll merge the table with itself.

  • In the Queries pane → right-click VendorsDuplicate → rename the copy to Vendors_Copy. (Do not change anything inside it.)

Step 3: Create a self-merge that returns similarity scores

  1. Still in Power Query, select Home ➜ Merge Queries ➜ Merge Queries as New.
        First table = Vendors, Second table = Vendors_Copy.
  2. Choose columns in this order (hold Ctrl):
        First VendorNo (exact match) → then VendorName (fuzzy match).
        Power Query will treat both columns as the composite key.
  3. Join kind = Inner (keeps only rows that have at least one duplicate).
  4. Tick Use fuzzy matching to perform the merge.
  5. Expand “Fuzzy matching options” and set:
       
    • Similarity threshold = 0.80 (or your own cut-off).
    •  
    • Ignore case = ✓ (recommended).
    •  
    • Show similarity scores = ✓ (this gives us a numeric column).
    •  
    • Number of matches = 5 (safe default).
           Leave the other boxes alone.​ Microsoft Learn
  6.  
  7. Click OK → a new merged query (call it Duplicates_Raw) appears.

Step 4: Expand and isolate the useful columns

  1. In Duplicates_Raw, you’ll see a single column named something like Vendors_Copy that contains Table values.
  2. Click the expand icon (↘) → tick VendorName and Similarity score → OK. Now each original row has its “sibling” vendor name and the similarity score to that sibling.

At this stage every row is paired with itself (score = 1) plus any near-duplicates. We need to throw away the trivial self-match.

Step 5: Remove the self-match (Similarity = 1 and identical row)

  1. Add a Custom Column

powerquery

CopyEdit

IsSelf = [VendorName] = [VendorName.1]   // adjust names if PQ adds suffixes

  1. Filter IsSelf = FALSE → keeps only true duplicates.

(You can skip this step if you lowered thesimilarity threshold below 1; Power Query rarely returns a perfect 1 for typotolerance.)

Step 6: Find duplicate VendorNos that share the same score

  1. Select VendorNo and Similarity score columns.
  2. Go to Group By
    • Group by: VendorNo, Similarity score
    • New column name: Rows
    • Operation: Count Rows

Power Query returns a compact table:

VendorNo Similarity Score Count
123456 0.92 2
789012 0.87 3

Any row whose Count > 1 means:
“VendorNo appears at least twice, and those duplicate rows all have the samevendor-name similarity score.”

  1. (Optional) Filter Count ≥ 2 to see only the offenders.

Step 7: Return the results to Excel

  1. Click Home ➜ Close & Load ➜ Close & Load To… → Table → (new worksheet).
  2. You now have a live report that updates whenever the source file changes (Refresh All).

ÜBER DEN AUTOR

Jens Kettler

Jens verfügt über mehr als 20 Jahre Erfahrung in den Bereichen SAP-Sicherheit, Compliance und interne Kontrollen. Er ist ein ehemaliger Wirtschaftsprüfer, immer neugierig, bereit zu lernen und Wissen zu teilen. Bei VOQUZ Labs ist Jens für die Risiko- und Compliance-Produkte verantwortlich. Es macht ihm Spaß, mit Kunden zu interagieren und schnelle und einfache Wege zu finden, um Produkte zu verbessern und den Kunden einen Mehrwert zu bieten. Pragmatisch und kundenorientiert? Dann Jens :)

SENDE UNS EINE NACHRICHT

Hast Du Fragen oder möchtest Du etwas hinzufügen? Hinterlasse  uns bitte eine Nachricht! Deine Nachricht wird per E-Mail an uns übermittelt und nicht veröffentlicht.

Danke! Deine Anfrage wurde empfangen!
Ups! Beim Absenden des Formulars ist etwas schief gelaufen.
Illustration of a woman editing documents

Melde Dich für unseren Newsletter an!
Bleib auf dem Laufenden!

Thank you! Your successfully signed up for our newsletter.
Ups! Beim Absenden des Formulars ist etwas schief gelaufen.

WEITERE RELEVANTE ARTIKEL

Vorschaubild mit Link zum Beitrag unten

Quick Do-It-Yourself Sanctions List Screening in SAP

24.3.2025

|

SAPCompliance

Vorschaubild mit Link zum Beitrag unten

Identifying Discrepancies between Payment Runs and Vendor Master Data in SAP

17.1.2025

|

SAPCompliance

Vorschaubild mit Link zum Beitrag unten

Order-to-Cash Processes: Ensuring Compliance and Preventing Fraud with Order-to-Cash and Revenue Assurance

18.12.2024

|

SAPCompliance