How to Use the UNICODE Function in Excel

Learn multiple Excel methods for working with the UNICODE function, with step-by-step examples and practical applications.

excel • formula • spreadsheet • tutorial
11 min read • Last updated: 7/2/2025

Why This Task Matters in Excel

In a world where data travels across borders and languages, spreadsheets rarely contain just the standard Latin alphabet. Product catalogs often include the trademark ™ symbol, finance teams paste fancy quotation marks from PDFs, HR departments manage résumés with accented names, and IT logs may include emojis that record user sentiment. Excel’s UNICODE function gives you the power to interrogate any character and discover its numeric code point in the Unicode standard, which currently covers more than 140,000 characters.

Understanding those numeric values unlocks several practical advantages. Quality-assurance teams can validate that all item numbers stay within an approved character set; localization specialists can confirm that translated text uses the correct “Ω” instead of a visually similar but incorrect symbol; security analysts can identify “homoglyph” attacks in user input where sneaky characters look normal but carry different code points. If you run a mail-merge, you may need to replace unsupported characters before exporting to legacy systems that choke on anything beyond the basic ASCII range. Excel excels (pun intended) at filtering, conditional formatting, and bulk transformation, so adding Unicode awareness keeps those existing tools relevant in a multilingual environment.

Failing to detect hidden or unexpected characters leads to garbled reports, broken CSV exports, and sometimes financial loss. Imagine uploading product SKUs to an ERP that strips invisible zero-width spaces — one wrong character and thousands of orders misroute. Mastering UNICODE therefore connects directly to data integrity, compliance, and customer satisfaction. Additionally, skills gained here interlock with other text functions such as UNICHAR, TEXTJOIN, SUBSTITUTE, and dynamic arrays, building a broader toolkit for advanced text analytics.

Best Excel Approach

The most direct way to discover the numeric code point for any character is Excel’s UNICODE function introduced in Excel 2013. It returns a number that represents the Unicode value of the first character in a text string. This approach is superior to older functions such as CODE because CODE is limited to the first 255 ANSI characters. UNICODE spans the full modern standard, so it works equally well for Japanese kanji, Arabic diacritics, and smiling faces.

Syntax and key parameters:

=UNICODE(text)
  • text – Required. The text string or single character whose Unicode value you need.
  • Return value – A decimal integer from 0 to 1,114,111 (the range defined by Unicode).

Use UNICODE whenever you need a precise mapping between characters and their code points for validation, conversion, or reporting. Switch to CODE only if you maintain backward compatibility with pre-Unicode files or if collaborators use extremely old Excel versions. For automation, VBA's AscW is the closest macro equivalent (note that it works on UTF-16 code units, so supplementary characters such as emoji come back as surrogate values), and Power Query's Character.ToNumber() offers another route, but in most cases the built-in worksheet function is fastest, requires no macro permission, and behaves correctly in dynamic arrays.

=UNICODE("α")             'Returns 945

Alternative: Combine UNICODE with UNICHAR to round-trip between character and code point.

=UNICHAR(UNICODE("α"))    'Returns back the Greek alpha

Parameters and Inputs

When using UNICODE, you supply one input: the text string. Text can come from a direct literal (e.g., "ñ"), from another cell (e.g., A2), or from within a larger string. Only the first character is evaluated, so "™✓" yields the code for ™, not ✓.

Typical data types:

  • Single character (emoji, symbol, language character)
  • Full strings (only first character processed)
  • Dynamic array spills (e.g., TEXTSPLIT output)

Ensure the cell is truly text, not formatted numbers or dates; Excel implicitly converts, but surprises can occur with unprintable characters. When reading from external systems, consider that a leading Byte Order Mark (BOM) or zero-width space might occupy the first position. TRIM removes only the regular space (code 32) and CLEAN strips control characters below 32, but neither removes all Unicode whitespace (such as the non-breaking space, code 160), so SUBSTITUTE may be required, as sketched below.
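
For example, a minimal cleanup step that swaps non-breaking spaces (code 160) for regular ones before analysis, assuming the raw text sits in A2:

=TRIM(SUBSTITUTE(A2,UNICHAR(160)," "))    'Normalizes NBSPs, then trims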

Edge cases:

  • Empty string returns #VALUE!
  • In current versions of Excel, surrogate pairs are handled automatically; UNICODE returns a single code point above 65535 for emoji (see the check below).
  • Non-printing characters still produce a number; you must decide how to handle them downstream.
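
A quick check of the surrogate-pair behavior (current Microsoft 365 builds; very old builds may return a surrogate value instead):

=UNICODE("😀")            'Returns 128512 (U+1F600)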

Step-by-Step Examples

Example 1: Basic Scenario – Identify the Unicode Code of a Currency Symbol

Suppose you copy-paste international price lists where the currency symbol appears before the amount. You need to verify which symbol is being used so you can map it to ISO currency codes.

Sample data layout:
Cell B3 contains "¥12,500" and cell B4 contains "₩45000".

Step-by-step:

  1. In C3 enter
=UNICODE(LEFT(B3,1))

The LEFT function extracts the first character, which is the currency symbol, and UNICODE returns 165.
2. Copy the formula to C4. It returns 8361.
3. Use a lookup table that maps 165 to JPY and 8361 to KRW, then join it to your dataset with XLOOKUP, as sketched below.
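
A minimal sketch of that join, assuming a hypothetical mapping table with code points in E2:E3 and ISO codes in F2:F3:

=XLOOKUP(UNICODE(LEFT(B3,1)),$E$2:$E$3,$F$2:$F$3,"Unknown")    'Returns "JPY" for B3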

Why this works: UNICODE isolates the machine-readable numeric ID for each symbol, eliminating ambiguity when similar glyphs exist (for example, full-width yen versus half-width yen).

Common variations: Some data might have spaces before the symbol. Wrap TRIM around B3 or use SUBSTITUTE to strip non-breaking spaces.

Troubleshooting tips: If the formula returns an unexpected value, check whether LEFT captured a hidden character such as the BOM (zero-width no-break space); the sketch below tests for this directly. Use LEN to confirm the string length and CODE to see whether the issue is limited to the basic ANSI range.
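
A direct test for a leading BOM (U+FEFF, decimal 65279), assuming the raw string is in B3:

=UNICODE(LEFT(B3,1))=65279    'TRUE when a BOM occupies the first position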

Example 2: Real-World Application – Validate Product IDs Against Allowed Character Set

A manufacturing firm maintains part numbers that must only use uppercase Latin letters, digits, and underscores. Any other character causes downstream CNC machines to error. You have 10,000 rows in column A.

Data setup (first three rows):
[A1] = "PLATE_10"
[A2] = "GÄUGE_5"
[A3] = "ROD-12"

Goal: highlight rows with illegal characters.

Steps:

  1. In B1 enter a helper formula that scans the entire string and returns every illegal character it finds.
=LET(
    txt, A1,
    chars, MID(txt, SEQUENCE(LEN(txt)), 1),
    codes, UNICODE(chars),
    ok, ((codes>=65)*(codes<=90)) + ((codes>=48)*(codes<=57)) + (codes=95),
    TEXTJOIN("", TRUE, FILTER(chars, ok=0, ""))
)

Explanation: MID with SEQUENCE breaks the string into individual characters, and UNICODE obtains their code points; any code outside 48-57 (digits), 65-90 (uppercase letters), or 95 (underscore) is illegal. TEXTJOIN stitches the illegal characters together into an alert string.
  2. Fill the formula down; each row's LET evaluates independently against its own cell.
  3. Apply conditional formatting: Formula =B1<>"", set fill to red.
  4. Review output: Row 2 flags "Ä" (code 196), row 3 flags "-" (code 45).

Business impact: The validator prevents costly machine downtime by catching errors before NC files export. Because the formula defines an explicit allow-list, extending the rule is easy: add another condition to the ok test to permit extra characters, as shown below.
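
For instance, this amended ok line inside the LET also accepts hyphens (code 45):

ok, ((codes>=65)*(codes<=90)) + ((codes>=48)*(codes<=57)) + (codes=95) + (codes=45),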

Integration: Combine results with Data > Filter to hide valid rows and focus on problem parts. For recurring checks, place the formula inside a Table so it auto-expands.

Performance: On 10,000 rows the LET-powered dynamic approach is fast because UNICODE is lightweight. Avoid volatile functions to keep recalculation times low.

Example 3: Advanced Technique – Detect and Replace Homoglyph Attacks in Usernames

Cyber-security teams worry about “homoglyph” attacks where attackers register deceptive usernames such as “paypaI” (with capital i) instead of “paypal” (with lowercase L). While both look similar, their Unicode codes differ, and UNICODE helps build an automated defense.

Scenario: Sheet [LoginAttempts] column D contains 50,000 usernames. You have a whitelist in column H of legitimate names. You need to normalize suspicious characters or flag them.

Step-by-step:

  1. Create a mapping table that lists problematic characters and their safe replacements. For example:
    [J2:J5] = "I", "О", "о", "İ"
    [K2:K5] = "l", "O", "o", "i"
    Note: The capital "О" here is Cyrillic, code 1054, and differs from Latin O (code 79).
  2. Build a normalization function via a named LAMBDA called NormUser (define it in Formulas > Name Manager, and sheet-qualify the ranges in practice):
=LAMBDA(txt,
    REDUCE(txt, SEQUENCE(ROWS($J$2:$J$5)),
        LAMBDA(acc, i,
            SUBSTITUTE(acc, INDEX($J$2:$J$5, i), INDEX($K$2:$K$5, i))
        )
    )
)
  3. In E2 enter
=NormUser(D2)

Fill down. This replaces any homoglyph characters with their safe counterparts.
  4. Match the normalized username against the whitelist:

=IF(COUNTIF(H:H,E2)>0,"Allowed","Review")
  5. For investigative logging, also output the Unicode codes that changed:
=TEXTJOIN(",",,FILTER(
     UNICODE(TEXTSPLIT(D2,"")),
     TEXTSPLIT(D2,"")<>TEXTSPLIT(E2,"")
  ))

This shows which characters were altered, providing an audit trail. (The position-by-position comparison works because every substitution swaps one character for one character, so the two strings stay the same length.)

Professional tips:

  • Build the mapping list dynamically by capturing the incoming character set.
  • Use Power Query’s fuzzy match for a second layer.
  • If you must process millions of rows, push the logic to Power BI or SQL where collation settings support Unicode natively.

Edge case handling: Some usernames may combine accents with base characters (decomposed forms). Use JavaScript's normalize() string method in an Office Script, or a VBA routine, to convert text to composed or decomposed form before comparing.

Tips and Best Practices

  1. Combine UNICODE with UNICHAR for easy round-trips, e.g., EXACT(A1,UNICHAR(UNICODE(A1))) is TRUE only when A1 holds a single character (see the sketch after this list).
  2. Use MID with SEQUENCE to explode multi-character strings into single characters and map UNICODE across dynamic arrays without helper columns.
  3. Leverage LET to store intermediate calculations such as lists of codes, reducing repeated calls and improving readability.
  4. When validating printable ranges, sort code points numerically so conditional rules remain easy to maintain.
  5. Convert code points to hexadecimal with DEC2HEX(UNICODE(char)) when interfacing with HTML entity codes or JSON escapes.
  6. Document your character mappings. Future collaborators will avoid guessing whether code 201 is É or a mistake.
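
A quick sketch of tips 1 and 5 (assumes the text under test is in cell A1):

=EXACT(A1,UNICHAR(UNICODE(A1)))    'TRUE for a single character; errors if A1 is empty
=DEC2HEX(UNICODE("€"))             'Returns "20AC", ready for escapes like &#x20AC;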

Common Mistakes to Avoid

  1. Assuming CODE and UNICODE are equal: CODE truncates at 255; used on an emoji it returns 63, the code of the "?" substitution character. Always choose UNICODE for modern data.
  2. Forgetting that UNICODE reads only the first character: Calling UNICODE("€100") returns 8364 (Euro) — if you expected to evaluate the entire string, explode it with MID and SEQUENCE or iterate with MAP.
  3. Overlooking hidden whitespace: Zero-width spaces pass visual inspection. Use UNICODE to reveal code 8203, then apply SUBSTITUTE to remove them (see the sketch after this list).
  4. Copy-pasting formulas without anchoring ranges: Dynamic arrays referencing entire columns can slow recalculation. Use structured tables or limit ranges.
  5. Relying on font appearance: Different fonts display glyphs differently. Always base validation on code points, not what looks “wrong” on screen.
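
A minimal removal sketch for zero-width spaces (assumes the suspect text is in A2):

=SUBSTITUTE(A2,UNICHAR(8203),"")    'Strips zero-width spaces (U+200B)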

Alternative Methods

Below is a comparison of techniques for obtaining or utilizing Unicode values in Excel.

| Method | Scope | Pros | Cons | Recommended Use |
|---|---|---|---|---|
| UNICODE | Worksheet function | Native, fast, dynamic-array aware | Only first character | Day-to-day validation and reporting |
| CODE | Worksheet function | Backward compatible with old Excel | Limited to 255 | Legacy ANSI spreadsheets |
| VBA AscW | Macro | Full string access per character, loops | Requires macro security | Complex preprocessing or automation |
| Power Query Character.ToNumber | ETL pipeline | Scales to millions of rows | Requires refresh step, no live calc | Data imports and cleansing |
| Custom Office Script | JavaScript | Full Unicode API | Requires scripting knowledge | Cross-platform automation in Excel Online |

When working in pure spreadsheets, UNICODE remains the most straightforward. Switch to Power Query when datasets exceed hundreds of thousands of rows or must join with databases. VBA still has value in established macro libraries or when advanced looping logic is required.

FAQ

When should I use this approach?

Deploy UNICODE when you need to validate characters, map symbols to numeric codes, detect non-ASCII input, or troubleshoot encoding issues. It is perfect for compliance checks, data cleansing, and localization projects.

Can this work across multiple sheets?

Yes. Reference another sheet just as you would with any function:

=UNICODE(Sheet2!A1)

Note that a dynamic array spills on the sheet that contains the formula; to consume another sheet's spilled results, reference the spill range explicitly (for example, Sheet2!A1#). Otherwise, each sheet calculates independently.

What are the limitations?

UNICODE evaluates only the first character and returns an error on empty strings. It cannot normalize decomposed characters automatically and does not process complete strings in one call, so you must explode the string with MID and SEQUENCE, or iterate with MAP, for multi-character analysis.
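
A minimal whole-string sketch (assumes non-empty text in A2; the result spills one code point per character):

=MAP(MID(A2,SEQUENCE(LEN(A2)),1),LAMBDA(ch,UNICODE(ch)))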

How do I handle errors?

Wrap the call in IFERROR to provide fallback values:

=IFERROR(UNICODE(A2),0)

If you encounter unexpected values, check for hidden characters with CODE or CLEAN, or export the text and inspect it in a hex editor.

Does this work in older Excel versions?

UNICODE and UNICHAR are available from Excel 2013 onward (Windows) and Excel 2016 (Mac). For Excel 2010 or earlier, use VBA AscW or enable Power Query if supported.

What about performance with large datasets?

UNICODE is lightweight. Even on 100,000 rows it recalculates nearly instantly. Bottlenecks usually arise from volatile functions or complex array formulas built around UNICODE. Push heavy transformations to Power Query or SQL when processing millions of rows.

Conclusion

Mastering the UNICODE function equips you to tackle the messy reality of modern text data—whether you are policing illegal characters, safeguarding against homoglyph attacks, or simply converting symbols to their numeric equivalents for database storage. This skill dovetails with dynamic arrays, TEXT functions, and data validation, forming a crucial part of any power user’s arsenal. Experiment with the examples provided, adapt them to your data, and explore related functions like UNICHAR and DEC2HEX. Soon you will be effortlessly taming multilingual datasets and ensuring bullet-proof data integrity across your entire Excel workflow.
