With Delphi “Tiburon” CodeGear wants to introduce Unicode-support for the Win32 personality. Unlike C++ Builder which has a preprocessor and thus can at least switch between ANSI and Unicode via a simple preprocessor define, Delphi didn’t have such a (globally available) mechanism in previous versions. And even in C++ Builder you wouldn’t get far with the preprocessor defines, since the (ANSI-only) VCL is merely Delphi code compiled and linked into your C++ Builder projects. Several free and commercial alternatives like the well-renowned “TNT Unicode Controls” have been created throughout the years, to keep developers happy with some support for Unicode. In fact some of those controls offered a support that reached beyond what a conditional build in other development environments would yield: the ability to run on Windows 9x and the NT-platform seamlessly with one binary.
To recount known facts, Unicode has existed for quite a while already: NT4 (released in 1996, i.e. about 12 years ago) supported UCS-2 (two-byte wide characters), which has now been superseded by UTF-16. Now let’s get to the Delphi-specific part and put aside the fact that CodeGear (under its various names) was apparently not following the developments in the Windows development community very well.
The migration of existing Delphi-code is going to be a PITA for many many projects, since CodeGear took the strange - and IMO wrong - decision of defaulting the aptly named PChar to PWideChar starting with Delphi “Tiburon” - read here. Quote from this very link:
Now that PChar is an alias to PWideChar, things started falling over because the element type is now 2 bytes.
Here is how PCHAR is declared in the Windows SDK inside WinNT.h:
typedef CHAR *PCHAR, *LPCH, *PCH;
For those not literate enough in C, here is what this means in Delphi:
type PCHAR = ^CHAR;
type LPCH = ^CHAR;
type PCH = ^CHAR;
Since the Delphi language (and Pascal) is case-insensitive, you now have the definition of Delphi’s PChar as everyone else in the Windows development community understands it.
Do we have a problem here? We sure do. While in the C/C++ world - and remember that this also includes C++ Builder - TCHAR is the basis for any pointer types that can be used for zero-terminated character strings, PCHAR was clearly meant to be ANSI only. With the preprocessor symbols _UNICODE and UNICODE defined, any C/C++ preprocessor that can be used with the Windows SDK will happily replace every occurrence of TCHAR with WCHAR, which would be the counterpart to WideChar in Delphi and is also defined in Windows.pas, if memory serves me well. On the other hand, if those preprocessor symbols are not defined, TCHAR will resolve to CHAR, which resolves to char, the intrinsic type used for one-byte characters - read: ASCII/ANSI. So consequently, the widely known naming scheme has been broken by CodeGear in order to break legacy code only partially.
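As a rough illustration of the TCHAR idea in Delphi terms, here is a minimal sketch. The conditional symbol DEMO_UNICODE and the type names TDemoChar and PDemoChar are made up for this example; they are not part of the Windows SDK, the VCL or JWA.

```delphi
program GenericCharSketch;
{$APPTYPE CONSOLE}

// Hypothetical project-level switch. Remove the leading "//" to mimic a
// build with the C preprocessor symbol UNICODE defined.
// {$DEFINE DEMO_UNICODE}

type
{$IFDEF DEMO_UNICODE}
  TDemoChar = WideChar;   // two-byte element, like WCHAR
{$ELSE}
  TDemoChar = AnsiChar;   // one-byte element, like CHAR
{$ENDIF}
  PDemoChar = ^TDemoChar; // the only pointer type whose element size varies

begin
  // Reports the element size that pointer arithmetic on PDemoChar would use.
  Writeln('SizeOf(TDemoChar) = ', SizeOf(TDemoChar));
end.
```

Built as is, this should report an element size of 1, and 2 once the define is enabled. The point of the sketch is only that the width-dependent type carries its own name, so the fixed-width names keep their meaning.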
Allen predicted it right in his blog entry (already linked above):
I predict there will be hails of praise from one camp, and sneering and guffaws from another. The largest camp will be the ones ambivalent to this change. To which camp to you belong?
Guess what camp I belong to? Now don’t get me wrong. I am all for breaking backwards compatibility if it serves the purpose. And this certainly would serve the purpose … the purpose of lifting Delphi on par with other Windows development environments again.
However, breaking it half-heartedly is not the way to go, because the rather relaxed type checking in Delphi, compared to C++ (although some claim the opposite), will cause subtle bugs in existing code-bases. Furthermore they decided to change the meaning of a type name which is known and used beyond Delphi, thus breaking widely accepted naming conventions.
The decision whether this is an intentional affront against a “Microsoft convention” or simply a misjudgement is left to the reader’s subjective view. I think it is a wrong decision in any case, because you should play by the rules of the environment for which you create your products. In case of Delphi that’s still Windows.
By the way: JWA and JWSCL can be independently built with UNICODE defined or not defined. So it’s prepared and the type name (TJwPChar) follows the CodeGear naming scheme.
// Oliver
PS: Before anyone blames any of the JEDI projects (hosted here or elsewhere) for this post, these are my thoughts. Other members do not necessarily agree with anything or everything said here.
23 Responses
CR
10|May|2008 1
While I can see where you’re coming from, I still think your complaint is misplaced, to the extent that Delphi’s Char and PChar have always only contingently mapped to the C types of similar names - as in, ‘Char’ has always formally been a generic type (like ‘Integer’ or ‘Cardinal’ or ‘string’), with ‘PChar’ being so named because of the convention of prefixing pointer types with the letter ‘P’.
Kaitnieks
10|May|2008 2
I belong to your camp (for different reasons), however I’m not sure if it’s not too late already. As far as I understand it, they have already implemented most of this.
Oliver
10|May|2008 3
Indeed, in Delphi Char has been a generic type just like char is an intrinsic type of C/C++. The assertion about the leading P is a good point, but given how prevalent PChar is in Delphi code that uses the Win32 API, this decision makes much of the existing code look bad. Again, not per-se a problem, but I think this way is wrong.
Using your own types will make migration easier, because you’ll be able to base them on a different type now, but many people used types such as String and PChar literally and used them for what they were: ANSI strings.
Had CodeGear (Borland) emphasized the fact that String and Char and all the derived types are supposed to be independent of character width, this wouldn’t be a problem at all. But they haven’t. So not only are they late to introduce the Unicode support, but they have misled the Delphi community to believe that Char is AnsiChar.
@CR: So “where” do you think I come from?
… hint, it’s not C/C++ …
Christian Wimmer
10|May|2008 4
I suppose the people from Borland had the thought in mind that most programmers do not care about the type size. Most of these guys use the WinAPI functions without the A or W suffix. However the new Delphi will - by default - use the W suffix and thus this would break existing code that uses Unicode WinAPI functions with simple PChar.
Of course there are always people out there who depend on the size of a type for various reasons. I remember a post in (german) Delphi Praxis where a posting race emerged because someone heard that the type Integer is going to be 64bit in the Delphi 64bit release.
Anyway. JWAPI and JWSCL are prepared for the unicode release. Indeed they are already fully Ansi- and Unicode aware. Furthermore the support for ansicode continues in future.
Oliver
10|May|2008 5
You’re actually right here, Chris - and this also supports CR’s assertion. I checked in the BCB6 code and help (since this was the oldest Borland product I have access to in installed form at the moment):
Example from winsvc.pas:
{$EXTERNALSYM OpenSCManager}
function OpenSCManager(lpMachineName, lpDatabaseName: PChar;
  dwDesiredAccess: DWORD): SC_HANDLE; stdcall;
{$EXTERNALSYM OpenSCManagerA}
function OpenSCManagerA(lpMachineName, lpDatabaseName: PAnsiChar;
  dwDesiredAccess: DWORD): SC_HANDLE; stdcall;
{$EXTERNALSYM OpenSCManagerW}
function OpenSCManagerW(lpMachineName, lpDatabaseName: PWideChar;
  dwDesiredAccess: DWORD): SC_HANDLE; stdcall;
Rudy Velthuis
10|May|2008 6
I have done quite a few conversions, but I have not encountered a PCHAR type being used in the original headers yet. Most of the time, LPTSTR or LPCTSTR is being used, and I simply leave it at that (i.e. they should not be translated to PChar or PWideChar at all).
The official Borland/CodeGear Delphi translations of headers have always used PAnsiChar, PWideChar and PChar for the generic call (i.e. the function name without W or A suffix). Since the generic calls will probably be mapped to the wide calls (and I assume their wpar.exe tool will be modified accordingly), I don’t see any problem at all. You will still have the size specific -A and -W versions, and the generic version still uses PChar (but will now map to the wide -W version).
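The mapping described here (explicit -A and -W imports plus a generic name that picks one of them) might look roughly like the following sketch. MyGetModuleHandle is a hypothetical name chosen to avoid clashing with the real declaration in Windows.pas, and treating UNICODE as the conditional symbol that selects the wide import is an assumption, not something taken from the official headers.

```delphi
program GenericImportSketch;
{$APPTYPE CONSOLE}

uses
  Windows;

// Hypothetical generic import: the Pascal name stays the same, only the
// parameter type and the exported name it binds to depend on the build.
{$IFDEF UNICODE}
function MyGetModuleHandle(lpModuleName: PWideChar): HMODULE; stdcall;
  external kernel32 name 'GetModuleHandleW';
{$ELSE}
function MyGetModuleHandle(lpModuleName: PAnsiChar): HMODULE; stdcall;
  external kernel32 name 'GetModuleHandleA';
{$ENDIF}

begin
  // The string literal converts to whichever pointer type is active.
  if MyGetModuleHandle('kernel32.dll') <> 0 then
    Writeln('kernel32.dll is loaded')
  else
    Writeln('lookup failed');
end.
```

Callers that pass literals or generic strings recompile unchanged under either setting; code that assumes a particular element size behind the generic name is where the two builds start to differ.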
Oliver
10|May|2008 7
Rudy, I admire your commitment to the Delphi community, but I beg to differ.
If you have touched the LSA headers you’ll have come across LSA_UNICODE_STRING and LSA_STRING, where the latter one uses PCHAR clearly in the ANSI sense. LDAP functions also use it excessively. But those with which I personally have to do the most are ImageHlp.h and DbgHelp.h.
The CM API also uses them for the “A” function, while using PWCHAR for the “W” function in the original. The JWAPI headers use PAnsiChar explicitly, as is correct. But it remains as an inconsistency …
Xepol
11|May|2008 8
Since when the heck did Delphi, a Pascal language, have to adhere to C/C++ conventions?
PChar does not, in fact, come from C/C++ conventions at all, but rather from an OLD convention in Pascal, whereby a pointer to a type is prefixed with a P, and if that type starts with a T, the T is removed. Thus:
PString = ^String;
PRecord = ^TRecord;
TRecord = Record a,b,c : Integer End;
Following that then :
PChar = ^Char;
Is a natural construct for the Pascal language.
If you assume that CHAR is now a unicode char, and not an ansi char, then it follows that PChar points to wide strings. The fact is that you can use AnsiChar and PAnsiChar even in current versions to clearly point at ansi string data, which Delphi just happens to keep null terminated for convenience when working with C/C++ based libraries.
Once you understand that PChar never came from C/C++ in the Delphi/Pascal world, but rather is just a darn convenient conceptual overlap, I think you can agree that it would be absurd to expect Delphi to suddenly adhere to C/C++ standards just because of a useful coincidence.
That said, from everything I’ve read on Allen’s blog, the change they are introducing will make forward migrations less painful and more transparent than you suggest, and ONLY cause problems when you are working in custom areas. The concept being that if you are smart enough to play in those areas, you are smart enough to work around the issues involved as they come up.
Jaakko
11|May|2008 9
I think that Oliver misses the point here. C/C++ can be used to create and even compile the same source to both Ansi and Unicode. The CodeGear approach in Delphi 2008 is much better: it only creates Unicode applications. This is why it is absolutely the right choice from CodeGear to make UnicodeString the default string.
This will break a lot of code and you have to go through your code, but this is needed. Unicode has been there for years but there has been no good way to create Unicode applications in Delphi.
Once you have checked your code, your application will be a true Unicode application that can handle any international strings, can be localized to all languages (including Hindi, which was not possible in Delphi 2007), and the localized application can be run on any computer even if the OS language and system locale do not match the localized language.
Oliver
11|May|2008 10
Who talked about C/C++ conventions? I am talking about Windows development conventions (more clearly: SDK). And although I clearly erred on the “why” of PChar, the issue remains in fact the same. The difference is that the problem goes back a lot further than I first thought. A solution isn’t in sight, though.
You are right here, and I totally agree, but once we replace “C/C++ standards” with “Windows conventions” - because by no means is PCHAR a C++ intrinsic type - things change. PCHAR is a type from the Windows SDK where it is solely used as an ANSI character string. In fact it is used in at least one (SDK) macro as a means to compute offsets into a record, which would be converted to a function in Delphi. Now the fact it is a function in Delphi is no big deal, but the fact that this “overlap” causes the presumption of “PCHAR points to one-byte items” to be void is. In this case it can cause some subtle bugs. And this is what I meant. Heck, even Indy 9 uses PCHAR (even in uppercase) to represent an ANSI character string. But those things are usually easy to find. The hard to find things are the parts where pointer arithmetics have been used, as pointed out by Allen in his post.
And those are the parts that, if they break, will cause most trouble. Not that the problem itself will necessarily be huge or even always visible … but those are the bugs hardest to find. Whoever has hunted down a few heap corruptions knows very well what I mean, because the source of the problem and the location where it shows are most of the time entirely unrelated. Even more so when multiple threads are involved.
I guess what I am trying to say is, that this change CodeGear is planning (although the schedule suggests it’s rather a final decision anyway) is huge. It is so huge, that all code that ever made any assumptions about PChar should break.
Look, if CodeGear came along and changed the alignment of records, would that also be considered a minor issue? Heck, no! Suddenly all your logic will break, because - as an example - you are reading your data from a memory mapped file and the structure in the “legacy” files differs just because you were rebuilding from source. This is horrible.
Oliver
11|May|2008 11
Quoting myself:
… what I meant to say was:
And those parts, if they break the logic, but compile silently, will cause most trouble.
Oliver
11|May|2008 12
@Jaakko: actually you are missing my point.
Did you read the whole post?:
Breaking the code would be good. CodeGear, however, is trying not to break your code, so it will be a lot harder to actually find the subtle issues.
Xepol
11|May|2008 13
“You are right here, and I totally agree, but once we replace “C/C++ standards” with “Windows conventions””
Following that concept, the string type itself is a breaking of the windows convention.
Why is it a windows convention? Because it is written in C. Is it really a windows convention then or just a bit of language detail bleeding out the edges of the win api? ( Oh, and PChar was available in dos and DPMI targets for BP 7.0, and anyone could create their own PChar before that, and I’m sure due to the pascal typing style at the time, many did )
This type of change isn’t really new in the Delphi world. The way strings changed in Delphi 2.0 was pretty significant (I had to clean up a LOT of string[0] code for example, but was better off for the pain), the way memory access changed HUGELY. Integer has been a pain as it grew.
And yet, in the long run, the product has been better for it.
Explain to me again, why you think a Pascal language has to adhere to non-Pascal language standards?
Oliver
11|May|2008 14
Maybe you want to read my post and my replies to the comments again?! There is neither a C nor a C++ (standard) type of the name PCHAR; it’s a type declared in the Windows SDK. The Windows SDK, formerly Platform SDK (PSDK), is the reference for the Windows platform, whether you like it or not. And whether you like it or not, it was the template for the Delphi units declaring the Win32 functions. I understand that if you never used anything beyond Delphi, the whole thing becomes a mashup, because you get the “contents of the SDK” (at the time of product release) in the Delphi installation directory. It doesn’t change the fact that the SDK is the reference, though, and that Delphi as a Windows development tool should stick to its conventions as far as possible.
Maybe it is a coincidence with the overlap of the name, maybe not. Who knows? Is it important? Nope. What’s important are the problems arising from it. And I’ve gone into enough detail about those.
BTW: I did not even claim no one has ever used PCHAR outside of the Windows context and so on. That was you putting words in my mouth.
Sorry, don’t like to repeat myself over and over again.
Look, I am completely indifferent about the tools I use for development, I use the tool that suits my needs best, although people here seem to think just because I point out a shortcoming of (current and former) Delphi versions and because I am opposed to the way this transition to the Unicode-ability is handled, I am an enemy of Delphi. Whatever, keep thinking it. If it wasn’t for critics, how would those products ever evolve, if everyone says: “Oh, product XYZ is sooo nice, it’s perfect, I love it … unconditionally”.
Jaakko
12|May|2008 15
I understand your point but keeping PChar to PAnsiChar won’t generate many problems. Let me give a sample. If PChar were still PAnsiChar it would break a huge amount of code that calls the Win32 API. Now if you pass a string to the API you normally cast it to PChar
GetProcAddress(FHandle, PChar(name));
Now name is string and ultimately AnsiString. When using Unicode, name is also string but WideString. I guess when using Unicode the W version of the Win32 API is used, so it requires PWideChar instead of PAnsiChar.
There is no conversion from WideString to PAnsiChar, so the compiler must generate an error. This is solved when PChar is PWideChar. Your existing API calls work.
Oliver
12|May|2008 16
Very good example. If you have this code in your project(s) you’re already in trouble. You write:
Now, GetProcAddress is a function that has no Unicode equivalent because the exported function names are always ANSI, actually ASCII (i.e. characters up to ordinal value 127), if I recall correctly.
Nope, it won’t. You just proved my point, by giving a practical example as it is used out there.
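A minimal sketch of that point, assuming a Win32 Delphi target: since GetProcAddress only ever takes a one-byte character name, spelling out PAnsiChar keeps the call independent of whatever PChar happens to mean. The variable names and the looked-up export (GetTickCount) are chosen purely for illustration.

```delphi
program GetProcAddressSketch;
{$APPTYPE CONSOLE}

uses
  Windows;

var
  Module: HMODULE;
  ProcName: AnsiString; // export names are one-byte character strings
  Proc: Pointer;

begin
  // kernel32.dll is mapped into every Win32 process, so no LoadLibrary here.
  Module := GetModuleHandle('kernel32.dll');
  ProcName := 'GetTickCount';

  // Explicit PAnsiChar instead of PChar: the cast matches the parameter
  // type of the API regardless of the width of the generic PChar.
  Proc := GetProcAddress(Module, PAnsiChar(ProcName));

  if Assigned(Proc) then
    Writeln('export found')
  else
    Writeln('export not found');
end.
```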
Oliver
12|May|2008 17
By the way: I’ve always advocated using:
… adding the explicit cast to PChar as you do will, in the worst case, shut up the compiler and do the wrong thing.
Christian Wimmer
12|May|2008 18
JWSCL users don’t have this problem. JWSCL converts the strings depending on the UNICODE directive.
Sebastian
12|May|2008 19
@Oliver:
If you have the following code, which call to GetProcAddress will show a compiler warning, which call will show a compiler error?
// Delphi 2006
var
S: WideString;
begin
GetProcAddress(FHandle, @S[1]);
GetProcAddress(FHandle, PAnsiChar(S));
GetProcAddress(FHandle, PWideChar(S));
end;
What conclusion do you think I am going to draw from this?
Oliver
12|May|2008 20
Oh, is this becoming a quiz now?
… my guess is, that you aren’t satisfied with the type checking in the “Delphi language” after your test and that you will migrate to a language that provides stricter type checking. Am I right?
The error is the third call to GetProcAddress, because GetProcAddress doesn’t take a wide character string for the function name. The second call gives a warning (and rightly so, because it has the same result as the first call). The first call doesn’t give a warning. Read further to find out why.
So thanks for making my point about the strictness of type checking in Delphi versus C++!
To show you how I look at things most of the time, a little trip into a binary (console) which contains the first two calls as given by you:
Comments (except the first) are from me. I also shortened the lines, after copying them from IDA, for the sake of brevity. So what conclusion do you draw from this little excerpt of assembly? Let me give you mine:
{$T+}) would change a thing, think again.
My guess is, what you were trying to say is, that @S[1] is worse than PCharType(S). So I take it you’re considering @S[1], which is perfectly valid code, as wrong or so and there should be a different treatment involving opaque compiler magic? Or maybe your point was to show that you can make the compiler shut up despite formally wrong code (passing a wide character string to an ANSI-only function)? Please, unless you own your copy of the professional versions of IDA, download IDA Standard Free 4.9 and look a bit under the hood for yourself.
Let me give you an example in return which doesn’t involve pointer arithmetics. Before Tiburon (System.pas), Delphi 2006, to be exact:
PVarRec = ^TVarRec;
TVarRec = record { ... }
  case Byte of
    vtInteger:    (VInteger: Integer; VType: Byte);
    vtBoolean:    (VBoolean: Boolean);
    vtChar:       (VChar: Char);
    vtExtended:   (VExtended: PExtended);
    vtString:     (VString: PShortString);
    vtPointer:    (VPointer: Pointer);
    vtPChar:      (VPChar: PChar);
    vtObject:     (VObject: TObject);
    vtClass:      (VClass: TClass);
    vtWideChar:   (VWideChar: WideChar);
    vtPWideChar:  (VPWideChar: PWideChar);
    vtAnsiString: (VAnsiString: Pointer);
    vtCurrency:   (VCurrency: PCurrency);
    vtVariant:    (VVariant: PVariant);
    vtInterface:  (VInterface: Pointer);
    vtWideString: (VWideString: Pointer);
    vtInt64:      (VInt64: PInt64);
end;
Uh oh. So I guess, since PChar was never intended to be only ANSI, I am the only misguided Delphi-developer in the whole wide world in that I didn’t get that PChar was meant to be a “conditional type”. So the PChar in the vtPChar variant above will implicitly mean PWideChar with Tiburon and CodeGear will introduce a new variant sub-type vtPAnsiChar for PAnsiChar?! Or maybe the implementer of that part of System.pas made exactly the same wrong assumption as a big part of the Delphi community and will get bitten by it?! I’d really love to get a sneak-peek into this piece of code in a beta of Tiburon …
But the really interesting part will be structures with dynamic offsets where the implementer chose PChar to do his pointer arithmetics, because before Tiburon, only some types allowed the implicit pointer arithmetic (without Inc and Dec), as pointed out in Allen’s post. Try PByte, for example, which would be the logical substitute - doesn’t even work in Delphi 2006, although I was of the impression they had fixed that before. Apparently not.
I am not complaining about the parts where the compiler complains - they can and will be fixed, because they are visible through the compiler warnings. Of course you can always make the compiler shut up, but if you do so deliberately, bear the consequences! What I am complaining about is the fact that CodeGear wasn’t clear about the fact that PChar is (and always was) meant as a “conditional type” and that CodeGear/Borland themselves were using PChar in contexts where it was clearly ANSI-only. This, in combination with advanced use of Delphi, is an explosive mix.
Here’s an example:
program UTest;
{$APPTYPE CONSOLE}
uses
  SysUtils, Windows;

{$DEFINE MIMICTIBURON}
{$IFDEF MIMICTIBURON}
type XString = System.WideString;
type XPChar = System.PWideChar;
{$ELSE}
type XString = System.AnsiString;
type XPChar = System.PAnsiChar;
{$ENDIF}

type
  PMyRecord = ^TMyRecord;
  TMyRecord = record
    SomethingElse : LongWord;
    OffsetToString : LongWord;
  end;

const
  // Added to the digits 10..15 to land on 'A'..'F': Ord('A') - Ord('0') - 10 = 7
  hex_offsets : array[0..$F] of Byte = (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 7, 7, 7, 7, 7, 7);

var
  x : array[0..$1000-1] of Byte;
  ppmr : PMyRecord;
  pac : XPChar;
  i : Integer;
begin
{$IFDEF MIMICTIBURON}
  Writeln('Mimicry of Tiburon with old code');
{$ELSE}
  Writeln('Old code on old Delphi');
{$ENDIF}
  ppmr := PMyRecord(@x);
  // Fill buffer …
  for i := Low(x) to High(x)-1 do
  begin
    x[i] := Ord('0') + (i mod 16) + hex_offsets[(i mod 16)];
  end;
  // Zero terminate the “string”
  x[477] := 0;
  // Prepare offset
  ppmr^.OffsetToString := 411;
  Writeln('Record address   : ', Format('%p', [ppmr]));
  Writeln('Offset to string : ', ppmr^.OffsetToString);
  // The offset is scaled by the element size of XPChar
  pac := XPChar(ppmr) + ppmr^.OffsetToString;
  Writeln('Address of string: ', Format('%p', [Pointer(pac)]));
end.
And here’s the output with and without MIMICTIBURON defined:
Please note, that neither one of the two builds gave a warning!
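To put numbers on the difference, here is a minimal sketch that measures how far the same offset moves each pointer type. It reuses the offset 411 from the example above; the program and variable names are made up, and the distances are measured by PAnsiChar subtraction, which counts bytes.

```delphi
program OffsetScalingSketch;
{$APPTYPE CONSOLE}

const
  Offset = 411; // same offset value as in the example above

var
  Buffer: array[0..$1000 - 1] of Byte;
  Base, Narrow: PAnsiChar;
  Wide: PWideChar;

begin
  Base := PAnsiChar(@Buffer);

  // Same expression, different element size behind the pointer type.
  Narrow := PAnsiChar(@Buffer) + Offset; // scaled by SizeOf(AnsiChar) = 1
  Wide := PWideChar(@Buffer) + Offset;   // scaled by SizeOf(WideChar) = 2

  // Subtracting two PAnsiChar values yields the distance in bytes.
  Writeln('PAnsiChar advanced by ', Narrow - Base, ' bytes');
  Writeln('PWideChar advanced by ', PAnsiChar(Wide) - Base, ' bytes');
end.
```

This should report 411 bytes for the narrow pointer and 822 for the wide one. Code that treats an offset as a byte count stays correct only as long as the pointer type names a one-byte element explicitly, for instance PAnsiChar rather than the generic PChar.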
Fun-fact at the end. Had CodeGear introduced the subscript operator and implicit pointer arithmetic before Tiburon, things wouldn’t be as bad either, because people would not have resorted to PChar for offset calculations, but rather to PWhateverType. Now they introduce both, break the logic, but don’t break the code. This means they break it half-heartedly.
Sebastian
12|May|2008 21
How come you complain that “PChar(somestring) + 1” silently breaks if “somestring” is not really a string but an offset, but at the same time you suggest using “someapifunction(@bla[1])” which breaks silently, too?
Are you sure that it is impossible to find an example that would silently break if PChar was changed to PAnsiChar?
How come you think it would be better to break thousands of lines of code just in order to avoid breaking a corner case problem? Apart from that there is an easy workaround: Do a global search and replace from PChar to PAnsiChar.
Oliver
12|May|2008 22
It’s a pointer. The 1 is the offset. It seems, from your statement (“is not really a string”), that your experience with pointers is very limited.
Yes I do. Oh wait, so you blame me for the fact that CodeGear’s “compiler magic” breaks the well-famed strict type checking? Hehe, that’s funny.
… yeah, shoot the messenger. Still a popular theme.
Yes. 100% sure for Win32 development (i.e. ignoring Kylix, which I haven’t ever tried). Because there was simply only one meaning for PAnsiChar and PChar and both were equal.
Declaring pointer arithmetics a corner case is a ridiculous statement at best. Apart from that, how big are the projects you’re working on? I had to smirk, when I read over your sentence the first time … now I am left to frown.
Sure … which will break exactly because of the change in Tiburon.
Prophet
07|Aug|2008 23
God. I love reading posts where coders argue and stress over things that most people don’t care about, don’t know about and would not understand even if it was explained in detail.
I don’t claim to understand all that is said. In fact, I learned a thing or two from all this.
Lovely stuff.
Copyright © 2007 - JEDI Windows API - is proudly powered by WordPress