Advanced Strings In Cocoa
Posted 10/12/2008 - 21:53 by Cocoacast Pro
I've been asked many times how strings work on Mac and iPhone and why certain things work the way they do. It's not difficult, but there is a lot of it, so I decided to organize my thoughts into this little document. Perhaps it will even help someone. Please let me know if it does. [Vlad]
Introduction
In my 15 years as a software developer, I have not written a single program that did not have to parse, search, or modify strings. As mundane as this subject may sound, it is one of the cornerstones of software development.
Today, there are libraries that will do almost anything for us. For example, we could just borrow Google's open-source code to handle Atom feeds. So why do we need to know all this? Very often we don't, but when we do, the problem can be very complicated and we need to be prepared to handle it. Libraries are written against the standards, and we often have to deal with deviations from those standards. One of my clients had created an XML-RPC service but was putting non-standard dates in it. Another wanted to remove specific HTML formatting from its RSS feeds without significantly degrading the performance of the program. And what if you needed to format dates, extract paragraphs, or create summaries from arbitrary content? My goal is to help you prepare for such problems and to show you where to look for information.
The beauty of Cocoa is that it is totally interoperable with native C libraries. This is important for a couple of reasons:
- it gives us access to the Standard C library functions
- certain operations on strings are more efficient when done natively
The C library brings us additional functionality, such as regular expression handling. It is also useful when simple pointer arithmetic can improve the performance of a program, for example when scanning a string for a given character. C functions can operate on Unicode text (for example, on UTF-8 bytes), but Unicode-aware processing in C is by no means trivial. Cocoa brings its own set of classes, NSString and NSMutableString, to the table. These classes make Unicode support transparent to the programmer and provide simple conversions to and from various encodings. This is important, because content received from the internet, such as RSS feeds, may be supplied in a variety of encodings, and we have to be able to process all of that data.
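A thing to remember: native C strings are NUL-terminated (they end with a '\0' byte), whereas NSString objects keep track of their length explicitly and are not. Methods such as -UTF8String give you a NUL-terminated C string when you need one.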
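To make the difference concrete, here is a small C sketch (the names count_char, k_plain and k_embedded are made up for illustration). It shows that a C string's length is not stored anywhere but discovered by scanning for the '\0' terminator, which is also why an embedded NUL byte silently cuts a C string short.

```c
#include <stddef.h>
#include <string.h>

/* A C string is just bytes followed by a '\0' terminator.
   This literal holds 6 visible characters, so sizeof() reports 7 bytes
   while strlen() reports 6. */
static const char k_plain[] = "a,b,cd";

/* strlen() stops at the first '\0', so an embedded NUL byte truncates
   the string as far as the C library is concerned: strlen() reports 3
   here even though sizeof() reports 8. */
static const char k_embedded[] = "abc\0def";

/* Counts occurrences of `c` by walking the bytes with a pointer until the
   terminating '\0'. The length is never stored; it is discovered. */
size_t count_char(const char *s, char c)
{
    size_t count = 0;
    for (const char *p = s; *p != '\0'; p++) {
        if (*p == c) {
            count++;
        }
    }
    return count;
}
```

For example, count_char(k_embedded, 'd') returns 0, because the scan never gets past the embedded '\0'.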
NSString and NSMutableString
NSString is an immutable string wrapper. This means that once an NSString object has been instantiated, it cannot be modified.
NSMutableString, as the name suggests, is a mutable string wrapper, i.e. it can be modified at any time: we can remove characters, replace them, or add others.
In Cocoa, mutable classes are subclasses of their immutable counterparts: NSMutableString inherits from NSString. This means an NSMutableString can be passed anywhere an NSString is expected, and every mutable string inherits all of NSString's methods. At the same time, it leads to a lot of confusion, because the declared type of a variable does not guarantee that the object behind it is actually mutable. You need to constantly watch for these situations. Here is one such example:
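NSMutableString* str = [NSMutableString stringWithString:@"123456"];
NSLog(@"%@:%@", [str class], str);
[str appendString:@"789"];
// -copy always returns an IMMUTABLE string, even when the receiver is mutable.
// Declaring str1 as NSMutableString* does not change that; use -mutableCopy instead.
NSMutableString* str1 = [[str copy] autorelease];
NSLog(@"%@:%@", [str1 class], str1);
[str1 appendString:@"789"]; // raises NSInvalidArgumentException at run time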
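What is wrong with the above code? -copy returns an immutable copy of str, even though we assigned it to a variable declared as NSMutableString*, and we then try to append a string to it. The compiler does not complain, because -copy returns id, but the program fails at run time. If you need a modifiable copy, use [str mutableCopy] instead. Here is what is printed to the console. Note that both the mutable and the immutable instance report their class as NSCFString, so -class cannot tell you whether a string is mutable.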
2008-10-10 20:12:57.943 XMLTests[1634:813] NSCFString: 123456
2008-10-10 20:13:00.089 XMLTests[1634:813] NSCFString: 123456789
2008-10-10 20:13:00.635 XMLTests[1634:813] *** Terminating app due to uncaught exception 'NSInvalidArgumentException', reason: 'Attempt to mutate immutable object with appendString:'
For the complete reference of NSString and NSMutableString please refer to Apple’s documentation, particularly to the String Programming Guide for Cocoa.
Encodings
As in many other languages, such as Java or C#, strings in Cocoa are sequences of Unicode characters; NSString exposes them as 16-bit UTF-16 code units (unichar). NSString objects can be converted to and from various other kinds of strings:
– initWithBytesNoCopy:length:encoding:freeWhenDone:
– initWithCharacters:length:
– initWithCharactersNoCopy:length:freeWhenDone:
– initWithString:
– initWithCString:encoding:
– initWithUTF8String:
+ stringWithCharacters:length:
+ stringWithString:
+ stringWithCString:encoding:
+ stringWithUTF8String:
– getBytes:maxLength:usedLength:encoding:options:range:remainingRange:
– cStringUsingEncoding:
– getCString:maxLength:encoding:
– UTF8String
+ availableStringEncodings
+ defaultCStringEncoding
+ localizedNameOfStringEncoding:
– canBeConvertedToEncoding:
– dataUsingEncoding:
– dataUsingEncoding:allowLossyConversion:
– description
– fastestEncoding
– smallestEncoding
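C-strings are the ANSI C strings traditionally used in C programs. We will revisit them once we start looking at the Standard C Library. UTF-8 is just one of the many Unicode encodings. Its advantage is compatibility with C strings: ASCII text, such as plain English, is byte-for-byte identical in ASCII and UTF-8, and because UTF-8 never produces a zero byte except for the character U+0000, a UTF-8 string can be passed to C string functions. Keep in mind, however, that characters outside ASCII, including accented Latin letters, take two or more bytes, so functions such as strlen() count bytes, not characters.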
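A minimal C sketch of the bytes-versus-characters point, assuming the input is valid UTF-8 (utf8_codepoints is a hypothetical helper and performs no validation). Every UTF-8 continuation byte has the bit pattern 10xxxxxx, so counting the bytes that do not match that pattern counts code points.

```c
#include <stddef.h>
#include <string.h>

/* Counts Unicode code points in a NUL-terminated UTF-8 string by skipping
   continuation bytes (10xxxxxx). Assumes valid UTF-8; no validation. */
size_t utf8_codepoints(const char *s)
{
    size_t count = 0;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if ((*p & 0xC0) != 0x80) {   /* lead byte or plain ASCII byte */
            count++;
        }
    }
    return count;
}
```

For "café" encoded as UTF-8 ("caf\xC3\xA9"), strlen() reports 5 bytes while utf8_codepoints() reports 4. NSString's -length counts UTF-16 code units, which can differ from both for characters outside the Basic Multilingual Plane.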
Here are the predefined NSStringEncoding constants; +availableStringEncodings returns the full list of encodings the system supports:
enum {
NSASCIIStringEncoding = 1,
NSNEXTSTEPStringEncoding = 2,
NSJapaneseEUCStringEncoding = 3,
NSUTF8StringEncoding = 4,
NSISOLatin1StringEncoding = 5,
NSSymbolStringEncoding = 6,
NSNonLossyASCIIStringEncoding = 7,
NSShiftJISStringEncoding = 8,
NSISOLatin2StringEncoding = 9,
NSUnicodeStringEncoding = 10,
NSWindowsCP1251StringEncoding = 11,
NSWindowsCP1252StringEncoding = 12,
NSWindowsCP1253StringEncoding = 13,
NSWindowsCP1254StringEncoding = 14,
NSWindowsCP1250StringEncoding = 15,
NSISO2022JPStringEncoding = 21,
NSMacOSRomanStringEncoding = 30,
NSUTF16BigEndianStringEncoding = 0x90000100,
NSUTF16LittleEndianStringEncoding = 0x94000100,
NSUTF32StringEncoding = 0x8c000100,
NSUTF32BigEndianStringEncoding = 0x98000100,
NSUTF32LittleEndianStringEncoding = 0x9c000100,
};
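Examples:
- Creating an NSString from a NUL-terminated UTF-8 C string: +stringWithUTF8String: (or +stringWithCString:encoding: for other encodings).
- Converting a string to a NUL-terminated character array: -UTF8String or -cStringUsingEncoding:. The returned buffer is managed for you like an autoreleased object, so copy it if you need it to outlive the current autorelease pool.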
Operations on NSString-s
I will show a basic set of methods that most of us need on a daily basis, such as searching, comparing, and converting strings.
Example. Extracting image URLs from an HTML string.
// Category method on NSString; the method name is hypothetical (the original signature was not preserved).
- (NSArray *)imageURLs
{
    NSMutableArray *images = [NSMutableArray arrayWithCapacity:6];
    NSAutoreleasePool *pool = [NSAutoreleasePool new];
    @try
    {
        NSUInteger total = [self length];
        NSUInteger start = 0;
        // Every search covers the remainder of the string: [start, total)
        while (start < total)
        {
            // (1) case-insensitive search for the next <img tag
            NSRange range = [self rangeOfString:@"<img" options:NSCaseInsensitiveSearch range:NSMakeRange(start, total - start)];
            if (range.location == NSNotFound)
            {
                break;
            }
            start = NSMaxRange(range);
            range = [self rangeOfString:@"src=\"" options:NSCaseInsensitiveSearch range:NSMakeRange(start, total - start)];
            if (range.location == NSNotFound)
            {
                break;
            }
            start = NSMaxRange(range);
            range = [self rangeOfString:@"\"" options:0 range:NSMakeRange(start, total - start)];
            if (range.location == NSNotFound)
            {
                break;
            }
            // (2) extract the URL between the opening and the closing quote
            NSString *image = [self substringWithRange:NSMakeRange(start, range.location - start)];
            [images addObject:image];   // the array retains it before the pool is drained
            start = NSMaxRange(range);
        }
    }
    @catch (NSException *exception)
    {
        NSLog(@"error: Caught %@: %@", [exception name], [exception reason]);
    }
    @finally
    {
        [pool release];
    }
    return images;
}
In selection (1), we perform a case-insensitive search looking for a substring “<img” in the original string.
In selection (2), we are extracting a substring of a given length from the original string.
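For comparison, here is a plain C sketch of the same scan, in the spirit of the Standard C Library section later on. find_ci and extract_img_srcs are hypothetical names. The helper does an ASCII-only case-insensitive search, and, like the Objective-C version, the function looks for the next src=" after each <img without checking that it still belongs to the same tag, and only handles double-quoted values.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* ASCII-only case-insensitive search; returns the first match or NULL. */
static const char *find_ci(const char *haystack, const char *needle)
{
    size_t n = strlen(needle);
    for (const char *h = haystack; *h != '\0'; h++) {
        size_t i = 0;
        while (i < n && h[i] != '\0' &&
               tolower((unsigned char)h[i]) == tolower((unsigned char)needle[i])) {
            i++;
        }
        if (i == n) {
            return h;
        }
    }
    return NULL;
}

/* Stores up to `max` malloc'd copies of src="..." values that follow
   <img tags in `out` and returns how many were found. The caller frees them. */
size_t extract_img_srcs(const char *html, char **out, size_t max)
{
    size_t found = 0;
    const char *p = html;
    while (found < max && (p = find_ci(p, "<img")) != NULL) {
        p += 4;                                   /* skip "<img" */
        const char *src = find_ci(p, "src=\"");
        if (src == NULL) {
            break;
        }
        src += 5;                                 /* skip src=" */
        const char *close = strchr(src, '"');
        if (close == NULL) {
            break;
        }
        size_t len = (size_t)(close - src);
        char *copy = malloc(len + 1);
        if (copy == NULL) {
            break;
        }
        memcpy(copy, src, len);
        copy[len] = '\0';
        out[found++] = copy;
        p = close + 1;                            /* continue after the value */
    }
    return found;
}
```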
Question: What would we do if we just wanted to get a single character at a given location?
Answer:
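To get a single character, use -characterAtIndex:. To copy several characters into a buffer, use the -getCharacters: methods:
- (unichar)characterAtIndex:(NSUInteger)index
- (void)getCharacters:(unichar *)buffer
- (void)getCharacters:(unichar *)buffer range:(NSRange)aRange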
Other things we can do with NSString
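- Replace percent escapes (such as %20 in URLs found in XML/HTML content) with the characters they stand for, using -stringByReplacingPercentEscapesUsingEncoding: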
- Compare two strings for equality - selection (1) below. Also, we could use this function:
- Trim whitespace from both ends of the string - selection (2) below.
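// Hypothetical helper name; the original function signature was not preserved.
static BOOL IsEmptyOrBlank(NSString *s)
{
    if (s == nil)
        return (YES);
    // (1) equality test against the empty string
    if ([s isEqualToString:@""])
        return (YES);
    // (2) whitespaceCharacterSet covers spaces and tabs only;
    // use whitespaceAndNewlineCharacterSet to strip line breaks as well
    if ([[s stringByTrimmingCharactersInSet:[NSCharacterSet whitespaceCharacterSet]] isEqualToString:@""])
        return (YES);
    return (NO);
}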
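The same nil/empty/blank test can be sketched in C for single-byte text (is_blank is a hypothetical name). One difference worth knowing: isspace() also treats newlines, form feeds and vertical tabs as whitespace, whereas +whitespaceCharacterSet covers only spaces and tabs, so this sketch is closer to trimming with +whitespaceAndNewlineCharacterSet.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true when `s` is NULL, empty, or contains only bytes that
   isspace() classifies as whitespace in the current C locale. */
bool is_blank(const char *s)
{
    if (s == NULL) {
        return true;
    }
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (!isspace(*p)) {
            return false;
        }
    }
    return true;
}
```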
- String formatting (+stringWithFormat:, -stringByAppendingFormat:)
- Upper/Lower case conversions:
– capitalizedString
– lowercaseString
– uppercaseString
- Available number conversions
Example:
NSNumber *number = [NSNumber numberWithDouble:[str doubleValue]];
– doubleValue
– floatValue
– intValue
– integerValue
– longLongValue
– boolValue
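-doubleValue, -intValue and the other numeric conversions return 0 when the string does not start with a number, so "0" and "abc" look the same. When that distinction matters, the Standard C Library's strtod() reports where parsing stopped. A minimal sketch (parse_double is a hypothetical helper) that accepts only strings consisting entirely of a number:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Parses `s` as a double and succeeds only if the whole string was
   consumed and the value is in range; on success stores it in *out. */
bool parse_double(const char *s, double *out)
{
    char *end = NULL;
    errno = 0;
    double value = strtod(s, &end);
    if (end == s || *end != '\0' || errno == ERANGE) {
        return false;   /* no digits, trailing junk, or out of range */
    }
    *out = value;
    return true;
}
```

Note that strtod() skips leading whitespace and uses the current C locale's decimal point ('.' in the default "C" locale).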
- Date conversions
- Not available on iPhone
- Available on iPhone
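NSDateFormatter *formatter = [[[NSDateFormatter alloc] init] autorelease];
[formatter setDateStyle:NSDateFormatterShortStyle];
[formatter setTimeStyle:NSDateFormatterShortStyle];
// statusLabel is assumed to be a UILabel outlet of the current view controller
statusLabel.text = [NSString stringWithFormat:@"Updated %@", [formatter stringFromDate:[NSDate date]]];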
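#import <Foundation/Foundation.h>

// Fixed-format parsing should not depend on the user's region settings
// (localized month and day names, 12/24-hour clock), so pin every formatter to en_US_POSIX.
static NSDateFormatter *NewPOSIXDateFormatter(void)
{
    NSDateFormatter* dateFormatter = [NSDateFormatter new];
    NSLocale* locale = [[NSLocale alloc] initWithLocaleIdentifier:@"en_US_POSIX"];
    [dateFormatter setLocale:locale];
    [locale release];
    return dateFormatter;   // the caller owns the formatter and must release it
}

@implementation NSString (NSDate)
- (NSDate *)rssDateValue
{
    // RFC 822 style dates, e.g. "Sat, 07 Sep 2002 00:00:01 GMT"
    NSDateFormatter* dateFormatter = NewPOSIXDateFormatter();
    [dateFormatter setDateFormat:@"EEE', 'dd' 'MMM' 'yyyy' 'HH':'mm':'ss' 'zzz"];
    NSDate* date = [dateFormatter dateFromString:self];
    [dateFormatter release];
    return date;
}
- (NSDate *)atomDateValue
{
    NSDateFormatter* dateFormatter = NewPOSIXDateFormatter();
    [dateFormatter setDateFormat:@"yyyy-MM-dd HH:mm:ss"];
    NSDate* date = [dateFormatter dateFromString:self];
    if (date == nil)
    {
        // The literal 'Z' means UTC, so read the fields as UTC rather than local time
        [dateFormatter setTimeZone:[NSTimeZone timeZoneForSecondsFromGMT:0]];
        [dateFormatter setDateFormat:@"yyyy-MM-dd'T'HH:mm:ss'Z'"];
        date = [dateFormatter dateFromString:self];
        if (date == nil)
        {
            // No zone designator: interpret as local time
            [dateFormatter setTimeZone:[NSTimeZone defaultTimeZone]];
            [dateFormatter setDateFormat:@"yyyy-MM-dd'T'HH:mm:ss"];
            date = [dateFormatter dateFromString:self];
        }
    }
    [dateFormatter release];
    return date;
}
- (NSDate *)xmlrpcDateValue
{
    NSDateFormatter* dateFormatter = NewPOSIXDateFormatter();
    // Non-standard variant with a space separator
    [dateFormatter setDateFormat:@"yyyyMMdd HH:mm:ss"];
    NSDate* date = [dateFormatter dateFromString:self];
    if (date == nil)
    {
        // The literal 'Z' means UTC
        [dateFormatter setTimeZone:[NSTimeZone timeZoneForSecondsFromGMT:0]];
        [dateFormatter setDateFormat:@"yyyyMMdd'T'HH:mm:ss'Z'"];
        date = [dateFormatter dateFromString:self];
    }
    if (date == nil)
    {
        // Standard XML-RPC dateTime.iso8601, e.g. "19980717T14:08:55" (no zone designator)
        [dateFormatter setTimeZone:[NSTimeZone defaultTimeZone]];
        [dateFormatter setDateFormat:@"yyyyMMdd'T'HH:mm:ss"];
        date = [dateFormatter dateFromString:self];
    }
    [dateFormatter release];
    return date;
}
@end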
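The same kind of fixed-format parsing can also be done with the Standard C Library. The C sketch below (parse_xmlrpc_date is a hypothetical name) uses sscanf() to accept the XML-RPC dateTime.iso8601 form, e.g. 19980717T14:08:55, as well as the space-separated variant handled above, with an optional trailing 'Z'. It only fills in a struct tm; it does not validate ranges or apply a time zone.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Parses "yyyyMMddTHH:mm:ss" or "yyyyMMdd HH:mm:ss", optionally followed
   by 'Z', into broken-down fields. No range checks, no time zone math. */
bool parse_xmlrpc_date(const char *s, struct tm *out)
{
    int year, month, day, hour, minute, second;
    char sep;
    int consumed = 0;
    if (sscanf(s, "%4d%2d%2d%c%2d:%2d:%2d%n", &year, &month, &day, &sep,
               &hour, &minute, &second, &consumed) != 7) {
        return false;
    }
    if (sep != 'T' && sep != ' ') {
        return false;
    }
    const char *rest = s + consumed;   /* only "" or "Z" may remain */
    if (!(rest[0] == '\0' || (rest[0] == 'Z' && rest[1] == '\0'))) {
        return false;
    }
    memset(out, 0, sizeof *out);
    out->tm_year = year - 1900;   /* struct tm counts years from 1900 */
    out->tm_mon = month - 1;      /* and months from 0 */
    out->tm_mday = day;
    out->tm_hour = hour;
    out->tm_min = minute;
    out->tm_sec = second;
    out->tm_isdst = -1;           /* unknown; let mktime() decide if it is used */
    return true;
}
```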
Operations on NSMutableString-s
Mutable strings can be modified in place. For example, we can:
- Replace strings
[mutableStr replaceOccurrencesOfString:@" " withString:@"+" options:0 range:NSMakeRange(0, [mutableStr length])];
[result replaceCharactersInRange:replaceRange withString:replacement];
- Delete characters, for example with -deleteCharactersInRange:
- Append strings (-appendString:, -appendFormat:)
[encoded appendFormat:@"%c", currChar];
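The snippets above appear to come from a URL encoder (spaces replaced with +, characters appended one at a time). Assuming that, here is a hypothetical C sketch of such an encoder (form_urlencode is a made-up name) that writes into one preallocated buffer instead of appending repeatedly. It assumes UTF-8 input, copies the RFC 3986 unreserved characters (letters, digits, -, _, . and ~), turns spaces into + as HTML forms do, and percent-encodes every other byte. The exact set of safe characters varies between specifications.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd, percent-encoded copy of `s` (caller frees it),
   or NULL if memory allocation fails. */
char *form_urlencode(const char *s)
{
    size_t len = strlen(s);
    char *out = malloc(len * 3 + 1);   /* worst case: every byte becomes %XX */
    if (out == NULL) {
        return NULL;
    }
    char *w = out;
    for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
        if (isalnum(*p) || *p == '-' || *p == '_' || *p == '.' || *p == '~') {
            *w++ = (char)*p;                 /* unreserved: copy as-is */
        } else if (*p == ' ') {
            *w++ = '+';                      /* form convention for spaces */
        } else {
            w += sprintf(w, "%%%02X", *p);   /* everything else: %XX */
        }
    }
    *w = '\0';
    return out;
}
```

For example, "a b&c=d" becomes "a+b%26c%3Dd".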
Standard C Library
Interoperability between C and Objective-C gives Mac and iPhone programmers access to the Standard C Library:
- <string.h> functions, such as strstr, strtok, and more.
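A short sketch of the two functions named above (split_commas and count_substr are hypothetical helpers). Note that strtok() writes '\0' bytes into the buffer it is given and keeps hidden state between calls, so it is not reentrant; POSIX strtok_r() is the thread-safe variant.

```c
#include <stddef.h>
#include <string.h>

/* Splits `buf` in place on commas with strtok() and stores up to `max`
   field pointers in `fields`. Runs of delimiters count as one, so empty
   fields are skipped. */
size_t split_commas(char *buf, char **fields, size_t max)
{
    size_t n = 0;
    for (char *tok = strtok(buf, ","); tok != NULL && n < max;
         tok = strtok(NULL, ",")) {
        fields[n++] = tok;
    }
    return n;
}

/* Counts non-overlapping occurrences of a non-empty `needle` with strstr(),
   resuming the search just past each match. */
size_t count_substr(const char *haystack, const char *needle)
{
    size_t n = 0;
    size_t step = strlen(needle);
    if (step == 0) {
        return 0;
    }
    for (const char *p = strstr(haystack, needle); p != NULL;
         p = strstr(p + step, needle)) {
        n++;
    }
    return n;
}
```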
- <glob.h> functions. Simple, shell-style wildcard pattern matching on file names:
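#include <glob.h>
#include <unistd.h>

glob_t globbuf;
globbuf.gl_offs = 2;   // reserve two leading slots for "ls" and "-l"
glob("*.c", GLOB_DOOFFS, NULL, &globbuf);
glob("../*.c", GLOB_DOOFFS | GLOB_APPEND, NULL, &globbuf);
globbuf.gl_pathv[0] = "ls";
globbuf.gl_pathv[1] = "-l";
execvp("ls", &globbuf.gl_pathv[0]);   // replaces the current process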
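The above code (the example from the glob(3) man page) runs ls -l on every *.c file in the current and parent directories. Note that execvp() replaces the calling process, so treat it as a demonstration of glob(), not as something to call from inside an application.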
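When you only need to test one string against a shell-style pattern, without touching the file system, POSIX fnmatch() from <fnmatch.h> is the string-level counterpart of glob(). A minimal sketch (matches_glob is a hypothetical wrapper):

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Tests `name` against a shell-style wildcard pattern. FNM_PERIOD makes
   a leading '.' match only an explicit '.', as the shell does for
   hidden files. */
bool matches_glob(const char *pattern, const char *name)
{
    return fnmatch(pattern, name, FNM_PERIOD) == 0;
}
```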
- <regex.h> functions. At the time of writing, Cocoa has no regular expression classes of its own, but the Standard C Library regex facility is efficient and convenient for more sophisticated string searching and replacement. A number of popular wrappers are available, such as CSRegex, but even such wrappers may be inefficient if a lot of parsing is required. Here is an example of using the <regex.h> functions. In it, pattern is the regular expression (@"[0-9]+") and string is the text to search (@"123456789").
#include <regex.h>

NSString* pattern = @"[0-9]+";
NSString* string = @"123456789";
NSString* result = nil;
regex_t preg;
int err = regcomp(&preg, [pattern UTF8String], REG_EXTENDED);
if (err)
{
    char errbuf[256];
    regerror(err, &preg, errbuf, sizeof(errbuf));
    [NSException raise:@"CSRegexException"
                format:@"Could not compile regex \"%@\": %s", pattern, errbuf];
}
const char *cstr = [string UTF8String];
regmatch_t match;
// rm_so/rm_eo are byte offsets into the UTF-8 buffer, so build the result from bytes
if (regexec(&preg, cstr, 1, &match, 0) == 0)
{
    result = [[[NSString alloc] initWithBytes:cstr + match.rm_so
                                       length:match.rm_eo - match.rm_so
                                     encoding:NSUTF8StringEncoding] autorelease];
}
regfree(&preg);   // release the compiled pattern
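Here is the same technique as a self-contained C function, without the Objective-C wrappers (first_match is a hypothetical name). The compiled pattern is released with regfree() on every path after a successful regcomp(), and rm_so/rm_eo are byte offsets into the searched text.

```c
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc'd copy of the first match of the POSIX extended
   regular expression `pattern` in `text`, or NULL if the pattern does
   not compile or nothing matches. The caller frees the result. */
char *first_match(const char *pattern, const char *text)
{
    regex_t re;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0) {
        return NULL;
    }
    regmatch_t m;
    char *result = NULL;
    if (regexec(&re, text, 1, &m, 0) == 0) {
        size_t len = (size_t)(m.rm_eo - m.rm_so);
        result = malloc(len + 1);
        if (result != NULL) {
            memcpy(result, text + m.rm_so, len);
            result[len] = '\0';
        }
    }
    regfree(&re);   /* always release the compiled pattern */
    return result;
}
```

For example, first_match("[0-9]+", "abc 123456789 def") returns "123456789".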
- Regular C pointer arithmetic. In situations where it is necessary to iterate through a string character by character, calling NSString methods for every character may prove to be very expensive, especially in a constrained environment such as the iPhone.
Example. Stripping images and iframes from HTML content.
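#include <stdlib.h>
#include <string.h>

static const char *g_discardMediaTags[] = {
    "<img", "<iframe"
};
static const int g_discardMediaTagLengths[] = {
    4, 7
};
#define TOTAL_DISCARD_MEDIA_TAGS 2

// The category name is hypothetical; the original @implementation line was not preserved.
@implementation NSString (StripMedia)

- (NSString*)stripMedia
{
    NSString *result = nil;
    const char *charString = [self UTF8String];
    size_t len = strlen(charString);
    // Heap buffer instead of a variable-length stack array: HTML can be large,
    // and the output is never longer than the input.
    char *buffer = malloc(len + 1);
    if (buffer == NULL)
        return nil;
    buffer[0] = '\0';
    char *pBuffer = buffer;
    const char *start = charString;
    const char *end = start + len;
    BOOL done = NO;
    while (done == NO)
    {
        int i = 0;
        const char *foundTag = NULL;
        int foundTagLength = 0;
        const char *minLoc = end;
        // Find the earliest occurrence of any tag we want to discard
        for (i = 0; i < TOTAL_DISCARD_MEDIA_TAGS; i++)
        {
            const char *loc = strcasestr(start, g_discardMediaTags[i]);
            if (loc != NULL && loc < minLoc)
            {
                minLoc = loc;
                foundTag = g_discardMediaTags[i];
                foundTagLength = g_discardMediaTagLengths[i];
            }
        }
        if (foundTag == NULL)
        {
            // No more tags: copy the rest of the input
            size_t left = end - start;
            memcpy(pBuffer, start, left);
            pBuffer += left;
            *pBuffer = '\0';
            done = YES;
        }
        else
        {
            // Copy the text that precedes the tag
            if (minLoc != start)
            {
                size_t left = minLoc - start;
                memcpy(pBuffer, start, left);
                pBuffer += left;
                *pBuffer = '\0';
                start = minLoc;
            }
            // Skip everything up to and including the closing '>'.
            // Start right after the tag name so that "<iframe>" is handled too.
            const char *endTag = NULL;
            for (endTag = start + foundTagLength; endTag < end; endTag++)
            {
                if (*endTag == '>')
                {
                    start = endTag + 1;
                    break;
                }
            }
            if (endTag >= end)
            {
                // Unterminated tag: drop the remainder
                done = YES;
            }
        }
    }
    result = [NSString stringWithUTF8String:buffer];
    free(buffer);
    return result;
}

@end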
If you ever need to find information on functions that come with the C Library, your best resource, besides Google of course, is the good old man command (just type man regex, man strlen, ...).
Conclusions
The goal of this article was not to show you every function or class available, but rather to help you understand what is available, why, and where to find information when you need it. In addition to the superb documentation that accompanies Xcode, there are tons of articles and examples available on the web, and a C library reference that can be found using the man command. And if you have more questions, please write to cocoacast@gmail.com.
Vladimir Pasman
Copyright 2008, Mesh Systems Inc.
donated to cocoacast.com
Friday, October 10, 2008