SSstring.txt - 8/1/87

SSstring 5.0 - A Subset and Superset of the STL string class

Table of Contents
=====================================================
 1.0 - Introduction
 2.0 - The STL (Standard Template Library) String Class
 3.0 - The sstring Class
 4.0 - The Derived Class - kstring
 5.0 - Files and Stuff
 6.0 - General Information on sstring
 7.0 - Public Members of sstring
 8.0 - Public Member Functions of kstring
 9.0 - Protected Members of sstring
10.0 - Memory Allocation Policy

1.0 - Introduction
==================
Computer programming requires the manipulation of string variables.
C and C++ programmers have traditionally used char * or char arrays
with the assistance of string.h to manipulate string expressions.
This method is cumbersome and error prone. Moreover, the functionality
of string.h can be described as adequate at best.

C++ can largely resolve the problem with a "string" class. A variety of
string classes have been developed by individuals and companies. The
problem has been the lack of standardization. Enter the ISO/ANSI C++
standard STL string class. Problems: availability, complexity, and
shortcomings. Solution: SSstring.

SSstring is a subset and superset of the STL string class. SSstring
implements two classes:

  sstring - A 90% subset of the standard C++ STL string class as defined
            in the December 1996 draft of the ISO/ANSI C++ standard.
            sstring does not use templates. sstring supports only
            strings containing characters of type "char".

  kstring - A derived class of sstring (or the standard string class)
            that adds features such as conversions between numeric and
            string types, token extraction and string formatting.

The sstring class is intended for users who do not have access to the
standard string class (Turbo C++ users) or for users who want a
non-template string class with source code (works with precompiled
headers of Borland C++). sstring provides most of the conveniences of
the STL and removes some of the problems and complexity.
kstring is a superset of the sstring or the string classes. It is the
recommended string class to use. kstring has the capability of the
standard string class plus some very useful bells and whistles.

2.0 - The STL (Standard Template Library) String Class
======================================================
The Standard Template Library (STL) provides a powerful method of
dealing with strings. C-style strings are powerful, but they require
control of many low-level items such as allocation sizes and pointer
offsets. The STL string class is designed to offer more power than
C-strings without the low-level concerns. Essentially the string class
makes the programmer's job easier. The syntax governing this class is
simple and easy to use. Furthermore, the syntax is consistent with the
STL function calls.

There are numerous resources on the Web describing the Standard
Template Library - a great starting point is
http://www.sgi.com/Technology/STL/

The December 1996 draft of the ISO/ANSI C++ standard can be found at
http://www.cygnus.com/misc/wp/

3.0 - The sstring Class
=======================
The sstring class is a 90% subset of the standard C++ STL string class
as defined in the December 1996 draft of the ISO/ANSI C++ standard.

The sstring class provides a mechanism for manipulating strings. The
manipulations are intuitive and easy to understand. For example, the
source code to compare strings can be written as if comparing integers,
because familiar comparison operators (such as ==, != ) are provided.
An explicit function call is not needed to perform the comparison.

Below is some intuitively obvious sstring code:

    sstring s1("Hello");
    sstring s2("World");
    sstring s3;
    s3 = s1 + " " + s2;
    cout << s3 + "!" << endl;

This class allocates all memory and provides an explicit variable to
track the sstring length. Thus, sstring length can be looked up in
constant time, which is efficient for many sstring computations.
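The comparison and concatenation operators described above can be tried
out with a minimal sketch. It assumes a compiler that ships the standard
string class and uses std::string in place of sstring; since sstring is
documented as a subset of that class, the same statements are expected
to carry over, but this has not been checked against ssstring.h. The
helper names join_words and same_text are hypothetical and exist only
for illustration.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: builds "first second!" with operator+ only,
// mirroring the s1 + " " + s2 example in the text.
std::string join_words(const std::string& first, const std::string& second)
{
    std::string joined;
    joined = first + " " + second;
    return joined + "!";
}

// Hypothetical helper: compares a string object with a C-style string
// through operator==, with no explicit call such as strcmp.
bool same_text(const std::string& lhs, const char* rhs)
{
    return lhs == rhs;
}
```

Calling join_words("Hello", "World") gives "Hello World!", and
same_text("Hello", "Hello") is true.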
This class handles memory allocations and knows the allocated size.
Memory is allocated in blocks (default size is 16 bytes). For example,
a sstring of length 12 is allocated 16 bytes of memory; a sstring of
length 20 is allocated 32 bytes. Operations are checked against the
current size, and the allocated size is decreased or increased when
necessary.

sstring supports strings of characters of type "char" only. It is a
non-template subset of the STL string class.

The sstring class automatically provides the terminating NULL character.

4.0 - The Derived Class - kstring
=================================
The STL string class and the sstring class are great, but they lack
some fundamental features. kstring is a derived class of sstring (or
the standard string class) that adds features such as conversions
between numeric and string types, token extraction and string
formatting.

For example the statement:

    ok = line.token(1).cvt_double(inches);

converts token 1 of line (the second token, since tokens are numbered
from 0) to a double and places the answer in inches. The bool ok is set
to true if the conversion worked.

kstring is a superset of the sstring or the string classes. It is the
recommended string class to use. kstring has the capability of the
standard string class plus some very useful bells and whistles.

5.0 - Files and Stuff
=====================
The files implementing the sstring and kstring classes are named
ssstring.h and ssstring.cpp.

Sample code demonstrating sstring and kstring is included in:

    ss_dem1.cpp - a simple SSstring demo
    ss_dem2.cpp - a comprehensive demonstration of SSstring
    ss_demw.cpp - a Borland C++ OWL windows demo of SSstring

SSstring is FREEWARE.

SSstring was developed by:

    Tom Kisko  - Associate in Industrial and Systems Engineering
    Wesley Day - Graduate student in Industrial and Systems Engineering

Contact kisko@ise.ufl.edu to report bugs or make suggestions.

THIS WORK IS PROVIDED ON AN "AS IS" BASIS.
THE AUTHOR PROVIDES NO WARRANTY WHATSOEVER, EITHER EXPRESS OR IMPLIED,
REGARDING THE WORK, INCLUDING WARRANTIES WITH RESPECT TO ITS
MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE.

6.0 - General Information on sstring
====================================
sstring objects consist of 0 to npos characters. npos defaults to 5000;
change via sstring.h.

npos is used in current literature to represent "no position specified".

Elements of sstring are numbered starting at 0, e.g., s[0] is the first
element of the sstring, s.

In any sample code below, presume identifiers that start with:

    s        are of type sstring
    k        are of type kstring
    cp       are of type char *
    ch       are of type char
    n, i, j  are of type unsigned

    typedef unsigned size_t;   // used also in current literature

The following global definitions are in sstring.h:

    typedef char bool;
    enum {false, true};

7.0 - Public Members of sstring
===============================

///////////////////
// Constructor functions
///////////////////

sstring( )
    default constructor; creates a zero length string.
    e.g., sstring s1;               // s1 contains ""

sstring (const char * cp)
    construct a sstring from cp.
    e.g., sstring s2("ABCDEF");     // s2 contains "ABCDEF"

sstring(const char * cp, size_t n)
    construct a sstring from cp taking a maximum of n characters or
    the length of the C-style string, whichever comes first.
    e.g., sstring s3("12345678",6); // s3 contains "123456"
          // given cp points to "abc123";
          sstring s4(cp,8);         // s4 contains "abc123"

sstring(const sstring & s)
    constructor with a sstring as an initial value.
    referred to as the "copy constructor".
    e.g., sstring s5(s3);           // s5 contains "123456"

sstring(const sstring & s,
        size_t pos,                 // optional starting character
        size_t n_chars = npos)      // optional number of characters
    construct a sstring from s starting at location pos (first
    location = 0) and continuing for the length of the sstring or for
    n_chars characters, whichever comes first.
    e.g., sstring s6(s5,0,3);       // s6 contains "123"

sstring(char ch, size_t n = 1);
    construct a sstring consisting of n copies of the character ch.
    e.g., sstring s7('*',5);        // s7 contains "*****"
          sstring s8('b');          // s8 contains "b"

///////////////////
// Destructor
///////////////////

~sstring(void)
    the destructor.

///////////////////
// = operators
///////////////////

sstring& operator = (const sstring & rhs)
    copy a sstring.
    e.g., s1 = s2;                  // s1 contains "ABCDEF"

sstring& operator = (const char * rhs);
    set a sstring equal to a C-style character string pointed to by rhs.
    e.g., s1 = cp;                  // s1 contains "abc123"

sstring& operator = (const char rhs)
    set a sstring equal to a character.
    e.g., s1 = 'c';                 // s1 contains "c"

///////////////////
// Comparison operators - case sensitive
///////////////////

friend bool operator == (const sstring & lhs, const sstring & rhs);
friend bool operator == (const sstring & lhs, const char * rhs);
friend bool operator == (const char * lhs, const sstring & rhs);
friend bool operator != (const sstring & lhs, const sstring & rhs);
friend bool operator != (const sstring & lhs, const char * rhs);
friend bool operator != (const char * lhs, const sstring & rhs);
friend bool operator >  (const sstring & lhs, const sstring & rhs);
friend bool operator >  (const sstring & lhs, const char * rhs);
friend bool operator >  (const char * lhs, const sstring & rhs);
friend bool operator <  (const sstring & lhs, const sstring & rhs);
friend bool operator <  (const sstring & lhs, const char * rhs);
friend bool operator <  (const char * lhs, const sstring & rhs);
friend bool operator <= (const sstring & lhs, const sstring & rhs);
friend bool operator <= (const sstring & lhs, const char * rhs);
friend bool operator <= (const char * lhs, const sstring & rhs);
friend bool operator >= (const sstring & lhs, const sstring & rhs);
friend bool operator >= (const sstring & lhs, const char * rhs);
friend bool operator >= (const char * lhs, const sstring & rhs);
    e.g., s2 = "ABC"; s3 = "BC"; s5 = "123456"; s6 = "ZZZ";
          cp = "XYZ"; s7 = "abc";
          // if (s5 != s6)    results in true
          // if (s2 == "ABC") results in true
          // if (s2 < s3)     results in true  - A is less than B
          // if (s2 > s3)     results in false
          // if (cp < s6)     results in true  - X is less than Z
          // if (s2 > s7)     results in false - A is less than a

///////////////////
// Concatenating strings
///////////////////

sstring& operator += (const sstring & rhs)
    concatenate rhs to a sstring. Memory is allocated and deleted
    as required.
    e.g., s1 = "c"; s2 = "ABCDEFG";
          s1 += s2;                 // s1 contains "cABCDEFG"

sstring& operator += (const char * rhs)
    concatenate the C-string to a sstring. See note on memory above.
    e.g., s1 = "cABCDEFG"; char cp[] = "abc123";
          s1 += cp;                 // s1 contains "cABCDEFGabc123"

sstring& operator += (const char rhs)
    concatenate the character to a sstring. See note on memory above.
    e.g., s1 = "cABCDEFGabc123";
          s1 += 'x';                // s1 contains "cABCDEFGabc123x"

friend sstring operator + (const sstring &lhs, const sstring &rhs)
friend sstring operator + (const sstring &lhs, const char * rhs)
friend sstring operator + (const char * lhs, const sstring &rhs)
    concatenate lhs and rhs into a new sstring. lhs and rhs retain
    their original data.
    e.g., s2 = "ABC"; s4 = "123"; c = 'W';
          // cp points to "hey you"
          s2 + s4;                  // returns "ABC123"
          s4 + s2;                  // returns "123ABC"
          s2 + c;                   // returns "ABCW"
          c + s2;                   // returns "WABC"
          cp + s2;                  // returns "hey youABC"
          s2 + cp;                  // returns "ABChey you"

///////////////////
// Inserting characters into a sstring
///////////////////

sstring& insert(size_t startInsertAt, const sstring& s,
                size_t startIndex = 0, size_t numChars = npos);
    at position startInsertAt in the implicit sstring, insert s
    beginning at startIndex for numChars characters.

sstring& insert(size_t startInsertAt, const char* cp)
    at position startInsertAt in the implicit sstring, insert cp.

sstring& insert(size_t startInsertAt, const char* cp, size_t numChars)
    at position startInsertAt in the implicit sstring, insert cp for
    numChars characters.
sstring& insert(size_t startInsertAt, char ch, size_t numChars = 1)
    at position startInsertAt in the implicit sstring, insert the
    character ch numChars times.
    e.g., s2 is "ABCDEF"; s5 is "123456";
          // cp points to "boy wonder"
          s2.insert(1,s5,1,3);  // returns s2 = "A234BCDEF"
          // s2 is now "A234BCDEF"
          s2.insert(4,cp,3);    // returns s2 = "A234boyBCDEF"
          // s2 is now "A234boyBCDEF"
          s2.insert(2,cp);      // returns s2 = "A2boy wonder34boyBCDEF"
          // s2 is now "A2boy wonder34boyBCDEF"
          s2.insert(1,'V',3);   // returns s2 = "AVVV2boy wonder34boyBCDEF"

///////////////////
// Removing characters within a sstring
///////////////////

sstring& remove(size_t startIndex = 0, size_t numChars = npos)
    removes characters and collapses the sstring, starting at
    startIndex and continuing for numChars or till the end of the
    sstring. The January 96 working papers rename this erase.
    e.g., s1 = "ABCDEF"; s4 = "xyz 123 abc";
          s1.remove(2,3);       // returns "ABF"
          s4.remove(3,2);       // returns "xyz23 abc"
          s1.remove(0,1);       // returns "BF"
          s4.remove(4,3);       // returns "xyz2bc"
          // What if you ask to remove more characters than are in the
          // sstring? The function adjusts for the error.
          s4.remove(3,300);     // returns "xyz"
          // What if you specify a startIndex greater than the length of
          // the sstring? The function adjusts the startIndex back to
          // the beginning element.
          s1.remove(30,1);      // returns "F"

///////////////////
// Replacing characters within a sstring
///////////////////

sstring& replace(size_t removeFrom, size_t removeCount,
                 const sstring& s, size_t startReplacePosition = 0,
                 size_t replaceCount = npos)
    replace invokes the following functions.
        remove(removeFrom, removeCount);
        insert(removeFrom, s, startReplacePosition, replaceCount);

sstring& replace(size_t removeFrom, size_t removeCount, const char* cp)
    replace invokes the following functions.
        remove(removeFrom, removeCount);
        insert(removeFrom, cp);

sstring& replace(size_t removeFrom, size_t removeCount,
                 const char* cp, size_t replaceCount)
    replace invokes the following functions.
        remove(removeFrom, removeCount);
        insert(removeFrom, cp, replaceCount);

sstring& replace(size_t removeFrom, size_t removeCount,
                 char ch, size_t numChars = 1)
    replace invokes the following functions.
        remove(removeFrom, removeCount);
        insert(removeFrom, ch, numChars);
    e.g., s5 is "12345678"; s4 is "xyz";
          // cp points to "boy wonder"
          s5.replace(1,5,s4,0,3);   // returns s5 = "1xyz78"
          // s5 is now "1xyz78"
          s5.replace(2,2,s4,5);     // returns s5 = "1xxyz78"
          // s5 is now "1xxyz78"
          s5.replace(0,3,cp);       // returns "boy wonderyz78"

///////////////////
// Addressing individual characters within a sstring
///////////////////

char sstring::operator[ ](size_t pos) const
    return the pos'th character; s[0] is the first character.
    e.g., s0 = "WABCXYZW";
          s0[0] is 'W'
          s0[1] is 'A'
          s0[2] is 'B'
          s0[4] is 'X'
          s0[7] is 'W'
          s0[8] is '\0'             // NULL terminator!

char & sstring::operator[ ](size_t pos);
    return a reference to the pos'th character.
    e.g., s0 = "WABCXYZW"
          s0[0] = 'x';              // changes s0 to "xABCXYZW"
          s0[1] = 'y';              // changes s0 to "xyBCXYZW"
          s0[2] = 'z';              // changes s0 to "xyzCXYZW"

///////////////////
// Getting a substring from within a sstring
///////////////////

sstring substr(size_t startIndex, size_t numChars) const;
    return a substring starting at startIndex and continuing for
    numChars characters or till the end of the sstring.
    e.g., s2 = "ABC123DEF456";
          s2.substr(2,3);           // returns "C12"
          s2.substr(0,3);           // returns "ABC"
          s2.substr(3,2);           // returns "12"
          // What if the startIndex specified is greater than the
          // sstring length? The function returns an empty sstring.
          s2.substr(300,3);         // returns ""
          // What if numChars specified is greater than the remaining
          // length of the sstring? The function adjusts and returns
          // all characters up to the NULL terminator.
          s2.substr(3,20);          // returns "123DEF456"

///////////////////
// Searching within a sstring
///////////////////

size_t find(const sstring & s, size_t startIndex = 0)
    find the first location of s in a sstring starting at position
    startIndex. The location is relative to the beginning of the
    parent sstring.

size_t find(const char * cp, size_t startIndex = 0)
    find the first location of cp in a sstring starting at position
    startIndex. The location is relative to the beginning of the
    parent sstring.

size_t find(char ch, size_t startIndex = 0)
    find the first location of ch in a sstring starting at position
    startIndex. The location is relative to the beginning of the
    parent sstring.
    e.g., sstring s3 = "mississippi"; s6 = "ss";
          s3.find("ss",0);          // returns 2
          s3.find("ss",3);          // returns 5
          s3.find(s6,3);            // returns 5
          s3.find('i',5);           // returns 7

///////////////////
// Length of the sstring
///////////////////

size_t length(void)
    returns the length of the sstring - does not include the NULL
    terminator.
    e.g., s1 = "ABCDEF"; s4 = "xyz 123 abc";
          s1.length();              // returns 6
          s4.length();              // returns 11

///////////////////
// Copying to a C-style string
///////////////////

size_t copy(char * cp, size_t numChars, size_t startIndex) const;
    the implicit sstring starting at startIndex is copied into cp for
    numChars characters or until \0 is reached;
    returns the number of characters copied.
    e.g., s3 = "ABCDEFG"; cp points to "xyz"
          s3.copy(cp,2,3);  // cp = "DEz", s3 remains "ABCDEFG"
          s3.copy(cp,2,1);  // cp = "BCz", s3 remains "ABCDEFG"

///////////////
// A char pointer to the contents of the sstring
///////////////

const char * c_str(void) const;
    returns a constant pointer to a C-style character array.
    e.g., f.open( s1.c_str() );
          // c_str() is a pointer to the contents of s1
          // if s3.str_ptr points to "BC",
          // s3.c_str() returns a pointer to "BC"

///////////////////
// I/O operations
///////////////////

friend ostream& operator << (ostream &st, const sstring &s);
    output a sstring

friend istream& operator >> (istream &st, sstring &s2);
    read a token from an istream

istream& getline(istream& st, sstring& s, char ch = '\n');
    read a line up to the delimiter ch; the delimiter is then
    flushed from the stream

8.0 - Public Member Functions of kstring
========================================

///////////////////
// Inherited constructors
///////////////////

kstring contains 9 constructor functions - 6 inherited from sstring
plus 3 unique to kstring.
    e.g., kstring k0;                  // k0 = ""
          kstring k1("wxyz");          // k1 = "wxyz"
          kstring k2(k1);              // k2 = "wxyz"
          kstring k3(k1,1,2);          // k3 = "xy"
          kstring k4("12 345 6789");   // k4 = "12 345 6789"
          kstring k5("12345678",4);    // k5 = "1234"
          kstring k6('s',2);           // k6 = "ss"

///////////////////
// additional constructors
///////////////////

kstring(long val, size_t width = 0, bool comma_fmt = false,
        char pad_ch = ' ');
    constructor with long as an initial value;
    width = 0 means no padding;
    if comma_fmt is set to true, commas are inserted into val;
    pad_ch is the padding character;
    e.g., val_long is a long variable = 12345
          val_long2 is a long variable = -1234567890
          kstring k7(val_long);               // k7 = "12345"
          kstring k8(val_long,8,false,'*');   // k8 = "***12345"
          kstring k43(val_long,0,true);       // k43 = "12,345"
          kstring k44(val_long2,13);          // k44 = "  -1234567890"
          kstring k45(val_long2,0,true);      // k45 = "-1,234,567,890"
          kstring k46(val_long2,18,true,'$'); // k46 = "$$$$-1,234,567,890"

    What if you specify the arguments in the wrong order? It's hard to
    generalize and the results are unpredictable. The next line of
    code attempts to put commas into val_long. Because of the
    signature, true is interpreted as the width - of 1!
          kstring k47(val_long,true);         // k47 = "12345"

kstring(double val, size_t d = 2, size_t width = 0, char pad_ch = ' ');
    constructor with double as an initial value;
    d decimal digits;
    width = 0 means no padding;
    pad_ch is the padding character;
    e.g., val2 is a double variable = 123456.12345
          kstring k3(val2);           // returns k3 = "123456.12"
          kstring k4(val2,3);         // returns k4 = "123456.123"
          kstring k5(val2,3,12,'*');  // returns k5 = "**123456.123"
          kstring k6(val2,3,12);      // returns k6 = "  123456.123"
          kstring k7(val2,5,17,'$');  // returns k7 = "$$$$$123456.12345"

kstring(const sstring & s);
    constructor with a base class (sstring object) as an initial
    value. An object of the derived class (kstring) can be assigned
    the value of the base class (sstring) ONLY if there is a
    constructor function in the derived class which accepts the base
    class (a sstring object) as an argument.
    e.g., s48 = "xyz"
          kstring k48(s48);           // k48 = "xyz"
          kstring k52 = s48;          // k52 = "xyz"

///////////////////
// constructing sstring from kstring
///////////////////

In object oriented programming, an object of the base class (sstring)
can always be constructed from the value of a derived class (kstring).
The copy constructor in the base class is used to perform the
construction.
    e.g., k50 = "abc"
          sstring s45(k50);           // s45 = "abc"
          sstring s46 = k50;          // s46 = "abc"

///////////////////
// superset functions
///////////////////

///////////////////
// kstring tokens
///////////////////

kstring token(int n, bool to_eos = false)
    returns the n'th token; n = 0 for first token;
    if to_eos is true, returns the remainder of the string,
    starting at token n
    e.g., k8 = "I wish you were here"
          k8.token(0);                // returns "I"
          k8.token(1);                // returns "wish"
          k8.token(3);                // returns "were"
          k8.token(4);                // returns "here"

int num_tokens()
    returns the number of tokens in a kstring or 0 if none are found;
    the kstring "University of Florida in Gainesville" contains
    5 tokens.
    e.g., k8 = "I wish you were here"
          k8.num_tokens();            // returns 5

///////////////////
// convert kstring to a number
///////////////////

bool cvt_long(size_t & i, long & val)
    converts the value at position i to a long number;
    updates i to the char after the number;
    returns true if there was a number.
    e.g., val3 is a long variable; i = 7 is a size_t variable
          // k9 = "I have 15000 dollars"
          k9.cvt_long(i,val3);        // val3 = 15000, i = 13

bool cvt_long(long & val)
    converts the first nonblank characters in the kstring to a long
    number; returns true if there was a number.
    e.g., val3 is a long variable;
          // k10 = "7000"
          k10.cvt_long(val3);         // val3 = 7000

bool cvt_double(double & val)
    converts the first nonblank characters in the kstring to a double
    number; returns true if there was a number.
    e.g., val4 is a double variable.
          // k11 = "4000"
          k11.cvt_double(val4);       // val4 = 4000

bool cvt_double(size_t & i, double & val)
    converts the value at position i to a double number;
    updates i to the char after the number;
    returns true if there was a number.
    e.g., val4 is a double variable; i = 7 is a size_t variable
          // k12 = "I have 80000 dollars"
          k12.cvt_double(i,val4);     // val4 = 80000, i = 13

///////////////////
// kstring formatting
///////////////////

void trim_trail()
    trims the trailing spaces in a kstring.
    e.g., k13 = " leading blanks "
          k13.trim_trail();           // returns k13 = " leading blanks"

void trim_lead()
    trims the leading spaces in a kstring.
    e.g., k13 = " leading blanks"
          k13.trim_lead();            // returns k13 = "leading blanks"

void trim()
    trims the leading and trailing spaces in a kstring.
    e.g., k14 = " blanks on both "
          k14.trim();                 // returns k14 = "blanks on both"

size_t pad(size_t width, char pad_ch = ' ', int side = ON_LEFT)
    pads the kstring, either ON_LEFT or ON_RIGHT, with character
    pad_ch; pad does nothing if width <= length of the kstring.
    e.g., k15 = "ABC"
          k15.pad(6,'=',ON_RIGHT);    // returns k15 = "ABC==="
          // k16 = "12345"
          k16.pad(10,'+');            // returns k16 = "+++++12345"
          // k17 = "abcde"
          k17.pad(7);                 // returns k17 = "  abcde"
          // k17 = "  abcde"
          k17.pad(12,'$',ON_RIGHT);   // returns k17 = "  abcde$$$$$"

void l_just(size_t n)
    left justifies the first non-blank char in a kstring to a
    position of n.
    e.g., k18 = "I wish you were"
          k18.l_just(2);              // returns k18 = " I wish you were"
          k18.l_just(5);              // returns k18 = " I wish you were"

void r_just(size_t n)
    right justifies the kstring to a width of n.
    e.g., k19 = "xyz"
          k19.r_just(4);              // returns k19 = " xyz"
          k19.r_just(5);              // returns k19 = "  xyz"
          k19.r_just(6);              // returns k19 = "   xyz"

void center(size_t n)
    centers the kstring at position n.
    e.g., k20 = "123"
          k20.center(5);              // returns k20 = " 123"
          k20.center(6);              // returns k20 = " 123"
          k20.center(7);              // returns k20 = " 123"
          k21 = "abcd"
          k21.center(6);              // returns k21 = " abcd"
          k21.center(7);              // returns k21 = " abcd"
          k21.center(8);              // returns k21 = " abcd"

///////////////////
// kstring math operations
///////////////////

void cvt_to_cartesian(double &x, double &y)
    converts a kstring containing "dd:mm:ss and distance" to
    cartesian coordinates.
    e.g., k23 = "30:00:00 1.0", x and y are double variables
          k23.cvt_to_cartesian(x,y);  // returns x = 0.866025, y = 0.5
          // k24 = "45 10.0"
          k24.cvt_to_cartesian(x,y);  // returns x = 7.071068, y = 7.071068
          // k25 = "60 1.0"
          k25.cvt_to_cartesian(x,y);  // returns x = 0.5, y = 0.866025

///////////////////
// convert case
///////////////////

void to_upper()
    converts the kstring to uppercase.
    e.g., k26 = "abcdef"
          k26.to_upper();             // returns k26 = "ABCDEF"

void to_lower()
    converts the kstring to lowercase.
    e.g., k27 = "ABCD"
          k27.to_lower();             // returns k27 = "abcd"

9.0 - Protected Members of sstring
==================================

char * str_ptr
    pointer to the current string.

size_t len
    length of the string.

size_t res
    current reserved area in memory - always divisible by block_size.

static const size_t block_size
    smallest unit of allocated memory - 16 bytes for this class.

static const size_t read_buffer_size
    smallest unit of memory allocated to read a string from the
    keyboard.

static const size_t npos
    no position specified - why npos? It's used in the current C++
    literature.

10.0 - Memory Allocation Policy
===============================
Dynamic memory, or memory created and destroyed during program
execution, is handled by the class. No memory management is required
by the user.

Memory allocation for this sstring class employs a "current sstring
size allocation algorithm". Memory is allocated in block sizes of 16
bytes. For example, a string with a length of 8 is allocated 16 bytes
of memory. A string of length 14 is allocated 16 bytes; a string of
length 20 is allocated 32 bytes; a string of length 36 is allocated
48 bytes.

If the size of the sstring increases or decreases, the allocation size
is adjusted accordingly. If the sstring changes but still requires the
same block size, no action is taken. Furthermore, to prevent memory
leaks, any deletion of old memory is handled by the class.

The 16 byte block size can be adjusted - it was chosen because that is
the memory allocation quantum of Turbo C++ 3.0.
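
The block rounding described in this section can be sketched as a small
function. This is an illustration of the arithmetic only, not code taken
from ssstring.cpp. The function name reserved_bytes is hypothetical, and
the sketch assumes that one extra byte is set aside for the terminating
NULL character, so a length that is an exact multiple of 16 moves up to
the next block; the text above does not say how that case is handled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of SSstring: returns the number of bytes
// reserved for a string of the given length when memory is handed out in
// whole blocks. Assumption: one byte is added for the terminating NULL.
std::size_t reserved_bytes(std::size_t length, std::size_t block_size = 16)
{
    std::size_t needed = length + 1;                              // characters + NULL
    std::size_t blocks = (needed + block_size - 1) / block_size;  // round up
    return blocks * block_size;
}
```

With the default block size, reserved_bytes(8), reserved_bytes(14),
reserved_bytes(20) and reserved_bytes(36) give 16, 16, 32 and 48,
matching the figures quoted above.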