Lab 9 - char*

We strongly encourage teamwork in this lab.

This lab will run as follows:

  1. Discuss char * strings
  2. Implement a variant of strlen
  3. Compare your implementation with that of another student
  4. Try to find a way to make strlen shorter/simpler, and note what you did in a comment
  5. Implement a variant of strtok
  6. Compare your implementation with that of another student
  7. Try to find a way to make strtok shorter/simpler, and note what you did in a comment
  8. Show a TA how you made each function smaller

Lab Goals

After this lab, you should:

  1. Gain familiarity with C
  2. Understand how strings (char *) work in C
  3. Gain experience writing and debugging C functions

Getting Started with Arrays and Pointers

Before we dive into strings in C, it may help to review the C Reference on Pointers. A pointer such as int *y contains the address (in memory) of an int; that is, y “points” to an integer. We will also use arrays in this lab; more information on arrays can be found in the C Reference on Arrays. Array elements are stored sequentially in memory, and in most expressions the array variable behaves as a pointer to the first element (lowest address) in the array.

For a quick overview, we can define an array of 10 integers as:

int myarr[10];

We can access the array in two different ways: treating myarr as an array and using square brackets to index into the array (like Java), or treating myarr as a pointer to an integer in memory and using pointer arithmetic to get to the next integer in memory (i.e., in the array).

myarr[4] = 42; // set the 5th element (position 4) to 42
int *ptr_to_five = myarr + 5; // treat myarr as a pointer to the first element
                              // do pointer arithmetic to skip 5 integers down in memory
                              // ptr_to_five has address of 6th element
                              // that is, it points to where myarr[5] is in memory
*ptr_to_five = 56; // set the 6th element (position 5) to 56

The * operator dereferences the pointer (i.e., goes to memory at that address). Note that pointer arithmetic takes the size of the pointed-to type into account: myarr + 5 is 5 * sizeof(int) bytes past the start of myarr, not 5 bytes.

For more details, please review the reference guide.

String Overview

Layout

In C, strings are pointers to arrays of characters, or char *s. They do not include length information; instead, they use the special value 0 to note the end of the string. Thus, if memory contains

Address: 20 21 22 23 24 25 26 27 28 29 2A 2B 2C 2D 2E 2F
Char:    w  e  l  c  o  m  e  \0 t  o  ␣  c  h  a  r  \0
(␣ marks a space character)

If char *x contains the address 0x20, it looks like the string "welcome"; if it contains the address 0x28, it looks like the string "to char". Note that there is no magic to starting at the beginning of something: if x contains the address 0x22, it looks like the string "lcome", and if it contains the address 0x27, it looks like the empty string "".

Const

String literals are generally placed by the compiler in read-only memory, so they should be referred to through a const char *, not just a char *. Thus,

const char *c = "hello";
c[1] = 'u'; /* error: cannot modify read-only memory */
c = c + 2;  /* OK: c can change, but the memory it points to cannot */

If you want to get a mutable copy of a const string, you can use strdup. From man strdup we find

#include <string.h>
char *strdup(const char *s);

The  strdup()  function  returns  a pointer to a new
string which is a duplicate of the string s.  Memory
for the new string is obtained with malloc, and can be
freed with free.

This means you’d use it as

const char *s1 = "hello"; /* initialized in read-only memory */
char *s2 = strdup(s1);    /* make a copy in mutable memory   */
s2[1] = 'u';              /* change one letter of the copy   */
puts(s2);                 /* display the altered copy        */
free(s2);                 /* don't litter: discard the copy  */

Calling free when you are done with the copy is good practice, but leaving it off won’t break your code in this lab. We’ll discuss free in more detail in class later this week.

Display

The simplest way to display a string in C is using the function puts. An extract from the manual page man puts says

#include <stdio.h>
int puts(const char *s);

puts() writes the string s and a trailing newline to stdout.
puts() returns a nonnegative number on success, or EOF on error.

Thus

const char *s = "welcome\0to char";
puts(s);
puts(s+2);
puts(s+7);
puts(s+8);

will display

welcome
lcome

to char

Writing string code

Let’s start with some basic string functions. These are in the standard C library, but are worth implementing by hand to better understand them.

String length

The strlen function is described in man strlen as

size_t strlen(const char *s);

The strlen() function calculates the length of the string s, excluding the terminating null byte (‘\0’).

Write an implementation of this function, naming it mystrlen instead of strlen.

Example: The following code

const char *s = "even elephants exfoliate";
size_t slen = mystrlen(s);
puts(s + (slen/2));

should display

ts exfoliate

After you’ve written an implementation, compare with another student. Work together to make a simpler version (you’ll need to explain what you did to a TA later, so remember your changes). When submitting your code, you must use the following format for the header:

size_t mystrlen(const char *s) {

You can put the opening brace { either on the same line or on the next.

Simplified strtok

Two library functions, strsep and strtok, both implement string-splitting behavior. You will implement a simplified version:

char *simple_split(char *s, char delim);

Given a string and a delimiter character, do the following:

  • if s is NULL or s[0] is \0, return NULL
  • find the first delim character in s
    • if there is none, return NULL
    • otherwise, replace it with \0 and return a pointer to the character after it

Note: to use NULL you need to include a header that defines it, such as stdio.h.

Example: The following code

char *s = strdup("can all aardvarks quaff?");
char *bit = simple_split(s, 'a');
puts(s);
puts(bit);
free(s);

should display

c
n all aardvarks quaff?

Example: The following code

char *trash, *bit, *s;
trash = bit = s = strdup("can all aardvarks quaff?");
do {
    s = bit;
    bit = simple_split(s, 'a');
    puts(s);
} while(bit);
free(trash);

should display

c
n 
ll 

rdv
rks qu
ff?

After you’ve written an implementation, compare with another student. Work together to make a simpler version (you’ll need to explain what you did to a TA later, so remember your changes). When submitting your code, you must use the following format for the header:

char *simple_split(char *s, char delim) {

You can put the opening brace { either on the same line or on the next.

Hints

You may run into segmentation faults, which mean your code is trying to read or write memory that it is not allowed to access. Typical causes are running past the end of an array or indexing with an address you weren’t expecting. We strongly encourage you to examine what’s going on using the debugger.

Compile your code with the -g option of clang to add debugging information (including your C code!), so that when you run lldb a.out, you will be stepping through your C code. Set a breakpoint at main and run, just as you did in Lab 7 and the escape room. You may also find these lldb commands helpful:

  • n - next line of code
  • s - step (into) next line (steps into function calls)
  • v var1 var2 - prints the type and value of variables var1 and var2

Check off

Upload the following to Gradescope:

  • Your C code charstar.c. Note that the autograder will override your main function for testing. Make sure your main function is at the bottom of the file. You should be able to write your own main function to test your mystrlen and simple_split separately.

To check off this lab:

  • Once you have submitted your file, check in with a TA so that they can give you attendance credit.
  • Describe to the TA how you made your implementations smaller (or why you believe they are as small as they can be).

Copyright © 2023 Daniel Graham, John Hott and Luther Tychonievich.
Released under the CC-BY-NC-SA 4.0 license.