Strings as a Compound Data Type - Review (CSCI-UA.0002-008)

Indexing

What is the index of the following characters in these strings?

'h' in "hi bob!"
'!' in "hi bob!"
' ' in "hi bob!"

1. 0

2. 6

3. 2

Indexing Continued

What's the output of the following code?

idx, animal = -2, "tiger"
print(animal[-1])
print(animal[idx])
print("animal"[2])
print(animal[idx + 2])
print(animal[idx + 100])

r
e
i
t
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

Summary for Indexing

a string is an ordered sequence of characters
an index is the numeric position of an element within a collection of elements
in the case of strings, index is the position of the char within a string
the first element is at index 0, and it increments from there
the last element is at index length - 1 (len(s) - 1)
the last element is also at index -1
the second to last element is at index -2
you can retrieve a specific character from a string by indexing into it
if the index doesn't exist, you will get an IndexError
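Here is a small sketch that puts the rules above side by side, reusing the string "hi bob!" from the first exercise. The try/except is only there so the error can be shown without stopping the program.

```python
s = "hi bob!"

first = s[0]               # first character is at index 0
last = s[len(s) - 1]       # last character is at index len(s) - 1
also_last = s[-1]          # index -1 also refers to the last character
second_to_last = s[-2]     # index -2 is the second to last character

print(first, last, also_last, second_to_last)  # h ! ! b

# len(s) is one past the last valid index, so indexing there fails.
try:
    s[len(s)]
except IndexError as e:
    print("IndexError:", e)   # IndexError: string index out of range
```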

Indexing Examples

# index into string that's bound to a variable name
a = "foo"
a[0]
# index into a string literal
"foo"[0]
# index into the return result of a function
def give_back_foo():
	return "foo"

give_back_foo()[0]

Are Strings Mutable?

What happens if I try to change the string "pugs" into "hugs" using indexing and the assignment operator?

"pugs"[0] = 'h'

strings are not mutable
you can read values by indexing into a string
however, you can't change characters in a string

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
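Since a string can't be changed in place, the usual move is to build a new string and, if needed, bind the variable name to it. This sketch uses slicing (word[1:] means "everything from index 1 on"), which is covered later in these slides.

```python
word = "pugs"
original = word

# Build a brand new string: "h" followed by every character after index 0.
# Rebinding the name word is allowed; the string "pugs" itself is untouched.
word = "h" + word[1:]

print(word)      # hugs
print(original)  # pugs
```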

An Index That Doesn't Exist

What happens if I try to access an index that doesn't exist?

word = "pugs"
print(word[50])

accessing an index that doesn't exist causes an error
specifically, an IndexError

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range
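One way to avoid the error is to check the index against len before indexing. The helper name char_at below is hypothetical (it is not a built-in); it is a sketch that treats both positive and negative indices as valid when they fall between -len(s) and len(s) - 1.

```python
def char_at(s, i):
    """hypothetical helper: gives back s[i], or None if i is out of range"""
    # valid indices run from -len(s) up to len(s) - 1
    if -len(s) <= i < len(s):
        return s[i]
    return None


word = "pugs"
print(char_at(word, 0))    # p
print(char_at(word, -1))   # s
print(char_at(word, 50))   # None
```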

Using Every Character in a String

What if we want to do something to each character in a string?

count the number of exclamation points I used in my slides!?
- by going through every character
- and incrementing a count variable
take a word and create a new word by changing all vowels into numbers for a variant of "L33T SP34K"
- construct a new string
- go through every character of the original string
- add a number for certain characters… or the original character
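Both tasks above follow the same pattern: go through every character and either update a count or add to a new string. The function names and the vowel-to-number mapping below (a to 4, e to 3, i to 1, o to 0) are assumptions chosen for illustration; every other character is copied over unchanged.

```python
def count_exclamations(text):
    """hypothetical: counts the exclamation points in text"""
    count = 0
    for c in text:
        if c == "!":
            count += 1
    return count


def to_leet(word):
    """hypothetical: builds a new string with some vowels swapped for numbers"""
    new_word = ""
    for c in word:
        lower = c.lower()        # so "E" and "e" are treated the same
        if lower == "a":
            new_word += "4"
        elif lower == "e":
            new_word += "3"
        elif lower == "i":
            new_word += "1"
        elif lower == "o":
            new_word += "0"
        else:
            new_word += c        # keep the original character
    return new_word


print(count_exclamations("wow!! strings!"))  # 3
print(to_leet("LEET SPEAK"))                 # L33T SP34K
```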

Looping Over Each Character

We can iterate over every character that's in a string.

we've used a construct that lets us iterate over every element in an ordered sequence
what was that construct?

# for loops!
for c in "hello":
  print(c)

For Loops and Strings

my_crazy_string = "oh my!"
for c in my_crazy_string:
  print(c)

for loops allow us to iterate over every char in a string.
similar to looping over a sequence of numbers (such as range(5))
the variable specified in the loop represents the current char
the loop continues until there are no characters left
the above code prints out:

o
h
 
m
y
!
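To make the comparison with looping over numbers concrete, here is a sketch that copies my_crazy_string two ways: once by looping over the characters directly, and once by looping over the indices 0 through len(my_crazy_string) - 1 and indexing in. Both visit the same characters in the same order.

```python
my_crazy_string = "oh my!"

# loop over the characters directly
copy_direct = ""
for c in my_crazy_string:
    copy_direct += c

# loop over the indices and index into the string
copy_by_index = ""
for i in range(len(my_crazy_string)):
    copy_by_index += my_crazy_string[i]

print(copy_direct == copy_by_index)  # True
```

The direct version is usually preferred when you only need the characters; the index version is handy when you also need to know where each character is.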

Another Example

The following code:

uses a for loop to go over every letter in "jetpack"
for each letter, prints out the letter with a suffix of "am" appended to it

word = "jetpack"
suffix = "am"
for c in word:
	print(c + suffix)

A Quick Aside on Assert

assert <some expression that results in a bool>, <a string>

assert statements consist of the keyword assert
followed by any expression that results in a bool
optionally followed by a comma and a string (the error message)
an AssertionError is raised if the condition is not True
we use asserts to test expected and actual values of functions
the goal is to get no output (though you can print a message saying that the tests passed)
try writing your tests first!

assert expected == actual, "describe test"
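A sketch of what passing and failing asserts look like. The function double is a hypothetical function under test, and the second assert is deliberately wrong so that the AssertionError (and its message) can be shown; the try/except is only there to keep the program running.

```python
def double(n):
    """hypothetical function under test: gives back twice n"""
    return n * 2


# a passing assert produces no output at all
assert 6 == double(3), "double of 3 should be 6"

# a failing assert raises an AssertionError carrying the message string
message = None
try:
    assert 7 == double(3), "deliberately wrong expected value"
except AssertionError as e:
    message = str(e)

print("caught AssertionError:", message)
```

Note that Python skips assert statements entirely when run with the -O flag, so asserts are meant for testing, not for checks your program relies on.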

A Quick Aside on Break

The break statement immediately stops the execution of a loop. What does the following code print out?

for letter in "challenge":
	if letter == 'a':
		break
	print(letter + "at")

cat
hat

A Quick Aside on Docstrings

A docstring is a triple-quoted string immediately following the header of a function definition.

it describes what the function does
it's essentially an inline comment / documentation
some tools pick up on this documentation (like the help function in the interactive shell)

def foo():
	"""gives back a greeting"""
	return "hi"
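The docstring is stored on the function object, in an attribute named __doc__, and that stored text is what help() displays. A quick sketch:

```python
def foo():
    """gives back a greeting"""
    return "hi"


# the docstring is kept on the function itself
print(foo.__doc__)   # gives back a greeting

# in the interactive shell, help(foo) shows the same text
```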

Letter in Word (v1)

Try implementing this function!

create an ipo chart
write assertion statements (how many?)
write a docstring
implement a function called letter_in_word
it should take two arguments, a letter and a word
it should return True if the letter is in the word, otherwise, return False
use assert on a True case and on a False case
implement using two return statements

Letter in Word (v1) Continued

def letter_in_word(letter, word):
	""" determines whether or not a letter is in a word"""
	for c in word:
		if c == letter:
			return True
	return False

assert True == letter_in_word('x', "ox"), "letter is in word"
assert False == letter_in_word('y', "ox"), "letter is not in word"

Letter in Word (v2)

Here's another way of doing it using break.

def letter_in_word(letter, word):
	result = False
	for c in word:
		if c == letter:
			result = True
			break
	return result

assert True == letter_in_word('c', "chihuahua"), "letter is in word"
assert False == letter_in_word('x', "chihuahua"), "letter is not in word"

Letter in Word Revisited

There are actually a couple of much easier ways to do this than writing our own function. What are two ways of determining whether or not a string is a substring of another string?

# use the in operator:
'a' in 'aardvark'

# use the find method:
'aardvark'.find('a')
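The two give back different kinds of values: in gives back a bool, while find gives back the index of the first match, or -1 if there is no match. That matters here, because 'aardvark'.find('a') is 0, and 0 counts as false in a condition. A short sketch:

```python
word = "aardvark"

print('a' in word)        # True
print('z' in word)        # False

print(word.find('a'))     # 0 (first match is at index 0)
print(word.find('v'))     # 4
print(word.find('z'))     # -1 (no match)

# compare against -1 instead of using find's result directly as a condition
print(word.find('a') != -1)   # True
```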

How Many A's Are in Aardvark?

Write a function that counts how many times a letter occurs in a word.

create a function called count_letters
use assert on at least two cases to test, and write a docstring
create an ipo chart
- it should take two arguments: a letter and a word
- it should return an integer

>>> print(count_letters('a', 'aardvark'))
3

How Many A's Are in Aardvark? Solution

def count_letters(letter, word):
	"""returns the number of times a letter occurs in a word"""
	count = 0
	for c in word:
		if c == letter:
			count += 1
	return count
assert 3 == count_letters("a", "aardvark"), "should count letters in word"
assert 0 == count_letters("x", "aardvark"), "zero if no letters in word"

Counting letters Revisited

Just like finding a substring, there's a much easier way to find the number of times a substring occurs inside another string. What is the method that lets us do this?

# it's just count!
"banana".count('an')
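Unlike count_letters, which compares one character at a time, count works with substrings of any length. One detail worth knowing: matches are counted without overlapping, so after a match the search continues after the end of that match.

```python
print("banana".count('an'))    # 2
print("banana".count('ana'))   # 1 (the two "ana"s overlap, so only one counts)
print("aardvark".count('a'))   # 3, same answer as count_letters('a', 'aardvark')
print("banana".count('x'))     # 0
```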

Construct a String With Spaces After Every Letter

Let's write a function that takes a string and puts a space after every character.

create a function called insert_spaces
use assert on at least two cases to test (a good one might be the empty string), and write a docstring
create an ipo chart
- it should take one argument: a string
- it should return a string

>>> print(insert_spaces('aaaah!'))
a a a a h !

Construct a String With Spaces After Every Letter Solution

def insert_spaces(word):
    """inserts spaces after every letter in word"""
    new_word = ""
    for c in word:
        new_word += c + " " 
    return new_word
assert "f i s h " == insert_spaces("fish"), "inserts spaces after every letter"
assert "" == insert_spaces(""), "empty string if word is empty"
print(insert_spaces("fish"))
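Note that insert_spaces leaves a trailing space at the end ("f i s h "), which print doesn't make visible. If you only want spaces between characters, the join method does that in one call. The variant name insert_spaces_between is hypothetical; the sketch uses repr so the trailing space shows up.

```python
def insert_spaces(word):
    """inserts spaces after every letter in word"""
    new_word = ""
    for c in word:
        new_word += c + " "
    return new_word


def insert_spaces_between(word):
    """hypothetical variant: puts a space between letters, with no trailing space"""
    return " ".join(word)


print(repr(insert_spaces("fish")))          # 'f i s h '
print(repr(insert_spaces_between("fish")))  # 'f i s h'
```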

Slicing

You can also retrieve a substring from another string.

you can get a section of consecutive characters from a string
example: "ana" is a substring in the string "banana"

This is done using slicing. Slicing syntax works as follows:

>>> "placate"[3:6]
'cat'

Slicing Syntax

Looking at the slicing code again:

>>> "placate"[3:6]
'cat'

The general case is:

some_long_string[m:n]

m is the start index and n is the end index.
the resulting substring starts at m, and goes up to, but does not include n
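Because the end index is not included, the slice s[m:n] always has n - m characters (when both indices are within the string). Negative indices work in slices too. A sketch using the same "placate" example:

```python
s = "placate"
m, n = 3, 6

piece = s[m:n]
print(piece)                          # cat
print(len(piece))                     # 3, which is n - m
print(piece == s[3] + s[4] + s[5])    # True: indices 3, 4, 5, but not 6

# -4 is the same position as 3, and -1 is the same position as 6, in "placate"
print(s[-4:-1])                       # cat
```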

Substring Exercises

Write the slice to pick out the following substring from the original string:

s = "gone!"
#    01234

go
on
one
one!

s[0:2]

s[1:3]

s[1:4]

s[1:5]

Some Slicing Tricks

leaving out the first index (before the colon, m) starts the slice at the beginning of the string
leaving out the second index (after the colon, n) ends the slice at the end of the string
if the second index, n, is greater than the length of the string, the slice goes up to the end of the string

"eggs and ham"[:4] #eggs
"eggs and ham"[9:] #ham
"eggs and ham"[9:100] #ham
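A few more consequences of these rules: splitting a string at any index and gluing the pieces back together gives the original string, leaving out both indices gives the whole string, and, unlike indexing, slicing past the end gives an empty string instead of an IndexError.

```python
s = "eggs and ham"

print(s[:4] + s[4:] == s)   # True
print(s[:])                 # eggs and ham
print(repr(s[100:]))        # '' (no error, just an empty string)
```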

An Easier Way to Tell if a Letter is in a Word

Use the in operator!

in tests for membership
it takes two operands, one on each side
if the operand on the left is in the operand on the right, in returns True
in the case of strings, in tests whether the left string is a substring of the right string

In Examples

>>> word = "ice cream"
>>> letter = "a"
>>> letter in word
True
>>> "cat" in "vacation"
True
>>> "cat" in "work"
False
>>>

In / Not In

Some other things to note:

A string is always a substring of itself.

>>> "vacation" in "vacation"
True

The empty string is a substring of every string.

>>> "" in "vacation"
True

not in is the logical opposite of the in operator

>>> "cat" not in "vacation"
False
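To see why the rules above hold, here is a sketch of how a substring test could be written by hand with slicing. The name is_substring is hypothetical and this is not how Python implements in internally; it just tries every starting index where the smaller string could still fit. For the empty string, the slice big[0:0] is itself empty, so the test succeeds.

```python
def is_substring(small, big):
    """hypothetical: determines whether small is a substring of big"""
    size = len(small)
    # try every starting index where small could still fit inside big
    for i in range(len(big) - size + 1):
        if big[i:i + size] == small:
            return True
    return False


print(is_substring("cat", "vacation"))       # True
print(is_substring("cat", "work"))           # False
print(is_substring("vacation", "vacation"))  # True
print(is_substring("", "vacation"))          # True
```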

First Word in a Sentence

Create a function that returns the first word in a sentence.

assume that spaces separate words
use assert to test with a sentence that has
- 0 words
- 1 word
- more than 1 word
- only spaces
it should take one argument, a string
it should return a string
if the original string is only one word, return that word
if the original string is empty, return an empty string
if the original string is only white space, return an empty string
hint: slicing may be helpful
hint: how do you find the end index of the substring?

First Word in a Sentence Potential Solution

def get_first_word(sentence):
	"""returns the first word in a sentence"""
	index = 0
	for c in sentence:
		if c == " ":
			break
		index += 1
	return sentence[0:index]

assert "hi" == get_first_word("hi there!"), "returns first word"
assert "hi" == get_first_word("hi"), "returns word if only one word"
assert "" == get_first_word("  "), "returns empty if only white space"
assert "" == get_first_word(""), "returns empty if sentence is empty"
# print(get_first_word("hi there!"))
# print(get_first_word("hi"))
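The loop above finds the index of the first space by hand. The find method from earlier does the same job, giving back -1 when there is no space at all. This is a sketch of an alternative, not the official solution; it keeps the same name so the same asserts apply.

```python
def get_first_word(sentence):
    """returns the first word in a sentence (alternative using find)"""
    index = sentence.find(" ")
    if index == -1:
        # no space at all: the sentence is a single word (or empty)
        return sentence
    return sentence[:index]


assert "hi" == get_first_word("hi there!"), "returns first word"
assert "hi" == get_first_word("hi"), "returns word if only one word"
assert "" == get_first_word("  "), "returns empty if only white space"
assert "" == get_first_word(""), "returns empty if sentence is empty"
```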

Is Digit?

Create a function that determines whether or not a string only has numbers (0-9) in it.

use assert to test a string composed of
- only digits
- digits and other characters
- only non-numeric characters
- empty string
it should take one argument, a string
it should return True only if the only characters in it are 0 through 9
if the original string is empty, return False
hint: not in may be useful

Is Digit? Potential Solution

def is_digit(s):
	"""determines if a string is numeric (only contains 0 through 9)"""
	if s == "":
		return False

	for c in s:
		if c not in "0123456789":
			return False

	return True

assert True == is_digit("58723"), "true if all characters are numeric"
assert False == is_digit("twelve"), "false if not all characters are numeric"
assert False == is_digit("12 ducks"), "false if not all characters are numeric"
assert False == is_digit(""), "false if empty string"
# print(is_digit("43"))

Is Digit Revisited

And of course, there are already a couple of methods that do this. What are they?

"123".isdigit()     # True
"123".isnumeric()   # True
"abc".isdigit()     # False
"abc".isnumeric()   # False
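For plain ASCII text, both methods agree with the is_digit function above, including giving back False for the empty string. They are not exact replacements, though: both follow Unicode's rules, so they accept some characters outside 0 through 9. The sketch writes those characters as escapes (\u00b2 is superscript two, \u00bd is the fraction one half) to avoid any file-encoding issues.

```python
print("58723".isdigit(), "58723".isnumeric())   # True True
print("".isdigit(), "".isnumeric())             # False False

sup_two = "\u00b2"    # superscript two
half = "\u00bd"       # vulgar fraction one half

print(sup_two.isdigit(), sup_two.isnumeric())   # True True
print(half.isdigit(), half.isnumeric())         # False True
```

So if the requirement really is "only the characters 0 through 9", a hand-written check like is_digit is the more precise choice.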
