Strings (CSCI-UA.0002-008)

Indexing

What is the index of the following characters in these strings? →

'h' in "hi bob!"
'!' in "hi bob!"
' ' in "hi bob!"

1. 0
2. 6
3. 2

Indexing Continued

What's the output of the following code? →

idx = -2
animal = "tiger"
print(animal[-1])
print(animal[idx])
print("animal"[2])
print(animal[idx + 2])
print(animal[idx + 3])
print(animal[idx + 100])

r
e
i
t
i
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: string index out of range

Summary for Indexing

a string is an ordered sequence of characters
an index is the numeric position of an element within a collection of elements
the first element has index 0, and it increments from there
in the case of strings, index is the position of the char within a string
you can retrieve a specific character from a string by indexing into it
if the index doesn't exist, you will get an error (an IndexError)
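
A short sketch of the points above. The string and variable names are made up for illustration, and try / except is used only to show the error without stopping the program.

```python
word = "tiger"          # example string, chosen only for illustration

first = word[0]         # indices start at 0, so this is 't'
third = word[2]         # 'g'
last = word[-1]         # negative indices count back from the end: 'r'

try:
    word[5]             # valid indices are 0 through 4
    out_of_range = False
except IndexError:
    out_of_range = True  # asking for a missing index raises IndexError

print(first, third, last, out_of_range)
```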

Indexing Syntax

The indexing syntax is as follows:

value (as variable or literal or even function call)
open square bracket
index of desired char (an integer)
close square bracket

Indexing Examples

# index into string that's bound to a variable name
a = "foo"
a[0]
# index into a string literal
"foo"[0]
# index into the return result of a function
def give_back_foo():
	return "foo"

give_back_foo()[0]

String Operations

+ - concatenation
* - multiplication (repetition)
% - string formatting

We know the first two.

String formatting uses placeholders (we only use %s for now) in a template string. The template is followed by the percent sign and then the values to insert, in parentheses, to create a new string.

name, pet = 'Bill', 'giraffe'
s = 'Hi %s.  How is your %s?' % (name, pet)
print(s)

Are Strings Mutable?

What happens if I try to change the string "hugs" into "pugs" using indexing and the assignment operator? →

"hugs"[0] = 'p'

strings are not mutable
you can read values by indexing into a string
however, you can't change characters in a string

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' object does not support item assignment
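
Since a string can't be changed in place, one common workaround is to build a new string instead. A minimal sketch, which borrows slicing (covered later in these slides); the name new_word is made up for this example.

```python
word = "hugs"

# word[0] = 'p' would raise a TypeError, so build a new string instead
new_word = "p" + word[1:]   # 'p' plus everything after the first character

print(new_word)  # pugs
print(word)      # hugs (the original is unchanged)
```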

Iterating Over Characters in a String

Here's a program that prints out every letter in "jalopy" with 'eer' appended to it.

word = "jalopy"
for c in word:
	print(c + 'eer')

Looping Over Strings Summary / Syntax

for loops allow us to iterate over every character in a string.
it's similar to looping over a sequence of numbers
the variable in the loop head represents the current character
the loop continues until there are no letters left
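
To make the comparison with looping over numbers concrete, here is a sketch of the "jalopy" loop written two ways. The names by_char and by_index are made up for this example, and len() is a built-in function that gives the number of characters.

```python
word = "jalopy"

# loop directly over the characters
by_char = ""
for c in word:
    by_char += c + "eer "

# loop over the index numbers instead, then index into the string
by_index = ""
for i in range(len(word)):
    by_index += word[i] + "eer "

print(by_char)
print(by_char == by_index)  # True
```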

Letter in Word Example

Using loops, we can implement a function that determines whether a letter is in a word (there's actually already a construct in Python that does this… ).

Letter in Word (v1)

we'll use assert to test a True case and a False case
this particular implementation uses two return statements

def letter_in_word(letter, word):
	""" determines whether or not a letter is in a word"""
	for c in word:
		if c == letter:
			return True
	return False

assert True == letter_in_word('x', "ox"), "letter is in word"
assert False == letter_in_word('y', "ox"), "letter is not in word"

The break Statement

The break statement immediately stops the execution of a loop. What does the following code print out? →

for letter in "challenge":
	if letter == 'a':
		break
	print(letter + "at")

cat
hat

Letter in Word (v2)

Here's another way of implementing letter_in_word using break.

def letter_in_word(letter, word):
	result = False
	for c in word:
		if c == letter:
			result = True
			break
	return result

assert True == letter_in_word('c', "chihuahua"), "letter is in word"
assert False == letter_in_word('x', "chihuahua"), "letter is not in word"

Slicing

Python allows you to carve out a smaller string (a substring) from another string by using slicing.

example: "sand" is a substring in the string "sandwich"
slicing syntax:

some_long_string[m:n]

m is the start index and n is the end index.
the resulting substring starts at m, and goes up to, but does not include n
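
A quick sketch of the "sandwich" example, assuming the variable names below (they are not from the slides). Note that the length of the slice is n - m.

```python
word = "sandwich"

first_part = word[0:4]   # indices 0, 1, 2 and 3; index 4 is not included
second_part = word[4:8]  # indices 4 through 7

print(first_part)                # sand
print(second_part)               # wich
print(len(first_part) == 4 - 0)  # True
```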

Substring Exercises

Write the slice to pick out the following substring from the original string:

s = "gone!"

go
on
one
one!

s[0:2]
s[1:3]
s[1:4]
s[1:5]

Some Slicing Tricks

leaving out the first index (before the colon - m) starts at the beginning of the string
leaving out the second index (after the colon - n), ends at the end of the string
if the second index, n, is greater than the length of the string, the slice goes up to the end of the string

"eggs and ham"[:4] #eggs
"eggs and ham"[9:] #ham
"eggs and ham"[9:100] #ham

An Easier Way to Tell if a Letter is in a Word

Use the in operator!

in tests for membership
it takes two operands, one on each side
if the operand on the left is in the operand on the right, in returns True
in the case of strings, in tests if one string is a substring of the other

In Examples

>>> word = "ice cream"
>>> letter = "a"
>>> letter in word
True
>>> "cat" in "vacation"
True
>>> "cat" in "work"
False
>>>

In / Not In

Some other things to note:

A string is always a substring of itself.

>>> "vacation" in "vacation"
True

Empty string is always a substring of any other string.

>>> "" in "vacation"
True

not in is the logical opposite of the in operator

>>> "cat" not in "vacation"
False
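
With in, the earlier letter_in_word function shrinks to a single line. A sketch that reuses test cases from v1 and v2:

```python
def letter_in_word(letter, word):
    """determines whether or not a letter is in a word"""
    return letter in word

assert True == letter_in_word('x', "ox"), "letter is in word"
assert False == letter_in_word('y', "ox"), "letter is not in word"
assert True == letter_in_word('c', "chihuahua"), "letter is in word"
```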

Strings Methods!

Strings are objects. They have methods. Lots of 'em!

upper()
lower()
capitalize()
title()
isdigit()
isnumeric()
isalpha()
isspace()

to be continued in next slide! →

Even More String Methods!

find(sub[, start[, end]])
format(…)
strip([chars])
isupper()
islower()
count(…)
replace(…)
split(…)
join(…)

In the interactive shell, you can call the dir function with a string in parentheses to show all of the methods of that object:

dir("some string")
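
The list that dir gives back includes many names that start with underscores. As a sketch (the name public_names is made up here), a loop can keep only the everyday method names:

```python
# keep only the names that don't start with an underscore
public_names = []
for name in dir("some string"):
    if not name.startswith("_"):
        public_names.append(name)

print(public_names)
print("upper" in public_names)  # True
```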

Casing Methods

upper(), lower(), capitalize(), and title() return a copy of the string that the method was called on as all uppercase, all lowercase, first letter uppercase, and title-cased (first letter of every word uppercase), respectively. What would the following print out? →

print("this should be uppercase".upper())
print("THIS SHOULD BE LOWERCASE".lower())
print("this should be uppercase".capitalize())
print("this should be uppercase".title())

THIS SHOULD BE UPPERCASE
this should be lowercase
This should be uppercase
This Should Be Uppercase

isdigit(), isnumeric() and isalpha()

isdigit(), isnumeric() and isalpha() test whether a string is composed only of numbers or only of letters (all three return False for an empty string). What would the following print out? →

* isnumeric() also returns true for numeric characters other than 0-9, such as '⅕'.

print("123".isdigit())            # True
print("1.23".isdigit())           # False (. is not 0 - 9)
print("one two three".isdigit())  # False (not 0 - 9)
print("onetwothree".isalpha())    # True
print("one two three".isalpha())  # False (has spaces)
print("one!".isalpha())           # False (has !)
print("1".isalpha())              # False (it's a digit)
print("⅕".isdigit())              # False (not 0 - 9)
print("⅕".isnumeric())  # True (isnumeric allows other numeric chars)

isspace()

isspace() gives back True if all of the characters in the string it's called on are white space - any kind of white space. What is the output of the following? →

print("             ".isspace())
print("\n".isspace())
print("some    space".isspace())

True
True
False

find()

find() returns the first index where the argument (a character or substring) is found. It returns -1 if the substring is not in the original string.

print("hello".find("h"))
print("hello".find("x"))
print("hello".find("lo"))

0
-1
3
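
The signature listed earlier, find(sub[, start[, end]]), also allows an optional start and end index for the search. A small sketch with "hello" (the variable names are only for illustration):

```python
first_l = "hello".find("l")        # search the whole string
next_l = "hello".find("l", 3)      # start searching at index 3
no_l = "hello".find("l", 4)        # only "o" is left to search
early = "hello".find("l", 0, 2)    # search only "he" (end is not included)

print(first_l)  # 2
print(next_l)   # 3
print(no_l)     # -1
print(early)    # -1
```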

strip()

strip() removes leading and trailing whitespace (it can also remove other leading and trailing characters). What do you think this results in? →

print("  spaces all around   ".strip())

spaces all around
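
To remove other characters, pass them to strip() as a string. A sketch with made-up example strings; note that only the ends are stripped, never the middle.

```python
dashes = "--spaces all around--".strip("-")
stars = "**hello!**".strip("*!")   # strips any mix of * and ! from both ends
inner = "  a b  ".strip()          # the space between a and b stays

print(dashes)  # spaces all around
print(stars)   # hello
print(inner)   # a b
```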

format()

Format is like the string formatting operator, but possibly easier?!

instead of %s, use curly braces and the number of the argument as a placeholder: {0}, {1}
the number corresponds to each argument in the call, 0 being the first
What do you think the following returns? →

"{0} elephants".format("twenty")
"{0} elephants".format(20)
"{0} elephants".format(20, 100)
"{1} elephants".format(20, 100)
"{0} elephants and {1} peanuts".format(20, 100)

format() results

twenty elephants
20 elephants
20 elephants
100 elephants
20 elephants and 100 peanuts

isupper() and islower()

isupper() and islower() return True if all of the letters in the string they are called on are in the specified case. What does the following output? →

print("this should be uppercase".isupper())
print("THIS SHOULD BE LOWERCASE".isupper())

False
True
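
The example above only calls isupper(). A sketch of the same strings with islower(), plus two edge cases (the variable names are made up):

```python
lower_check = "this should be uppercase".islower()
upper_check = "THIS SHOULD BE LOWERCASE".islower()
mixed_upper = "Mixed Case".isupper()
mixed_lower = "Mixed Case".islower()
no_letters = "123".isupper()   # no letters at all, so there is no case to check

print(lower_check)  # True
print(upper_check)  # False
print(mixed_upper)  # False
print(mixed_lower)  # False
print(no_letters)   # False
```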

count(), replace()

count(s) …counts the number of times substring, s, occurs in the original string.

'aardvark'.count('a') # --> 3

replace(s, new_s) …replaces all occurrences of substring, s, with new_s. (note that this gives back a new string, and it does not change the original)

'aardvark'.replace('a', '42') # --> 4242rdv42rk
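
A sketch showing that replace() leaves the original string alone (the variable names are only for illustration):

```python
animal = "aardvark"
changed = animal.replace('a', '42')

print(changed)  # 4242rdv42rk
print(animal)   # aardvark (still the original)
```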

Split and Join

split turns a string into a list based on a separator string
join turns a list into a string based on the string it's called on
what would the following code output? →

words = "fooXXbarXXbazXXqux"
words_list = words.split('XX')
words_again = '~~~'.join(words_list) # notice this is called on a string!
print(words_list)
print(words_again)

['foo', 'bar', 'baz', 'qux']
foo~~~bar~~~baz~~~qux
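
split can also be called with no separator at all, in which case it splits on any run of white space. A sketch comparing the two behaviors, with made-up variable names:

```python
no_arg = "hi   there".split()       # runs of white space count as one separator
one_space = "hi   there".split(' ') # every single space is a separator
only_space = "   ".split()          # nothing but white space: empty list

print(no_arg)      # ['hi', 'there']
print(one_space)   # ['hi', '', '', 'there']
print(only_space)  # []
```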

The Built-In len() Function

len is a built-in function that returns the length of a sequence

it is not a method, so you do not call it on an object
however, you can pass a value to it, and it will return its length
for strings, it will return the number of characters
the last index is len(s) - 1

print(len("cat"))
# gives 3
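
A sketch of the "last index" point, using a made-up variable name last_index:

```python
s = "cat"
last_index = len(s) - 1   # 3 characters, so the last index is 2

print(s[last_index])           # t
print(s[last_index] == s[-1])  # True
print(len(""))                 # 0
```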

Example Question #1

Let's write a function that counts how many times a letter occurs in a word. →

use assert on at least two cases to test
it should take two arguments: a letter and a word
it should return an integer

def count_letters(letter, word):
	"""returns the number of times a letter occurs in a word"""
	count = 0
	for c in word:
		if c == letter:
			count += 1
	return count
assert 3 == count_letters("a", "aardvark"), "should count letters in word"
assert 0 == count_letters("x", "aardvark"), "zero if no letters in word"

Example Question #2

Let's write a function that takes a string and puts a space after every character. →

use assert on at least two cases to test (a good one might be empty string)
it should take one argument: a string
it should return a string

def insert_spaces(word):
    """inserts spaces after every letter in word"""
    new_word = ""
    for c in word:
        new_word += c + " " 
    return new_word
assert "f i s h " == insert_spaces("fish"), "inserts spaces after every letter"
assert "" == insert_spaces(""), "empty string if word is empty"
print(insert_spaces("fish"))

Example Question #3

Create a function that returns the first word in a sentence.

assume that spaces separate words
it should take one argument, a string
it should return a string
if the original string is only one word, return that word
if the original string is empty, return an empty string
if the original string is only white space, return an empty string
hint: substring may be helpful
hint: how do you find the end index of substring?

Example Question #3 Potential Solution

# version 1:
def get_first_word(sentence):
	"""returns the first word in a sentence"""
	index = 0
	for c in sentence:
		if c == " ":
			break
		index += 1
	return sentence[0:index]

assert "hi" == get_first_word("hi there!"), "returns first word"
assert "hi" == get_first_word("hi"), "returns word if only one word"
assert "" == get_first_word("  "), "returns empty if only white space"
assert "" == get_first_word(""), "returns empty if sentence is empty"
# print(get_first_word("hi there!"))
# print(get_first_word("hi"))

Example Question #3 Potential Solution

(Again, split is not required for the exam, but feel free to use this if you feel like it's applicable).

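# version 2 with split:
def get_first_word(s):
	"""returns the first word in a sentence"""
	words = s.split()  # no argument: split on any run of white space
	if len(words) == 0:  # empty string or only white space
		return ""
	else:
		return words[0]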

Example Question #4

Write four test cases for get_first_word using: →

0 words
1 word
> 1 words
only spaces

assert "foo" == get_first_word("foo bar baz"), "returns first word in space separated string"
assert "" == get_first_word(""), "returns empty string if sentence is empty"
# etc...

Example Question #5

Create a function called is_digit that determines whether or not a string only has numbers (0-9) in it.

use assert to test a string composed of
- only digits
- digits and other characters
- only non numeric characters
- empty string
it should take one argument, a string
it should return True only if the only characters in it are 0 through 9
if the original string is empty, return False
hint: not in may be useful

Example Question #5 Potential Solution

(General strategy is iterating over a sequence of characters, and changing a value outside of the loop based on characters)

def is_digit(s):
	"""determines if a string is numeric (only contains 0 through 9)"""
	if s == "":
		return False

	for c in s:
		if c not in "0123456789":
			return False

	return True

assert True == is_digit("58723"), "true if all characters are numeric"
assert False == is_digit("twelve"), "false if not all characters are numeric"
assert False == is_digit("12 ducks"), "false if not all characters are numeric"
assert False == is_digit(""), "false if empty string"
# print(is_digit("43"))

Some More Potential Questions

use upper or lower to check for variations in the casing of input
- for example, loop forever
- ask the user if they want the loop to stop
- accept "Yes", "YES", "yes", etc.
- use upper or lower to normalize
rewrite get_first_word, but use find() instead of a loop
reverse a string
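
Possible sketches for these questions, not the only way to solve them. The helper names is_yes and reverse are made up for this example, and is_yes covers only the normalizing step (not the loop that asks the user).

```python
def is_yes(answer):
    """normalizes case so that "Yes", "YES" and "yes" all match"""
    return answer.lower() == "yes"

def get_first_word(sentence):
    """returns the first word in a sentence, using find() instead of a loop"""
    index = sentence.find(" ")
    if index == -1:          # no space at all: the whole string is the word
        return sentence
    return sentence[0:index]

def reverse(s):
    """returns the characters of s in reverse order"""
    result = ""
    for c in s:
        result = c + result  # put each new character in front
    return result

assert is_yes("YES") and is_yes("yes") and not is_yes("no")
assert "hi" == get_first_word("hi there!"), "returns first word"
assert "hi" == get_first_word("hi"), "returns word if only one word"
assert "" == get_first_word("  "), "returns empty if only white space"
assert "" == get_first_word(""), "returns empty if sentence is empty"
assert "hsif" == reverse("fish"), "reverses the characters"
```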