Get the Next Row Item Continue and Skip Next Item Python Pandas

July 16, 2019

Tutorial: Advanced For Loops in Python

In a previous tutorial, we covered the basics of Python for loops, looking at how to iterate through lists and lists of lists. But there's a lot more to for loops than looping through lists, and in real-world data science work, you may want to use for loops with other data structures, including numpy arrays and pandas DataFrames.

This tutorial begins with how to use for loops to iterate through common Python data structures other than lists (like tuples and dictionaries). Then we'll dig into using for loops in tandem with common Python data science libraries like numpy, pandas, and matplotlib. We'll also take a closer look at the range() function and how it's useful when writing for loops.

A Quick Review: The Python For Loop

A for loop is a programming statement that tells Python to iterate over a collection of objects, performing the same operation on each object in sequence. The basic syntax is:

            for object in collection_of_objects:     # code you want to execute on each object                      

Each time Python iterates through the loop, the variable object takes on the value of the next object in our sequence collection_of_objects, and Python will execute the code we have written on each object from collection_of_objects in sequence.

Now, let's dive into how to use for loops with different sorts of data structures. We'll skip lists since those have been covered in the previous tutorial; if you need further review, check out the introductory tutorial or Dataquest's interactive lesson on lists and for loops.

Data Structures

Tuples

Tuples are sequences, just like lists. The difference between tuples and lists is that tuples are immutable; that is, they cannot be changed (learn more about mutable and immutable objects in Python). Tuples also use parentheses instead of square brackets.

Regardless of these differences, looping over tuples is very similar to lists.

            x = (10,20,30,40,50)              for              var              in              x:     print("index "+ str(x.index(var)) +              ":",var)                      
            index 0: 10 index 1: 20 index 2: 30 index 3: 40 index 4: 50                      

If we have a list of tuples, we can access the individual elements in each tuple in our list by including them both as variables in the for loop, like so:

            x = [(1,2), (3,4), (5,6)]              for              a, b              in              x:     print(a,              "plus", b,              "equals", a+b)                      
            1 plus 2 equals 3 3 plus 4 equals 7 5 plus 6 equals 11                      

Dictionaries

In addition to lists and tuples, dictionaries are another common Python data type you're likely to encounter when working with data, and for loops can iterate through dictionaries, too.

Python dictionaries are composed of key-value pairs, so in each loop, there are two elements we need to access (the key and the value). Instead of using enumerate() like we would with lists, to loop over both keys and the corresponding values for each key-value pair we need to call the .items() method.

For example, imagine we have a dictionary called stocks that contains both stock tickers and the corresponding stock prices. We'll use the .items() method on our dictionary to generate a key and value for each iteration:

            stocks = {              'AAPL':              187.31,              'MSFT':              124.06,              'FB':              183.50              }              for              key, value              in              stocks.items() :     print(key +              " : "              + str(value))                      
            AAPL : 187.31 MSFT : 124.06 FB : 183.5                      

Note that the names key and value are completely arbitrary; we could also label these as k and v or x and y.

Strings

As mentioned in the introductory tutorial, for loops can also iterate through each character in a string. As a quick review, here's how that works:

            print("data science")              for              c              in              "data science":     print(c)                      
            data science d a t a  s c i e n c e                      

Numpy Arrays

Now, let's take a look at how for loops can be used with common Python data science packages and their data types.

We'll start by looking at how to use for loops with numpy arrays, so let's start by creating some arrays of random numbers.

                          import              numpy              as              np np.random.seed(0)              # seed for reproducibility              x = np.random.randint(10, size=6) y = np.random.randint(10, size=6)                      

Iterating over a one-dimensional numpy array is very similar to iterating over a list:

                          for              val              in              x:     print(val)                      
            5 0 3 3 7 9                      

Now, what if we want to iterate through a two-dimensional array? If we use the same syntax to iterate a two-dimensional array as we did above, we can only iterate entire arrays on each iteration.

                          # creating our 2-dimensional array              z = np.array([x, y])              for              val              in              z:     print(val)                      
            [5 0 3 3 7 9] [3 5 2 4 7 6]                      

A two-dimensional array is built up from a pair of one-dimensional arrays. To visit every element rather than every array, we can use the numpy function nditer(), a multi-dimensional iterator object which takes an array as its argument.

In the code below, we'll write a for loop that iterates through each element by passing z, our two-dimensional array, as the argument for nditer():

                          for              val              in              np.nditer(z):     print(val)                      
            5 0 3 3 7 9 3 5 2 4 7 6                      

As we can see, this first lists all of the elements in x, then all elements of y.

Remember! When looping through these different data structures, dictionaries require a method, numpy arrays require a function.

Pandas DataFrames

When we're working with data in Python, we're often using pandas DataFrames. And thankfully, we can use for loops to iterate through those, too.

Let's practice doing this while working with a small CSV file that records the GDP, capital city, and population for six different countries. We will read this into a pandas DataFrame below.

Pandas works a bit differently from numpy, so we won't be able to simply repeat the numpy process we've already learned. If we try to iterate over a pandas DataFrame as we would a numpy array, this would just print out the column names:

                          import              pandas              as              pd df = pd.read_csv('gdp.csv', index_col=0)              for              val              in              df:     print(val)                      
            Capital GDP ($US Trillion) Population                      

Instead, we need to mention explicitly that we want to iterate over the rows of the DataFrame. We do this by calling the iterrows() method on the DataFrame, and print row labels and row data, where a row is the entire pandas series.

                          for              label, row              in              df.iterrows():     print(label)     print(row)                      
            Ireland Capital                Dublin GDP ($US Trillion)     0.3337 Population            4784000 Name: Ireland, dtype: object United Kingdom Capital                 London GDP ($US Trillion)       2.622 Population            66040000 Name: United Kingdom, dtype: object United States Capital               Washington, D.C. GDP ($US Trillion)               19.39 Population                   327200000 Name: United States, dtype: object China Capital                  Beijing GDP ($US Trillion)         12.24 Population            1386000000 Name: China, dtype: object India Capital                New Delhi GDP ($US Trillion)         2.597 Population            1339000000 Name: India, dtype: object Germany Capital                 Berlin GDP ($US Trillion)       3.677 Population            82790000 Name: Germany, dtype: object                      

We can also access specific values from a pandas series. Suppose we just want to print out the capital of each country. We can specify that we want only output from the "Capital" column like so:

                          for              label, row              in              df.iterrows():     print(label +              " : "              + row["Capital"])                      
            Ireland : Dublin United Kingdom : London United States : Washington, D.C. China : Beijing India : New Delhi Germany : Berlin                      

To take things further than simple printouts, let's add a column using a for loop. Let's add a GDP per capita column. Remember that .loc[] is label-based. In the code below, we'll add the column and compute its contents for each country by dividing its total GDP from its population and multiplying the result by one trillion (since the GDP numbers are listed in trillions).

                          for              label, row              in              df.iterrows():     df.loc[label,'gdp_per_cap'] = row['GDP ($US Trillion)']/row['Population '] *              1000000000000              print(df)                      
                          Capital  GDP ($US Trillion)  Population   \ Country                                                              Ireland                   Dublin              0.3337      4784000    United Kingdom            London              2.6220     66040000    United States   Washington, D.C.             19.3900    327200000    China                    Beijing             12.2400   1386000000    India                  New Delhi              2.5970   1339000000    Germany                   Berlin              3.6770     82790000                      gdp_per_cap   Country                        Ireland         69753.344482   United Kingdom  39703.210176   United States   59260.391198   China            8831.168831   India            1939.507095   Germany         44413.576519                      

For each row in our dataframe, we are creating a new label, and setting the row data equal to the total GDP divided by the country's population, and multiplying by $1T for thousands of dollars.

The range() function

We have seen how we can use for loops to iterate over any sequence or data structure. But what if we would like to iterate over these sequences in a specific order, or for a specific number of times?

This can be accomplished with Python's built-in range() function. Depending on how many arguments you pass to the function, you can decide where that series of numbers will begin and end as well as how big the difference will be between one number and the next. Note that, similar to lists, the range() function's count starts from 0 and not from 1.

There are three ways we can call range():

  1. range(stop)
  2. range(start, stop)
  3. range(start, stop, step)

range(stop)

range(stop) takes one argument, used when we want to iterate over a series of numbers thats starts at 0 and includes every number up to, but not including, the number we set as the stop.

                          for              i              in              range(3):     print(i)                      
            0 1 2                      

range(start, stop)

range(start, stop) takes two arguments, where we can not only set the end of the series but also the beginning. You can use range() to generate a series of numbers from A to B using a range(A, B).

                          for              i              in              range(1,              8):     print(i)                      
            1 2 3 4 5 6 7                      

range(start, stop, step)

range(start, stop, step) takes three arguments. In addition to the minimum and maximum values, we can set the difference between one number in the sequence and the next. The default step value is 1 if none is provided.

                          for              i              in              range(3,              16,              3):     print(i)                      
            3 6 9 12 15                      

Note that this works the same for non-numerical sequences.

We can also use the index of elements in a sequence to iterate. The key idea is to first calculate the length of the list and then iterate over the sequence within the range of this length. Let's take a look at an example:

            languages = ['Spanish',              'English',              'French',              'German',              'Irish',              'Chinese']              for              index              in              range(len(languages)):    print('Language:', languages[index])                      
            Language: Spanish Language: English Language: French Language: German Language: Irish Language: Chinese                      

In our for loop above we are looking at a variable's index and language, the in keyword, and the range() function to create a sequence of numbers. Note that we also use the len() function in this case, as the list is not numerical.

For each iteration, we are executing our print statement. So for every index in the range len(languages), we want to print a language. Because the length of our languages sequence is 6 (that is the value that len(langauges) evaluates to), we can rewrite the statement as follows:

                          for              index              in              range(6):     print('Language:', languages[index])                      
            Language: Spanish Language: English Language: French Language: German Language: Irish Language: Chinese                      

Plotting with For Loops

Suppose we want to iterate through a collection, and use each element to produce a subplot, or even for each trace in a single plot. For example, let's take the popular iris data set (learn more about this data) and do some plotting with for loops. Consider the graph below.

(If you are unfamiliar with Matplotlib or Seaborn, check out these beginner guides fro Kyso: Matplotlib, Seaborn. Dataquest also offers interactive courses on Python data visualization).

                          import              pandas              as              pd              import              matplotlib.pyplot              as              plt  df = pd.read_csv('iris.csv')                      
                          # create a figure and axis              fig, ax = plt.subplots()              # scatter the sepal_length against the sepal_width              ax.scatter(df['sepal_length'], df['sepal_width'])              # set a title and labels              ax.set_title('Iris Dataset') ax.set_xlabel('sepal_length') ax.set_ylabel('sepal_width')                      
            Text(0,0.5,'sepal_width')                      

for-loops-python-visualization

Above, we've plotted each sepal length vs sepal width, but we can give the graph more meaning by coloring in each data point by each flower's species class. One way to do this is by scattering each point on its own using a for loop and passing in the respective color.

                          # create color dictionary              colors = {'setosa':'r',              'versicolor':'g',              'virginica':'b'}              # create a figure and axis              fig, ax = plt.subplots()              # plot each data-point              for              i              in              range(len(df['sepal_length'])):     ax.scatter(df['sepal_length'][i], df['sepal_width'][i], color=colors[df['species'][i]])              # set a title and labels              ax.set_title('Iris Dataset') ax.set_xlabel('sepal_length') ax.set_ylabel('sepal_width')                      
            Text(0,0.5,'sepal_width')                      

python-for-loop

What if we want to visualize the univariate distribution of certain features of our iris dataset? We can do this with plt.subplot(), which creates a single subplot within a grid, the numbers of columns and rows of which we can set.

            fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(8,              6)) fig.subplots_adjust(hspace=0.8) fig.suptitle('Distributions of Iris Features')              for              ax, feature, name              in              zip(axes.flatten(), df.drop('species',axis=1).values.T, df.columns.values):     ax.hist(feature, bins=len(np.unique(df.drop('species',axis=1).values.T[0])//2))     ax.set(title=name[:-4].upper(), xlabel='cm')                      

Without diving too deep into the matplotlib syntax for now, below is a brief description of each main component of our graph:

  • plt.subplot( ) – used to create our 2-by-2 grid and set the overall size.
  • zip( ) – this is a built-in python function that makes it super simple to loop through multiple iterables of the same length in simultaneously.
  • axes.flatten( ), where flatten( ) is a numpy array method – this returns a flattened version of our arrays (columns).
  • ax.set( ) – allows us to set all of the attributes of our axes object with a single method.

Additional Operations

Nested Loops

Python allows us to use one loop inside another loop. This involves an outer loop that has, inside its commands, an inner loop.

Consider the following structure:

            for iterator_var in sequence:      for iterator_var in sequence:          # statements(s)          # statements(s)                      

Nested for loops can be useful for iterating through items within lists composed of lists. In a list composed of lists, if we employ just one for loop, the program will output each internal list as an item:

            languages = [['Spanish',              'English',              'French',              'German'], ['Python',              'Java',              'Javascript',              'C++']]              for              lang              in              languages:     print(lang)                      
            ['Spanish', 'English', 'French', 'German'] ['Python', 'Java', 'Javascript', 'C++']                      

In order to access each individual item of the internal lists, we define a nested for loop:

                          for              x              in              languages:     print("------")              for              lang              in              x:         print(lang)                      
            ------ Spanish English French German ------ Python Java Javascript C++                      

Above, the outer for loop is looping through the main list-of-lists (which contains two lists in this example) and the inner for loop is looping through the individual lists themselves. The outer loop executes 2 iterations (for each sub-list) and at each iteration we execute our inner loop, printing all elements of the respective sub-lists.

This tells us that the control travels from the outermost loop, traverses the inner loop and then back again to the outer for loop, continuing until the control has covered the entire range, which is 2 times in this case.

Continuing and Breaking For Loops

Loop control statements change the execution of a for loop from its normal sequence.

What if we want to filter out a specific language within our inner loop? We can use a continue statement to do this, which allows us to skip over a specific part of our loop when an external condition is triggered.

                          for              x              in              languages:     print("------")              for              lang              in              x:              if              lang ==              "German":              continue              print(lang)                      
            ------ Spanish English French ------ Python Java Javascript C++                      

In our loop above, within the inner loop, if the langauge equals "German", we skip that iteration only and continue with the rest of the loop. The loop is not terminated.

Let's look at a numerical example below:

                          from              math              import              sqrt number =              0              for              i              in              range(10):    number = i **              2              if              i %              2              ==              0:              continue              # continue here              print(str(round(sqrt(number))) +              ' squared is equal to '              + str(number))                      
            1 squared is equal to 1 3 squared is equal to 9 5 squared is equal to 25 7 squared is equal to 49 9 squared is equal to 81                      

So here, we have defined a loop that iterates over all numbers 0 through 9, and squares each number. Within our loop, at each iteration, we are checking if the number is divisible by 2, at which point the loop will continue to execute, skipping the iteration when i evaluates to an even number.

What about a break statement? This allows us to exit a loop entirely when an external condition is met. Let's see a simple demonstration of how this works using the same example as above:

            number =              0              for              i              in              range(10):    number = i **              2              if              i ==              7:              break              print(str(round(sqrt(number))) +              ' squared is equal to '              + str(number))                      
            0 squared is equal to 0 1 squared is equal to 1 2 squared is equal to 4 3 squared is equal to 9 4 squared is equal to 16 5 squared is equal to 25 6 squared is equal to 36                      

In the above example, our if statement presents the condition that if our variable i evaluates to 7, our loop will break, so our loop iterates over integers 0 through 6 before dropping out of the loop entirely.

Looking for more? Here are some additional resources that might be useful:

  • Python Tutorials — Our ever-expanding list of Python tutorials for data science.
  • Data Science Courses — Take your studies to the next level with fully interactive programming, data science, and stats courses, right in your browser.

Conclusion

In this tutorial, we learned about some more advanced applications of for loops, and how they might be used in typical Python data science workflows.

We learned how to iterate over different types of data structures, and how loops can be used with pandas DataFrames and matplotlib to create multiple traces or sub-plots programmatically.

Finally, we looked at some more advanced techniques that give us more control over the operation and execution of our for loops.

If you'd like to learn more about this topic, check out Dataquest's Data Scientist in Python path that will help you become job-ready in around 6 months.

Did this tutorial help?

Choose your path to keep learning valuable data skills.

arrow down left

arrow right down

Python Tutorials

Practice your Python programming skills as you work through our free tutorials.

Data science courses

Commit to your study with our interactive, in-your-browser data science courses in Python, R, SQL, and more.

schaefernouse1990.blogspot.com

Source: https://www.dataquest.io/blog/tutorial-advanced-for-loops-python-pandas/

0 Response to "Get the Next Row Item Continue and Skip Next Item Python Pandas"

Post a Comment

Iklan Atas Artikel

Iklan Tengah Artikel 1

Iklan Tengah Artikel 2

Iklan Bawah Artikel