ForEach Loops in Azure Data Factory
In the previous post, we looked at how to use variables in pipelines. We took a sneak peek at working with an array, but we didn’t actually do anything with it. But now, we will! In this post, we will look at how to use arrays to control foreach loops.
ForEach Loops
You can use foreach loops to execute the same set of activities or pipelines multiple times, with different values each time. A foreach loop iterates over a collection. That collection can be either an array or a more complex object. Inside the loop, you can reference the current value using @item().
Let’s take a look at how this works in Azure Data Factory!
Creating ForEach Loops
In the previous post about variables, we created a pipeline that set an array variable called Files. Let’s use this array in a slightly more useful way 😊 Delete the old Set List of Files activity and ListOfFiles variable:
Add a foreach loop instead:
In the foreach loop settings, you can set the sequential, batch count, and items properties:
By default, the foreach loop tries to run as many iterations as possible in parallel. You can choose to run them sequentially instead, for example if you need to copy data into a single table and want to ensure that each copy finishes before the next one starts.
If you choose to run iterations in parallel, you can limit the number of parallel executions by setting the batch count. The default number is 20 and the max number is 50.
Finally, you have to choose the items to loop over. Click to open the add dynamic content pane, and choose the Files array variable:
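If you switch to the pipeline's code view, these three settings should show up in the foreach activity's JSON roughly like the sketch below. The activity name is made up for illustration, the batch count is just the default written out explicitly, and the activities list is empty because we haven't added anything inside the loop yet:

```json
{
    "name": "Loop Over Files",
    "type": "ForEach",
    "typeProperties": {
        "isSequential": false,
        "batchCount": 20,
        "items": {
            "value": "@variables('Files')",
            "type": "Expression"
        },
        "activities": []
    }
}
```

Setting isSequential to true runs one iteration at a time, and the batch count is then not used.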
Then, go to the activities settings, and click add activity:
Inside the foreach loop, add an execute pipeline activity, and choose the parameterized Lego_HTTP_to_ADLS pipeline:
Now we need to pass the current value from the Files array as the FileName pipeline parameter:
Unfortunately, the add dynamic content pane does not have a shortcut for referencing the current value inside a foreach loop 😕
But! Like I mentioned earlier, you can use @item(). You just have to type it in yourself:
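For reference, here is a rough sketch of what the execute pipeline activity inside the loop could look like in the code view once @item() is typed in. The activity name is an assumption for illustration, and your JSON may contain a few more properties than shown here:

```json
{
    "name": "Execute Lego_HTTP_to_ADLS",
    "type": "ExecutePipeline",
    "typeProperties": {
        "pipeline": {
            "referenceName": "Lego_HTTP_to_ADLS",
            "type": "PipelineReference"
        },
        "waitOnCompletion": true,
        "parameters": {
            "FileName": {
                "value": "@item()",
                "type": "Expression"
            }
        }
    }
}
```

The important part is the FileName parameter: its value is an expression, so @item() is evaluated once per iteration instead of being passed along as a literal string.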
Debugging ForEach Loops
Now, our pipeline will set the Files array, then use the array to control the foreach loop. For each iteration of the loop, the filename will be passed as a parameter to the parameterized pipeline. Click debug:
Set the LoadAllFiles parameter to true:
When we debug a foreach loop, we will get a warning saying that all activities will be executed sequentially. But don’t worry! It will run in parallel when you trigger it, as long as you haven’t chosen to run the iterations sequentially:
In the output, we will see that the foreach loop ran the execute pipeline activity nine times:
Click on the foreach loop input to view the item count:
Click on an activity input to view the parameter used for that specific activity:
Tadaaa! 🥳
ForEach Loops using Array Items
In this post, we looked at foreach loops that iterate over arrays. In JSON, an array can look something like this:
["themes", "sets", "parts"]
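If we use the example array above, we can illustrate how foreach loops work like this: the loop runs three times, once for each value in the array. In one iteration @item() returns "themes", in another it returns "sets", and in the last it returns "parts".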
Summary
In this post, we looked at how to use arrays to control foreach loops. We saw how we can use @item() to reference the current value from the array.
…but I also mentioned that foreach loops can iterate over more complex objects. Something like… the output of lookups, perhaps? 😄 Guess what we will look at in the next post!
About the Author
Cathrine Wilhelmsen is a Microsoft Data Platform MVP, international speaker, author, blogger, organizer, and chronic volunteer. She loves data and coding, as well as teaching and sharing knowledge - oh, and sci-fi, gaming, coffee and chocolate 🤓