Learning Paths
Advanced
Part 4. Merging and appending branches

Part 4. Merging and appending branches


Step by Step Guide: How to Apply Merge & Append

Both Steps are used for combine datasets: With the Append Step, you will combine vertically both datasets. On the other hand, Merge allows you to combine datasets side by side by matching entity values on both datasets.

01. Clipping Datasets Go to the repository Sales Dataset Example. Then, clipped the three datasets to use them later:
clipped .png

02. Create a Pipeline As we have seen, go to the right corner of your screen Create New then click on Pipeline
create a pipeline 1.jpg

We will begin by loading Jane's sales data. Let's select Fetch Alphacast Dataset and drag and drop our clipped dataset SalesData from the sidebar.
Jones Sales 1.png

03. Append Step
image.png

We have a number of options here. First, we need to choose which data we are going to append Jhon Doe information to our data. We have two options:

3.1 Load a dataset
append 1 o 2.png

3.2 Fetch a dataset
We have the option of transforming it before appending it. If you like, you can add more steps to be executed between the fetching of the data and the append step.

append 1 o 2.png

append 2p.png

04. Append Editor

We need to define how will the "duplicates" be handled. What will happen if there are rows in both datasets that share the same entities? You have to decide whether you will keep the data from the original dataset, from the new one, or drop both.

last append p.png

Notice that we now have John and Jane on a single dataset.

05. Merge Step

Now we are going to combine this data with the third dataset, with information on the number of items sold by both of them.Again, we need to choose the dataset. This time we will drag and drop "Items Sold".

merge 1p.png

06. Merge Editor

6.1 Criteria to combine the rows

How can we know which rows from the first dataset correspond to each row from the second one.

Year and Sales Person are the two entities of our data. In this example, we just need to define that year match with year, and that sales persons match with sales person.

merg edit 1p.jpg

6.2 Find a match between both datasets

We have to decide what will happen when not all rows and columns in both datasets find a match in the other one. There are four types of criteria for joining:

image.png

  • Inner join: The new dataset will have only those rows that can be matched.
  • Left join: All the rows of dataset A will be present and the unmatched rows of dataset B are discarded.
  • Right join: Reverse to the previous one. All those from dataset B and discarded the unmatched ones from A.
  • Outer join: The data from both datasets will remain even if they do not match.

last merge1.jpg

In this example, all the rows match, so the result will look alike with any option. For more information on Merging Datasets click here (opens in a new tab)