We update our blog with regular posts to keep you up to speed on the world of B2B data.
Posted on 27/01/2020 at 10:49By Corpdata

Pro-tips: Spreadsheets and your data
In this article we look at one issue which our customers often have trouble with. We mentioned the issue of duplication and it's risks in our recent articles 'Demystifying Data Cleansing' and 'Demystifying deduplication and merge/purge'. Having decided upon a suitable order, people often work through data in spreadsheets methodically. Changing the order of the rows can be an inhibitor in cleaning up data. Here we provide simple techniques to help you overcome these challenges. We will provide instructions for two of the most commonly used spreadsheets, Microsoft Excel, we will use 2019 Professional Plus edition, and LibreOffice Calc (5.3.6.12 to be precise). Other spreadsheets have similar methods of working but these instructions may need interpreting. We are working on PCs, but similar instructions will apply to other platforms such as Mac. These instructions are for Microsoft Excel. To see the instructions for Calc, click here.Here is a view of the data we will be working with.
It's a fairly well populated and orderly example data set. Of course, it's fictitious to prevent personal data privacy issues! (Some of you may recognise where the inspiration came from.)
You can download this dataset to work with.Keeping order
Data sometimes contains 'natural' orders, things like postcode, or name. On other occasions, there is no intrinsic ordering key within data. Then we need to create one. There is no 'right place' to create an index key column, though for simplicity, creating a new column A on the left works well. Right-click on the column header for column A, and from the context menu select 'Insert'. This will leave your spreadsheet looking like this:
Adding row IDs
Now we have a column into which we can put a sequential number. This is where we can use the accounting tools within spreadsheets to our advantage, but there are a few 'gotchas' to watch out for. You can include the title row, row 1 in the index numbering or not. I will include it.
Formulae always start with =. We won't go into this here, but there are an awful lot of possibilities when you become familiar with using formulae.
Now we just need to copy the formula down through all the other rows we want populated.


Sorting to highlight issues
Now we have a way to put our data back in the right order, we can start to manipulate it. With our small data set this is easy, we can see all the data on-screen at once. The techniques we are using here come into their own when we have much more data, but this is just a learning example. We will look for duplicates by sorting the data by different columns. BEWARE: GOTHCA ALERTWe need to make sure we select all the data, from the first column to the last column, and from the first row to the last row. If we only select part, we will only sort that part, thus scrambling the data. Remember, UNDO is a very useful function. To undo your last action, press Control + ZLet's see if we have any duplicates by email address.






WHOA! We have a dupe! What's more, this would be tricky to spot another way, we have Gollum and Smeagol, we have different postcodes, we have different addresses too!
Naturally you would now want to do something with the data at this point, but that can be considered homework. We will simply remove the row marked DUPE.
Click the row number 10 marker at the left to highlight the single row with DUPE in the I column. Right click, and choose 'Delete'.
Obviously you can sort by any columns, such as postcode, company name, surname and even combinations of them. Having done any work you need to do, you will want to reset your data to it's original order.Resetting to the original order
You have probably guessed this is just another sort operation, but we will also do some tidying up. First, let's remove column I. Right click in the Column Header I, and select 'Delete' from the context menu. Now let's restore the data to it's original order: Select the entire spreadsheet by clicking the top left blank square, then de-select the title row if you want, though it doesn't matter. Next select 'Data' -> 'Sort' and choose 'Column A' for 'Column' and click OK.
Your data is now back in it's original order. You will notice ID 2 and the row that was alongside it is now missing. That is the row we deleted earlier.
Now to delete the ID column, click the Column A letter marker at the top to highlight the single column with the numbers we created in. Right click, and choose 'Delete'.
Posted on 27/01/2020 at 10:44By Corpdata

Pro-tips: Spreadsheets and your data
In this article we look at one issue which our customers often have trouble with. We mentioned the issue of duplication and it's risks in our recent articles 'Demystifying Data Cleansing' and 'Demystifying deduplication and merge/purge'. Having decided upon a suitable order, people often work through data in spreadsheets methodically. Changing the order of the rows can be an inhibitor in cleaning up data. Here we provide simple techniques to help you overcome these challenges. We will provide instructions for two of the most commonly used spreadsheets, Microsoft Excel, we will use 2019 Professional Plus edition, and LibreOffice Calc (5.3.6.12 to be precise). Other spreadsheets have similar methods of working but these instructions may need interpreting. We are working on PCs, but similar instructions will apply to other platforms such as Mac. These instructions are for LibreOffice Calc. To see the instructions for Excel, click here. Here is a view of the data we will be working with.
It's a fairly well populated and orderly example data set. Of course, it's fictitious to prevent personal data privacy issues! (Some of you may recognise where the inspiration came from.)
You can download this dataset to work with.Keeping order
Data sometimes contains 'natural' orders, things like postcode, or name. On other occasions, there is no intrinsic ordering key within data. Then we need to create one. There is no 'right place' to create an index key column, though for simplicity, creating a new column A on the left works well. Right-click on the column header for column A, and from the context menu select 'Insert Columns Left'. This will leave your spreadsheet looking like this:
Adding row IDs
Now we have a column into which we can put a sequential number. This is where we can use the accounting tools within spreadsheets to our advantage, but there are a few 'gotchas' to watch out for. You can include the title row, row 1 in the index numbering or not. I will include it.
Formulae always start with =. We won't go into this here, but there are an awful lot of possibilities when you become familiar with using formulae.
Now we just need to copy the formula down through all the other rows we want populated.


Sorting to highlight issues
Now we have a way to put our data back in the right order, we can start to manipulate it. With our small data set this is easy, we can see all the data on-screen at once. The techniques we are using here come into their own when we have much more data, but this is just a learning example. We will look for duplicates by sorting the data by different columns. BEWARE: GOTHCA ALERTWe need to make sure we select all the data, from the first column to the last column, and from the first row to the last row. If we only select part, we will only sort that part, thus scrambling the data. Remember, UNDO is a very useful function. To undo your last action, press Control + ZLet's see if we have any duplicates by email address.





WHOA! We have a dupe! What's more, this would be tricky to spot another way, we have Gollum and Smeagol, we have different postcodes, we have different addresses too!
Naturally you would now want to do something with the data at this point, but that can be considered homework. We will simply remove the row marked DUPE.
Click the row number 10 marker at the left to highlight the single row with DUPE in the I column. Right click, and choose 'Delete Rows'.
Obviously you can sort by any columns, such as postcode, company name, surname and even combinations of them. Having done any work you need to do, you will want to reset your data to it's original order.Resetting to the original order
You have probably guessed this is just another sort operation, but we will also do some tidying up. First, let's remove column I. Right click in the Column Header I, and select 'Delete Columns' from the context menu. Now let's restore the data to it's original order: Select the entire spreadsheet by clicking the top left blank square, then de-select the title row if you want, though it doesn't matter. Next select 'Data' -> 'Sort...' and choose 'Column A' for 'Sort Key 1' and click OK.
Your data is now back in it's original order. You will notice ID 2 and the row that was alongside it is now missing. That is the row we deleted earlier.
Now to delete the ID column, click the Column A letter marker at the top to highlight the single column with the numbers we created in. Right click, and choose 'Delete Columns'.





