1. What is "open data"

Open data (and content) can be freely used, modified, and shared by anyone for any purpose (http://opendefinition.org/).
"Cohesion Open Data" is maintained by the European Commission to share unique, structured data generated from the monitoring of EU Budget funded, shared management programmes.  The unique data relates primarily to different forms of financial inputs (planned and implemented), EU payment data and to outputs and result indicators of the physical results of the funded programmes.
Find out more about #CohesionOpenData:  Website Blog FAQ User Guide to ODP - User guide to visualisationBrowse data stories 


2. How to access the platform

Also accessible from Inforegio under "Information sources" – "Cohesion Open Data"


3. How to access the data

The data behind the ESIF visualizations can be accessed and downloaded from the platform in two ways:

1. Use the "Explore this Data" link associated to the charts presented on the platform.


2. If you know which dataset you want to manipulate or are searching for specific data or charts, search in the “Catalogue” link found on the home page =https://cohesiondata.ec.europa.eu/browse


The catalogue gives access to a library of open data objects with a search tool with various filters.
The following search functions can be used cumulatively:
  • Free search (1);
  • "Categories" tags (2) - programme periods, dataset types; 
  • "View types" tags (3) - the main types used are datasets, charts, filtered views and stories;
  • Word "Tags" (4);
  • "Sort by" tags (5).

4. What is a Metadata page?

Each dataset has a metadata page with its own URL. It introduces users to the dataset and gives important information about the dataset as follows:
  • Data description;
  • Date of last update;
  • Links to featured content, i.e. graphs created by the data owner;
  • Key details about the dataset: Usage, owner, contact, licence, etc.;
  • Structure of the data;
  • Related content (i.e. graphs created by other users).
Before you start reusing the data it is essential to understand what the data relates to and how it is structured in order to make useful interpretations and charts. The metadata is there for that purpose. For instance the metadata highlights important characteristics as follows:
  • In all cases for 2014-2020 the values are expressed cumulatively not annually. Some other datasets have annualised values;
  • The data may be presented in annual snapshots. In those cases you can compare annual timeseries, but never aggregated the values from differnent years;
  • Financial values are always expressed in current prices.

5. Create a profile to fully exploit the platform functionalities

Create a profile with you own password by enrolling on this site: https://cohesiondata.ec.europa.eu/login
With a profile you can filter, create and save objects on the platform for reuse by you and other users.


6. Interacting with and reusing the data

Once you “view data” from the metadata page the platform offers several ways to interact with the data.

You can for instance filter a dataset to keep just one country, one Thematic Objective, one fund; you can create charts or other types of visualizations to illustrate a report or to embed on a webpage.
All the functionalities are accessible in the toolbars indicated below. These are explained below. 

6.1 More views

All the filtered views or visualizations created from a dataset will be listed here.

6.2 Filtering / filtered views

This tool works like the pivot table in Excel. You must first define the filter conditions.

  • Users can add filters based on column titles in the dataset => "fund", "dimension_type";
  • Choose the condition to be applied => "is one of", "is not";
  • Filter the elements to which the contion is applied => "ERDF", "Intervention Field";
  • Remember to "Apply" the filters to the dataset;
  • Export the filtered dataset.


6.3 Exporting and reusing data 

Once you “view data” from the metadata page the platform offers several ways to export and use the data.
6.3.1 Static export 

A static export is a one off download of the data for local use. Once you download the data you have captured a shapshot and the local copy will not update. 
From the metadata page the "Export" button opens a pop up box with download options. These include choices of the export format. 


When exporting data file there are two typical challenges that users face if using MS Excel:
  1. The number and financial amounts with commas and decimal formatting are not accurately imported into Excel giving false results when checked against reference values on the platform: => See Annex 1 to correct set your default excel settings for numbers and financial amounts;
  2. The .csv file is not formatted in columns: => see Annex 2 to see how to reformat the .csv format in Excel columns and cells.
Consider using the OData. It is a permanent link between the platform and the excel file. It can be used to avoid regular manual downloads. Refreshing the data will automatically update on your excel.
1. Open via Dataset details or the table view.
Click Access Data via OData



2. Copy the OData Endpoint link


3. Open a new Excel document
4. Click on the Data tab on the top panel, then choose New Query and From Other Sources, select From OData Data Feed


5. Paste the copied OData Endpoint link and then click OK


6. You will see a preview of the table, click Load to insert it


The table should look similar to this.
Note: Column A lists unique identifiers for each row. The column order may be rearranged.
When starting work on the dataset, click Refresh here to update your table and sync with the source data.
6.3.3 Reuse data using a SODA API interconnector
Each published dataset on the platform  is provided with an application programming interface (API) to allow direct access to the data from a variety of programming softwares. 
IT developers can use the Socrata developer portal and the API documents to programmatically reuse the data directly to  build filtered views and visualisations.   
6.3.4 Connecting to cohesion open data using Stata
Requirements: 
  • Stata software
  • Active internet connection 
  • Enable Stata to access internet 
Process: 

First, check the internet connection of your Stata installation. You can do this by reviewing the system settings or running commands such as update or view "https://commission.europa.eu/index_en".
Note, working from within institutions such as the European Commission you may be required to set a proxy, port etc. 
Second, be mindful of your operating system’s and Stata-specific special characters. For example, using Stata on Windows requires to escape $ with a \ when using it in a context such as an URL. 
The final code to use to access the dataset e4v6-qrrq (https://cohesiondata.ec.europa.eu/2014-2020/ESIF-2014-2020-FINANCES-PLANNED-DETAILS/e4v6-qrrq) when operating in a context requiring setting a proxy, port, and user identification to access websites is:
 
loc un <username in browser>
loc pw <password in browser> 
loc proxy <proxy_address_by_your_institution> 
loc port <port_by_your_institution > 
loc file <file_name_you_want_to_assign>
loc limit <limit_of_data_such_as_number_of_rows> 
shell curl --proxy `un':`pw'@`proxy':`port' https://cohesiondata.ec.europa.eu/resource/e4v6-qrrq.csv?\$limit=`limit' > `file'.csv

All lines are to be run at once. You can also write all information in a single line. The code only specifies the limit parameter though other queries should function in a similar fashion. 
The command ultimately opens a small popup window to operate outside of Stata (using a shell in Stata to call curl from the terminal), access the files, and store them at a location of your choice. Mind to go against intuition and omit double or single quotations.

Potential pitfalls when pursuing other approaches: 
Running import delimited "https://cohesiondata.ec.europa.eu/resource/e4v6-qrrq.csv" users can access the data but may be  limited to the first 1 000 rows. If using any of the provided API endpoints (so following the URL with “?$query=SELECT”…) then users can be faced with various errors (r(603) or r(672); cf. https://www.stata.com/support/faqs/web/common-connection-error-messages). 
Setting the limit parameter (as suggested here: https://support.socrata.com/hc/en-us/articles/202949268-How-to-query-more-than-1000-rows-of-a-dataset) with the following code https://cohesiondata.ec.europa.eu/resource/e4v6-qrrq.csv?$limit=50000 may not be successful due to the missing escape of the $ symbol. However, this approach is helpful for when accessing the data via PowerBI. 
 

6.4 Visualizing the data 

The platform provides the possibility to make a range of interactive charts and visuals, without IT programming knowledge. 
This separate data story provides a guide to building #CohesionOpenData visuals to share and reuse. It provides detailed instructions how to create charts as follows:

6.5 Embed data, filtered views and charts 

Embedding allows the charts to be published on webpages by providing an “embed code”. Choose the size or customize it, use the "Preview" button and copy the embed code.

6.6 Share

You have the possibility to share any dataset, views or chart via social media.

6.7 About = Metadata

This link gives you access to information on the data description, usage, owner, contact and last update.


ANNEX 1: Setting the correct number format to open and read an exported “.CSV (for Excel)” file


In WINDOWS10:
  1. Open windows settings (in Windows not in Excel)
  2. Then open regional settings
  3. Then select 'additional date, time & regional settings'
  4. Then select 'change date, time or number formats'


5. In the next pop-up window, press button "Additional settings"


6. In the next pop-up window, change the following settings;
7. Decimal symbol: dot '.'
8. Digit grouping symbol: comma ','
9. List separator: comma ','
10. Apply


ANNEX 2: Reformatting an exported “.CSV (for Excel)” file to display data in cell


Is your data exported in “CSV for Excel” format presenting strangely with data rows all grouped in the first cell of each row?
Follow these step in Excel:
  1. Select the Column A
  2. In the menu select “Data” => “Text to Columns”






3. In the task window – Step 1 of 3:
  • Choose "Delimited"
  • Take note of which character is used to delimit the data elements - either “;” or “,”
  • Click Next



4. In the task window (Step 2 of 3):
  • Choose either “Semicolon” or “Comma” depending on the format of the source data
  • That should now show you column formatted data in the Data Preview
  • Click “Finish” to see the dataset reformatted in an excel readable format


5. The data should now look like this. 
6. Save the file in an ".xls" or other Excel format NOT as ".csv"




RESOURCES AND CONTACTS

DG REGIO: Find out more about #CohesionOpenData:  Website Blog FAQ User Guide to ODP - User guide to visualisation - Browse data stories
Tylertech/Socrata:
Date of text:  April 2022
Authors: Iluta BADERE, John WALSH