Title: | Unofficial API for Fedstat (Rosstat EMISS System) for Automatic and Efficient Data Queries |
---|---|
Description: | An API for automatic data queries to the fedstat <https://www.fedstat.ru>, using a small set of functions with a common interface. |
Authors: | Denis Krylov [aut, cre], Dmitry Kibalnikov [aut] |
Maintainer: | Denis Krylov <[email protected]> |
License: | MIT + file LICENSE |
Version: | 1.0.3 |
Built: | 2025-03-11 05:31:32 UTC |
Source: | https://github.com/denchpokepon/fedstatapir |
Filter data_ids based on filters that are given in JSON format
Filters indicator data_ids with the given filters, taking into account possible filter specification errors and default filters.
filters should use filter_field_title in names and filter_value_title in values, as they are presented on fedstat.ru. If for some reason the specified filters do not return the expected result, it is worth inspecting the possible filter values in data_ids to check that the strings are defined correctly (e.g. encoding issues, mixing of Latin and Cyrillic characters).
filter_value_title currently supports the following special values:
asterisk (*), an alias for "select all possible filter values for this filter field"
Unspecified filters use the asterisk as a default (i.e. all possible filter values are selected and a warning is given).
Internally, normalized filter_field_title and filter_value_title (all lowercase, extra whitespace removed) are used to compare data_ids and filters for equality.
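The matching rules above (normalized titles, the asterisk as "all values", unspecified fields defaulting to the asterisk with a warning) can be illustrated with a small base R sketch. This is a hypothetical illustration, not the package's implementation: normalize_title() and filter_data_ids_sketch() are names invented here, the toy table has only the two title columns, and the real function's normalization may differ in detail (for example, in which whitespace characters it removes).

```r
# Hypothetical sketch of the matching rules described above
# (not the package's internal code).

# Normalize a title: lowercase, trim both ends, collapse inner whitespace
normalize_title <- function(x) {
  gsub("\\s+", " ", trimws(tolower(x)))
}

# Keep the rows of data_ids whose field/value titles match `filters`.
# "*" selects every value of a field; fields missing from `filters`
# default to "*" and raise a warning.
filter_data_ids_sketch <- function(data_ids, filters = list()) {
  fields <- normalize_title(data_ids$filter_field_title)
  values <- normalize_title(data_ids$filter_value_title)
  names(filters) <- normalize_title(names(filters))
  keep <- logical(nrow(data_ids))
  for (field in unique(fields)) {
    in_field <- fields == field
    wanted <- filters[[field]]
    if (is.null(wanted)) {
      warning("No filter given for '", field, "', selecting all values")
      wanted <- "*"
    }
    if ("*" %in% wanted) {
      keep[in_field] <- TRUE
    } else {
      keep[in_field] <- values[in_field] %in% normalize_title(wanted)
    }
  }
  data_ids[keep, , drop = FALSE]
}

# Toy data_ids with made-up titles
data_ids <- data.frame(
  filter_field_title = c("Year", "Year", "Period", "Period"),
  filter_value_title = c("2022", "2023", "January", "February"),
  stringsAsFactors = FALSE
)

# Case differences and extra spaces in filter names are tolerated
res <- filter_data_ids_sketch(data_ids, list("  YEAR " = "2023", "Period" = "*"))
```

Here "Year" keeps only 2023, while "Period" keeps both months because of the asterisk.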
fedstat_data_ids_filter(data_ids, filters = list(), disable_warnings = FALSE)
data_ids | data.frame, result of fedstat_get_data_ids |
filters | JSON in R list form. The structure should be like this: { "filter_field_title1": ["filter_value_title1", "filter_value_title2"], "filter_field_title2": ["filter_value_title1", "filter_value_title2"], ... } Where for example |
disable_warnings | logical, enables or disables the following warnings: |
data.frame, filtered data_ids
fedstat_get_data_ids,
fedstat_post_data_ids_filtered
## Not run:
# Get data filter identifiers for CPI,
# then filter data_ids to get data for January 2023
# for all goods and services for the Russian Federation
data_ids_filtered <- fedstat_get_data_ids("31074") %>%
  fedstat_data_ids_filter(
    filters = list(
      "Territory" = "Russian Federation",
      "Year" = "2023",
      "Period" = "January",
      "Types of goods and services" = "*"
    )
  )
# Not the actual filter field and filter value titles used on fedstat.ru (CRAN requires ASCII-only examples)
## End(Not run)
This function is a wrapper around the other functions of the package that provides a simple single-function API for fedstat.ru.
There are two basic terms in this API: filter_field and filter_value.
The filter field reflects an individual property of a data point, for example Year, Region, Unit of measurement, etc. Each filter field has its own title (filter_field_title): a human-readable word or phrase (e.g. "Year", "Region") that reflects the essence of the property by which filtering takes place.
The filter value reflects the specific value of that property for a data point (e.g. 2021 for the Year, "Russian Federation" for the Region, etc.). It also has a title (filter_value_title) with the same purpose as filter_field_title.
filters should use filter_field_title in names and filter_value_title in values, as they are presented on fedstat.ru. If for some reason the specified filters do not return the expected result, it is worth calling fedstat_get_data_ids separately and inspecting the possible filter values in data_ids to check that the strings are defined correctly (e.g. encoding issues, mixing of Latin and Cyrillic characters).
filter_value_title currently supports the following special values:
asterisk (*), an alias for "select all possible filter values for this filter field"
Unspecified filters use the asterisk as a default (i.e. all possible filter values are selected and a warning is given).
Internally, normalized filter_field_title and filter_value_title (all lowercase, extra whitespace removed) are used to compare data_ids and filters for equality.
fedstat_data_load_with_filters( indicator_id, ..., filters = list(), timeout_seconds = 180, retry_max_times = 3, disable_warnings = FALSE, httr_verbose = NULL, loading_steps_verbose = TRUE, return_type = c("data", "dictionary"), try_to_parse_ObsValue = TRUE )
indicator_id | character, indicator id/code from the indicator URL. For example, for the indicator with URL https://www.fedstat.ru/indicator/37426 the indicator id is 37426 |
... | other arguments passed to httr::GET and httr::POST |
filters | JSON in R list form. The structure should be like this: { "filter_field_title1": ["filter_value_title1", "filter_value_title2"], "filter_field_title2": ["filter_value_title1", "filter_value_title2"], ... } Where for example |
timeout_seconds | numeric, maximum time in seconds before a new GET or POST request is tried |
retry_max_times | numeric, maximum number of GET and POST attempts |
disable_warnings | logical, enables or disables the following warnings: |
httr_verbose | |
loading_steps_verbose | logical, print data loading steps to the console |
return_type | character, "data" or "dictionary": "data" for the actual data, "dictionary" for the sdmx lookup table (full data codes dictionary) |
try_to_parse_ObsValue | logical, try to parse the ObsValue column from character to R numeric type |
data.frame with filtered indicator data from fedstat.ru
fedstat_get_data_ids,
fedstat_data_ids_filter,
fedstat_post_data_ids_filtered,
fedstat_parse_sdmx_to_table
## Not run:
# Download CPI data for January 2023
# for all goods and services for the Russian Federation
data <- fedstat_data_load_with_filters(
  indicator_id = "31074",
  filters = list(
    "Territory" = "Russian Federation",
    "Year" = "2023",
    "Period" = "January",
    "Types of goods and services" = "*"
  )
)
# Not the actual filter field and filter value titles used on fedstat.ru (CRAN requires ASCII-only examples)
## End(Not run)
To query data from fedstat, we need to POST filters in the form of numeric filter identifiers. For most filters there is no rule from which their ids could be derived from filter titles and values; these ids seem to be just indexes in fedstat's internal database. So, in order to get the data, we first need to get the ids of the filter values by parsing a specific part of the JavaScript source code on the indicator web page.
fedstat_get_data_ids( indicator_id, ..., timeout_seconds = 180, retry_max_times = 3, httr_verbose = NULL )
indicator_id | character, indicator id/code from the indicator URL. For example, for the indicator with URL https://www.fedstat.ru/indicator/37426 the indicator id is 37426 |
... | other arguments passed to httr::GET |
timeout_seconds | numeric, maximum time in seconds before a new GET request is tried |
retry_max_times | numeric, maximum number of GET attempts |
httr_verbose | |
fedstat is known to lag quite often, and sometimes the site does not respond at all. This is especially true for the web pages of the most popular indicators. Therefore, by default, a GET request is sent up to 3 times with a timeout of 180 seconds, with pauses between requests that start small but grow exponentially.
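The retry behavior described above (a limited number of attempts with exponentially growing pauses) can be illustrated with a small base R sketch. retry_with_backoff() and its pause_base and pause_cap arguments are hypothetical and not part of this package; a fake request function replaces the real HTTP call so the example runs offline, and the pause values are assumptions, since the package's actual schedule is not documented here. For real HTTP code, httr::RETRY() implements the same idea.

```r
# Hypothetical sketch: retry a flaky call with exponentially growing pauses.
# `request_fun` stands in for an HTTP request; this is not the package's code.
retry_with_backoff <- function(request_fun, retry_max_times = 3,
                               pause_base = 1, pause_cap = 60) {
  stopifnot(retry_max_times >= 1)
  for (attempt in seq_len(retry_max_times)) {
    result <- tryCatch(request_fun(), error = function(e) e)
    if (!inherits(result, "error")) {
      return(result)
    }
    if (attempt < retry_max_times) {
      # Pause grows as pause_base * 2^(attempt - 1), capped at pause_cap
      Sys.sleep(min(pause_cap, pause_base * 2^(attempt - 1)))
    }
  }
  stop("All ", retry_max_times, " attempts failed: ", conditionMessage(result))
}

# Usage: a fake request that fails twice, then succeeds
calls <- 0
flaky_request <- function() {
  calls <<- calls + 1
  if (calls < 3) stop("timeout")
  "page source"
}
page <- retry_with_backoff(flaky_request, retry_max_times = 3, pause_base = 0.01)
```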
As a rule, requests to the indicator web page take much longer than requests for the data itself. A POST request for data is sent to a single URL, https://www.fedstat.ru/indicator/data.do?format=(excel or sdmx), for all indicators and is often quite fast. Therefore, for many indicators it makes sense to cache data_ids to speed up data downloads.
This is not possible for all data: for weekly prices, for example, each new week adds a new filter value (the new week), whose id can only be found on the indicator web page. But for most data (e.g. monthly frequency), time filters are trivial: there are 12 months in total, with unique ids that do not change, and year ids that match their values (that is, filter_value_id = filter_value, in other words 2020 = 2020).
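Caching can be sketched with saveRDS() and readRDS() from base R. cached_data_ids() and add_year() are hypothetical helpers; fetch_fun stands in for fedstat_get_data_ids() so the example runs offline, and the toy table, its "Year" title and its field id are made up (the real titles on fedstat.ru are not ASCII). add_year() relies on the rule stated above that a year's filter_value_id equals its value, and it reuses the field id of an existing year row.

```r
# Hypothetical sketch: cache data_ids on disk so the slow indicator page
# is fetched only once. `fetch_fun` stands in for fedstat_get_data_ids().
cached_data_ids <- function(indicator_id, fetch_fun, cache_dir = tempdir()) {
  cache_file <- file.path(cache_dir, paste0("data_ids_", indicator_id, ".rds"))
  if (file.exists(cache_file)) {
    return(readRDS(cache_file))
  }
  data_ids <- fetch_fun(indicator_id)
  saveRDS(data_ids, cache_file)
  data_ids
}

# Hypothetical helper: append a new year, using the rule id == year value
add_year <- function(data_ids, year, year_field_title = "Year") {
  year_rows <- data_ids[data_ids$filter_field_title == year_field_title, , drop = FALSE]
  new_row <- year_rows[1, , drop = FALSE]
  new_row$filter_value_id <- as.character(year)
  new_row$filter_value_title <- as.character(year)
  rbind(data_ids, new_row)
}

# Fake fetch function that counts how often it is called (made-up ids)
fetch_count <- 0
fake_fetch <- function(indicator_id) {
  fetch_count <<- fetch_count + 1
  data.frame(
    filter_field_id = c("3", "3"),  # hypothetical field id
    filter_field_title = c("Year", "Year"),
    filter_value_id = c("2022", "2023"),
    filter_value_title = c("2022", "2023"),
    filter_field_object_ids = c("columnObjectIds", "columnObjectIds"),
    stringsAsFactors = FALSE
  )
}

cache_dir <- tempfile("fedstat_cache_")
dir.create(cache_dir)
first <- cached_data_ids("31074", fake_fetch, cache_dir)   # fetches
second <- cached_data_ids("31074", fake_fetch, cache_dir)  # reads the cache
extended <- add_year(second, 2024)
```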
Correct filter_field_object_ids are needed to get data. For the sdmx format, these ids change nothing except the default sorting of the data, but specifying them incorrectly leads either to incomplete data or to no data at all. For the excel format, these ids determine how the data is laid out, as in the data preview on the fedstat site. For now, only the default filter_field_object_ids are used, which are parsed from the JavaScript source code of the indicator web page. Users can specify filter_field_object_ids for each filter_field in the resulting data_ids table.
data.frame with all character type columns:
filter_field_id - id for filter field;
filter_field_title - filter field title string representation;
filter_value_id - id for filter field value;
filter_value_title - filter field value title string representation;
filter_field_object_ids - special strings that define the location of the filter fields. It can take the following values: lineObjectIds (filters in rows), columnObjectIds (filters in columns), filterObjectIds (hidden filters applied to all data);
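A toy data_ids table with the documented columns shows how the object ids could be edited per filter field before posting. All ids and titles below are made up; in practice one would start from the output of fedstat_get_data_ids() and keep in mind that, as noted above, an incorrect layout can lead to incomplete data or no data at all. Moving Period into columns is only an illustrative choice.

```r
# Toy data_ids with the documented columns; every id and title is made up
data_ids <- data.frame(
  filter_field_id         = c("f1", "f1", "f2", "f3"),
  filter_field_title      = c("Territory", "Territory", "Year", "Period"),
  filter_value_id         = c("v1", "v2", "2023", "v3"),
  filter_value_title      = c("Russian Federation", "Moscow", "2023", "January"),
  filter_field_object_ids = c("lineObjectIds", "lineObjectIds",
                              "columnObjectIds", "filterObjectIds"),
  stringsAsFactors = FALSE
)

# Change the location of one whole filter field (all of its rows at once)
is_period <- data_ids$filter_field_title == "Period"
data_ids$filter_field_object_ids[is_period] <- "columnObjectIds"

# Count rows per location
table(data_ids$filter_field_object_ids)
```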
fedstat_data_ids_filter,
fedstat_post_data_ids_filtered
## Not run:
# Get data filter identifiers for CPI
data_ids <- fedstat_get_data_ids("31074")
## End(Not run)
Download indicator information from https://www.fedstat.ru/organizations/
The resulting table contains the fedstat indicator id, which is needed to request fedstat data.
Indicators with hidden == TRUE are shown as disabled records on fedstat and therefore might not be requestable.
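Hidden indicators can be dropped with an ordinary logical subset. The toy table below is a stand-in for fedstat_indicator_info() output: only the hidden column is taken from the description above, while the indicator_id column name and the last id are hypothetical.

```r
# Toy stand-in for fedstat_indicator_info() output; `indicator_id` is a
# hypothetical column name and "00000" a made-up id
indicator_info <- data.frame(
  indicator_id = c("31074", "37426", "00000"),
  hidden = c(FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only records that are not disabled on fedstat
requestable <- indicator_info[!indicator_info$hidden, , drop = FALSE]
```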
fedstat_indicator_info()
data.frame
## Not run:
# Get all indicator info
fedstat_indicator_info()
## End(Not run)
Allows researchers to search for interesting indicators more easily
fedstat_indicators_names_database
A data frame with 9335 rows and 10 variables:
indicator name
indicator url
logical, TRUE if the indicator is no longer used or updated
the name of the department the data comes from
grouping of indicator
grouping of indicator
grouping of indicator
grouping of indicator
grouping of indicator
date of the last update of the current database
https://fedstat.ru/organizations/
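Searching the database can be as simple as a case-insensitive grepl() over the name column, followed by extracting the indicator id from the URL (the number after /indicator/, as described for indicator_id). The column names name and url are assumptions, because the variable names are not listed here; check names(fedstat_indicators_names_database) before adapting the sketch. Apart from the CPI indicator 31074 used in the examples, the rows are made up, and search_indicators() is a hypothetical helper.

```r
# Toy stand-in for fedstat_indicators_names_database; the column names
# `name` and `url` are assumptions, and all rows except 31074 are made up
db <- data.frame(
  name = c("Consumer price index",
           "Toy indicator: average wage",
           "Toy indicator: consumer basket cost"),
  url = c("https://www.fedstat.ru/indicator/31074",
          "https://www.fedstat.ru/indicator/1",
          "https://www.fedstat.ru/indicator/2"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive search by name, plus the indicator
# id taken from the URL (the part after "/indicator/")
search_indicators <- function(db, pattern, name_col = "name", url_col = "url") {
  hits <- db[grepl(pattern, db[[name_col]], ignore.case = TRUE), , drop = FALSE]
  hits$indicator_id <- sub(".*/indicator/", "", hits[[url_col]])
  hits
}

res <- search_indicators(db, "consumer")
```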
Parses sdmx raw bytes received in response to a POST request.
This function is a wrapper around readsdmx::read_sdmx; in addition to reading the data, it automatically adds columns with values from lookup tables.
It can also return the full data codes dictionary for the indicator.
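The lookup step (replacing codes with human-readable titles from the sdmx dictionary) amounts to a left join. The sketch below uses base R merge() on toy tables; the column names territory_code and territory_title and all values are hypothetical and are not necessarily the ones produced by readsdmx or this package.

```r
# Toy observations with a coded dimension (hypothetical names and values)
observations <- data.frame(
  territory_code = c("t1", "t2", "t1"),
  ObsValue = c("1.5", "2.5", "3.5"),
  stringsAsFactors = FALSE
)

# Toy lookup table mapping codes to titles
territory_lookup <- data.frame(
  territory_code = c("t1", "t2"),
  territory_title = c("Region A", "Region B"),
  stringsAsFactors = FALSE
)

# Left join: keep every observation and add the title for its code
with_titles <- merge(observations, territory_lookup,
                     by = "territory_code", all.x = TRUE, sort = FALSE)
```

all.x = TRUE keeps observations whose code is missing from the lookup table (their title becomes NA) instead of silently dropping them.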
fedstat_parse_sdmx_to_table( data_raw, return_type = c("data", "dictionary"), try_to_parse_ObsValue = TRUE )
data_raw | sdmx raw bytes |
return_type | character, "data" or "dictionary": "data" for the actual data, "dictionary" for the sdmx lookup table (full data codes dictionary) |
try_to_parse_ObsValue | logical, try to parse the ObsValue column from character to R numeric type |
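try_to_parse_ObsValue suggests a conversion that falls back to character when parsing fails. One possible policy, sketched below with a hypothetical parse_obs_value_sketch(), converts only when every non-missing value parses as a number; the package's actual rule is not documented here and may differ (for example in how it treats decimal commas or empty strings).

```r
# Hypothetical policy: convert to numeric only if every non-missing value
# parses; otherwise return the input unchanged as character
parse_obs_value_sketch <- function(x) {
  parsed <- suppressWarnings(as.numeric(x))
  failed <- is.na(parsed) & !is.na(x)
  if (any(failed)) x else parsed
}

ok <- parse_obs_value_sketch(c("1.5", NA, "2"))  # converted to numeric
kept <- parse_obs_value_sketch(c("1,5", "2"))    # "1,5" fails, so kept as-is
```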
data.frame
## Not run:
# Get data filter identifiers for CPI,
# filter data_ids to get data for January 2023
# for all goods and services for the Russian Federation,
# POST the filters and download the data in sdmx format,
# then parse the raw sdmx into a data.frame
data <- fedstat_get_data_ids("31074") %>%
  fedstat_data_ids_filter(
    filters = list(
      "Territory" = "Russian Federation",
      "Year" = "2023",
      "Period" = "January",
      "Types of goods and services" = "*"
    )
  ) %>%
  fedstat_post_data_ids_filtered() %>%
  fedstat_parse_sdmx_to_table()
# Not the actual filter field and filter value titles used on fedstat.ru (CRAN requires ASCII-only examples)
## End(Not run)
Creates a request body from data_ids and sends it to https://www.fedstat.ru/indicator/data.do?format=data_format. Receives an sdmx or excel file with the data in binary format.
sdmx raw bytes can be passed to fedstat_parse_sdmx_to_table to create a data.frame, or to rawToChar and writeLines to create an xml file.
excel raw bytes can be passed to writeBin to create an xls file.
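Saving the returned raw bytes to disk, as described above, needs only base R. The bytes below are fake placeholders so the example runs offline; with real output, sdmx_raw and excel_raw would be the return values of fedstat_post_data_ids_filtered() for data_format = "sdmx" and "excel" respectively. Passing useBytes = TRUE to writeLines is an assumption meant to write the text without re-encoding it.

```r
# Fake bytes standing in for fedstat_post_data_ids_filtered() output
sdmx_raw <- charToRaw('<?xml version="1.0"?><message/>')
excel_raw <- as.raw(c(0xD0, 0xCF, 0x11, 0xE0))

# sdmx: convert bytes to a string and write it as an xml file
xml_file <- tempfile(fileext = ".xml")
writeLines(rawToChar(sdmx_raw), xml_file, useBytes = TRUE)

# excel: write the bytes unchanged as an xls file
xls_file <- tempfile(fileext = ".xls")
writeBin(excel_raw, xls_file)
```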
fedstat_post_data_ids_filtered( data_ids, ..., data_format = c("sdmx", "excel"), timeout_seconds = 180, retry_max_times = 3, httr_verbose = NULL )
data_ids | data.frame, can be a result of fedstat_data_ids_filter |
... | other arguments passed to httr::POST |
data_format | character, one of "sdmx" or "excel" |
timeout_seconds | numeric, maximum time in seconds before a new POST request is tried |
retry_max_times | numeric, maximum number of POST attempts |
httr_verbose | |
raw bytes (sdmx or excel)
## Not run:
# Get data filter identifiers for CPI,
# filter data_ids to get data for January 2023
# for all goods and services for the Russian Federation,
# then POST the filters and download the data in sdmx format
data <- fedstat_get_data_ids("31074") %>%
  fedstat_data_ids_filter(
    filters = list(
      "Territory" = "Russian Federation",
      "Year" = "2023",
      "Period" = "January",
      "Types of goods and services" = "*"
    )
  ) %>%
  fedstat_post_data_ids_filtered()
# Not the actual filter field and filter value titles used on fedstat.ru (CRAN requires ASCII-only examples)
## End(Not run)
Get data ids from JavaScript source
parse_js1(script)
script | character, JavaScript source code with data ids |
JSON in the form of a list with data ids
Get the default object ids for data ids from JavaScript source
parse_js2(script)
script | character, JavaScript source code with data ids and default object ids in it |
JSON in the form of a list with 3 character vectors for lineObjectIds, columnObjectIds and filterObjectIds, which consist of filters_id