The rise in popularity of digital media has brought an enormous growth in the number of data sources related to the digital footprint left by businesses and consumers (Blazquez and Domenech 2018). Such online data include sources such as social networking sites, corporate websites, and search engines, which have been used in a wide variety of research topics ranging from medicine (Pelat et al. 2009) and politics (Mellon 2014) to finance (Preis, Moat, and Stanley 2013). Despite their increasing use in the literature, the quality of these non-traditional data sources has been largely overlooked.

Data quality is a multi-dimensional concept which refers to the capability of data to be used quickly and effectively to inform and evaluate decisions. Issues with data quality, such as high measurement error, may affect model parameter estimates and create economic inefficiencies (Bound, Brown, and Mathiowetz 2001).

Google Trends (GT) is a tool that provides reports on the popularity of certain searches in the Google search engine. Among the non-traditional data sources, GT is one of the most widely used in the empirical economic literature. It has proven to be a good proxy for investor attention (Da, Engelberg, and Gao 2011), even during the COVID-19 outbreak (Shear, Ashraf, and Sadaqat 2021; Costola, Iacopini, and Santagiustina 2021). It is also widely applied to other topics in applied economics, ranging from unemployment to tourism demand (Choi and Varian 2012; Jun, Yoo, and Choi 2018). However, its quality as a data source has not been assessed. This paper addresses this gap by discussing the data quality aspects of GT following the framework proposed by Karr, Sanil, and Banks (2006). Our analysis detects that GT data have some non-negligible quality issues, which are evidenced in an illustrative example.

Google Trends is a freely available tool developed by Google that provides reports on the popularity of searches in Google Search. Reports, which include time-series data, are available for any user-selected time period from 2004 to the present day, and can also be restricted to searches done in a certain language or from a specific location. The searches whose popularity is reported by GT may be specified as terms, entities or categories. Terms refer to the text or keywords included in the search box. An entity is an abstraction that refers to a single semantic unit, such as a place, a person, an object, an event, or a concept. Since entities refer to the semantics, they are independent of the terms used to refer to them. Using entities also avoids the problem of polysemic terms because GT identifies them by their ID in Freebase, which is a collaborative knowledge base.

Google classifies all searches into categories, such as Finance or Sports. These categories can be used to filter out unrelated searches in GT reports for terms or entities. If no term or entity is selected, the report includes all the searches that fall in that category. This way, it is possible to study the popularity of all searches regarding one specific category.

The main GT output is the Search Volume Index (SVI), which is a time series representing the evolution of the popularity of a given search. This relative index is scaled so that the highest popularity corresponds to an SVI value of 100. Notice that this normalization depends on the particular query to GT, that is, on the specific search, period, language and geographical area that was selected. This means that it is not possible, for instance, to compare SVIs from different regions, because values are relative to the total number of searches in each region.

The quality of data is, according to Karr, Sanil, and Banks (2006), a wide multidimensional concept affecting different perspectives of the data source. Quality dimensions can be grouped into three different ‘hyperdimensions’ of the data source: i) the process, which is related to the methods used to generate, assemble, describe and maintain the data; ii) the data, which refers to the data itself contained in the data source; and iii) the use, which is related to how the source is used. The evaluation presented in this paper focuses on the Data hyperdimension. The analysis of data quality can also be applied to different levels of the data source: i) the database, ii) the tables composing the database, and iii) the records composing the tables in the database. Unlike traditional data sources, GT is not a database with a set of tables, but a set of records returned as a response to a given user request.
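The SVI scaling described in the text can be illustrated with a small sketch. It is not Google's actual procedure, which is not spelled out here; it only assumes what the text states, namely that popularity is measured relative to the total number of searches and then rescaled so that the peak equals 100. The function name `search_volume_index` and all figures are hypothetical.

```python
from typing import Sequence


def search_volume_index(counts: Sequence[float], totals: Sequence[float]) -> list[float]:
    """Rescale relative search popularity so that its peak equals 100.

    counts: searches for the term in each period (hypothetical input).
    totals: all searches made in the same region and period (hypothetical input).
    """
    if len(counts) != len(totals):
        raise ValueError("counts and totals must have the same length")
    if not counts:
        return []
    # Relative popularity: share of all searches in each period.
    shares = [c / t for c, t in zip(counts, totals)]
    peak = max(shares)
    if peak == 0:
        return [0.0 for _ in shares]
    # Scale within this single query, so the highest share maps to 100.
    return [100 * s / peak for s in shares]


# Hypothetical figures: the same term in two regions.
# In the first region the term takes up to 1% of all searches,
# in the second only up to 0.001%.
large_share_region = search_volume_index([500, 1000, 750], [100_000] * 3)
small_share_region = search_volume_index([5, 10, 7.5], [1_000_000] * 3)
```

Both series come out as roughly 50, 100, 75, even though the term is a thousand times less popular in the second region. Because each query is rescaled to its own peak, the index carries no information about levels across regions, which is why SVIs from different regions cannot be compared directly.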