1. Reading the datafile
Running the following script reads all the data in the JSON file into pandas DataFrames.
# Import libraries
import json
import pandas as pd

# Load the JSON file and store it in a dictionary (the file is closed automatically)
with open('JSON_RDB_test.json', 'r', encoding='utf-8') as f:
    dict_all = json.load(f)

# Create pandas tables (DataFrames) from the dictionary
df_rawdata = pd.DataFrame(dict_all["rawdata"])
df_paper = pd.DataFrame(dict_all["paper"])
df_figure = pd.DataFrame(dict_all["figure"])
df_sample = pd.DataFrame(dict_all["sample"])
df_property = pd.DataFrame(dict_all["property"])
Let's look at the contents of each table.
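A quick way to inspect the tables is to print the size and first few rows of each DataFrame. The sketch below builds two small stand-in tables in memory so that it runs without JSON_RDB_test.json; the column names (propertyid, propertyname, unit, paperid, title) and values are assumptions and may differ from the real file. With the real data, put df_rawdata, df_paper, df_figure, df_sample, and df_property into the dictionary instead.

```python
import pandas as pd

# Hypothetical stand-ins for two of the tables loaded above;
# the column names are assumptions, not taken from the real file.
df_property = pd.DataFrame({
    "propertyid": [2, 3],
    "propertyname": ["Temperature", "Seebeck coefficient"],
    "unit": ["K", "V/K"],
})
df_paper = pd.DataFrame({
    "paperid": [1],
    "title": ["Example title (hypothetical)"],
})

tables = {"property": df_property, "paper": df_paper}

# Record (rows, columns) for each table and print a preview of its first rows.
shapes = {}
for name, df in tables.items():
    shapes[name] = df.shape
    print(f"--- {name}: {df.shape[0]} rows x {df.shape[1]} columns")
    print(df.head())  # head() shows at most the first 5 rows
```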
1.1. rawdata
The rawdata table contains every data point. For each data point (x, y), IDs identify the original paper, figure, and sample, as well as the physical properties on the x and y axes. The values are expressed in the default units defined in the property table.
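Because rawdata stores only IDs, it is often handy to translate propertyid_x and propertyid_y into readable property names. A minimal sketch, assuming rawdata has the columns figureid, sampleid, propertyid_x, propertyid_y, x, and y, and that the property table has propertyid and propertyname; the toy values are not real data.

```python
import pandas as pd

# Hypothetical toy tables; column names are assumptions.
df_rawdata = pd.DataFrame({
    "figureid": [10, 10, 10],
    "sampleid": [100, 100, 101],
    "propertyid_x": [2, 2, 2],
    "propertyid_y": [3, 3, 3],
    "x": [300.0, 400.0, 300.0],
    "y": [1.5e-4, 1.8e-4, 1.2e-4],
})
df_property = pd.DataFrame({
    "propertyid": [2, 3],
    "propertyname": ["Temperature", "Seebeck coefficient"],
    "unit": ["K", "V/K"],
})

# Series mapping propertyid -> propertyname, used to decode the axis IDs.
name_of = df_property.set_index("propertyid")["propertyname"]
df_rawdata["x_property"] = df_rawdata["propertyid_x"].map(name_of)
df_rawdata["y_property"] = df_rawdata["propertyid_y"].map(name_of)
print(df_rawdata[["x", "x_property", "y", "y_property"]])
```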
1.2. property
The property table lists the physical properties that appear in this datafile, together with their default units (SI units without prefixes).
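Since values are stored in SI units without prefixes, converting to a more familiar unit is a single multiplication. The sketch assumes the property table has propertyname and unit columns and that the default unit of the Seebeck coefficient is written "V/K" (its SI unit); the unit strings in the real file may be spelled differently.

```python
import pandas as pd

# Hypothetical property table; column names and unit strings are assumptions.
df_property = pd.DataFrame({
    "propertyid": [2, 3],
    "propertyname": ["Temperature", "Seebeck coefficient"],
    "unit": ["K", "V/K"],
})

# Look up the default unit of a property by its name.
unit_of = dict(zip(df_property["propertyname"], df_property["unit"]))
print(unit_of["Seebeck coefficient"])

# Values in rawdata are in the default (prefix-free SI) unit, so a Seebeck
# coefficient in V/K becomes microvolts per kelvin after multiplying by 1e6.
seebeck_v_per_k = pd.Series([1.5e-4, 1.8e-4])  # toy values
seebeck_uv_per_k = seebeck_v_per_k * 1e6
print(seebeck_uv_per_k.round(6).tolist())
```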
1.3. paper
The paper table contains the bibliographic information for each paper. The paperid is identical to the Starrydata ID (SID) used in the Starrydata web system.
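Because paperid is the SID, it works well as an index for looking up a single paper. A minimal sketch; the title and year columns and all values are hypothetical.

```python
import pandas as pd

# Hypothetical paper table; real column names may differ.
df_paper = pd.DataFrame({
    "paperid": [1, 2],
    "title": ["Example title A (hypothetical)", "Example title B (hypothetical)"],
    "year": [2010, 2015],
})

# Index by paperid (the SID) for direct lookups.
paper_by_id = df_paper.set_index("paperid")
record = paper_by_id.loc[2]  # one paper as a Series
print(record["title"], record["year"])
```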
1.4. figure
The figure table describes the original figure from which each dataset was taken. The paperid indicates the source paper, and figurename gives the name of the figure in that paper.
Each figure entry has exactly one set of axes (one pair of propertyid_x and propertyid_y). When a figure in the paper contains multiple sets of axes, each set is given a different figureid with the same figurename.
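Because a figure with several sets of axes is split into several figureids that share one figurename, such figures can be found by counting figureids per (paperid, figurename). A sketch with a hypothetical figure table; the column names follow the text above, but the values are invented.

```python
import pandas as pd

# Hypothetical figure table: Fig2 of paper 1 has two sets of axes, so it
# appears under two figureids that share the same figurename.
df_figure = pd.DataFrame({
    "figureid": [10, 11, 12],
    "paperid": [1, 1, 1],
    "figurename": ["Fig2", "Fig2", "Fig4"],
    "propertyid_x": [2, 2, 2],
    "propertyid_y": [3, 4, 3],
})

# Count distinct figureids per (paperid, figurename); more than one
# means the original figure contains several sets of axes.
n_axes = df_figure.groupby(["paperid", "figurename"])["figureid"].nunique()
multi_axis = n_axes[n_axes > 1]
print(multi_axis)
```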
1.5. sample
The sample table contains information on the material samples introduced in each paper. A sample usually appears in several figures. The samplename column gives the name of the sample in the paper. The composition column gives the likely chemical composition of the sample, as inferred by the data collector from the text of the paper. When both the starting composition and the analytical composition are available, the starting composition is preferred. Compositions are given as molar ratios.
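Because compositions are stored as formula strings, a plain substring search can give false matches: searching for "S" would also hit "Si", "Sb", or "Sr". The sketch below assumes formulas are written as element symbols followed by amounts, and matches a symbol only when it is not followed by a lowercase letter. The helper name has_element and the sample values are hypothetical.

```python
import pandas as pd

# Hypothetical sample table; column names and compositions are illustrative.
df_sample = pd.DataFrame({
    "sampleid": [100, 101, 102],
    "paperid": [1, 1, 2],
    "samplename": ["S1", "S2", "A"],
    "composition": ["Ba8Ga16Ge30", "Ba8Al16Si30", "Sr8Ga16Ge30"],
})

def has_element(compositions: pd.Series, symbol: str) -> pd.Series:
    """True where the formula contains the element `symbol`.

    The negative lookahead (?![a-z]) stops "S" from matching "Si" or "Sr";
    na=False treats missing compositions as non-matching.
    """
    return compositions.str.contains(rf"{symbol}(?![a-z])", regex=True, na=False)

ga_samples = df_sample[has_element(df_sample["composition"], "Ga")]
s_samples = df_sample[has_element(df_sample["composition"], "S")]
print(ga_samples["sampleid"].tolist())
print(s_samples["sampleid"].tolist())
```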
2. Creating custom tables
Using pandas, you can build your own tables from the tables above, much as you would in a relational database.
2.1. Extract columns
The paper table may have more columns than you need. In that case, you can specify only the columns you need when creating the DataFrame. The order in which you list them also sets the column order of the new table.
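In pandas, the columns argument of the DataFrame constructor selects and orders the keys taken from a list of records; indexing an existing DataFrame with a list of names does the same after the fact. A sketch with hypothetical records shaped like dict_all["paper"]; the keys are assumptions.

```python
import pandas as pd

# Hypothetical records in the shape of dict_all["paper"]; keys are assumptions.
paper_records = [
    {"paperid": 1, "title": "Example title A", "journal": "Journal X", "year": 2010},
    {"paperid": 2, "title": "Example title B", "journal": "Journal Y", "year": 2015},
]

# Passing `columns` keeps only these keys, in this order.
df_paper = pd.DataFrame(paper_records, columns=["paperid", "year", "title"])
print(df_paper)

# The same selection and reordering on an existing DataFrame.
df_full = pd.DataFrame(paper_records)
df_short = df_full[["year", "paperid"]]
print(df_short)
```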
2.2. Merge tables
You can bring in information from another table by merging two tables. For example, to add paper information to the sample table, merge the two tables using paperid as the key: for each row in the sample table, pandas looks up the matching paperid in the paper table and attaches that paper's columns.
If you don't want every column of the merged table, you can define a new DataFrame that contains only the necessary columns of the merged table.
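A minimal sketch of both steps with hypothetical toy tables. It uses how="left" so that every sample row is kept even if its paper were missing; pandas' default (how="inner") would drop such rows. Column names other than paperid are assumptions.

```python
import pandas as pd

# Hypothetical toy tables; only paperid is taken from the text above.
df_sample = pd.DataFrame({
    "sampleid": [100, 101, 102],
    "paperid": [1, 1, 2],
    "composition": ["Ba8Ga16Ge30", "Ba8Al16Si30", "Sr8Ga16Ge30"],
})
df_paper = pd.DataFrame({
    "paperid": [1, 2],
    "title": ["Example title A", "Example title B"],
    "year": [2010, 2015],
})

# Left merge: every sample row is kept and paper columns are attached by paperid.
df_merged = pd.merge(df_sample, df_paper, on="paperid", how="left")

# Keep only the columns of interest in a new DataFrame.
df_sample_paper = df_merged[["sampleid", "composition", "year", "title"]]
print(df_sample_paper)
```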
3. Extract datafiles from the rawdata
The rawdata table collects all datasets from all papers in the datafile. Most chart-plotting software, however, expects a separate text file for each dataset.
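Since a single dataset is picked out by its figureid and sampleid, grouping rawdata by that pair splits it into individual datasets. A sketch with hypothetical toy rows; the column names are assumptions.

```python
import pandas as pd

# Hypothetical toy rawdata; column names are assumptions.
df_rawdata = pd.DataFrame({
    "figureid": [10, 10, 10, 11],
    "sampleid": [100, 100, 101, 100],
    "x": [300.0, 400.0, 300.0, 300.0],
    "y": [1.5e-4, 1.8e-4, 1.2e-4, 1.0e-5],
})

# Number of data points in each (figureid, sampleid) dataset.
sizes = df_rawdata.groupby(["figureid", "sampleid"]).size()
print(sizes)

# Iterate over the datasets one at a time.
for (figureid, sampleid), dataset in df_rawdata.groupby(["figureid", "sampleid"]):
    print(figureid, sampleid, len(dataset))
```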
3.1. Extract a datafile
You can select the rows of rawdata that have a specific figureid and sampleid.
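A minimal sketch using a boolean mask on both IDs; the toy table and the chosen IDs (10 and 100) are hypothetical.

```python
import pandas as pd

# Hypothetical toy rawdata; column names are assumptions.
df_rawdata = pd.DataFrame({
    "figureid": [10, 10, 10, 11],
    "sampleid": [100, 100, 101, 100],
    "x": [300.0, 400.0, 300.0, 300.0],
    "y": [1.5e-4, 1.8e-4, 1.2e-4, 1.0e-5],
})

figureid, sampleid = 10, 100  # hypothetical IDs of the dataset we want

# Rows matching both IDs, keeping only the x and y columns.
mask = (df_rawdata["figureid"] == figureid) & (df_rawdata["sampleid"] == sampleid)
dataset = df_rawdata.loc[mask, ["x", "y"]]
print(dataset)
```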
To save these x and y values, write them to a text file. The output is a plain text file containing the x and y columns.
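One way to do this is DataFrame.to_csv. The sketch below writes tab-separated values with a header and no index column; the tab separator, the file name, and the scratch directory are choices made for this illustration, not taken from the original.

```python
import os
import tempfile
import pandas as pd

# Hypothetical x, y values standing in for a selected dataset.
dataset = pd.DataFrame({"x": [300.0, 400.0], "y": [1.5e-4, 1.8e-4]})

# Scratch directory for this demo; use your own path in practice.
outdir = tempfile.mkdtemp()
path = os.path.join(outdir, "fig10_sample100.txt")  # hypothetical file name

# Tab-separated, with a header line "x<TAB>y" and no index column.
dataset.to_csv(path, sep="\t", index=False)

with open(path) as f:
    text = f.read()
print(text)
```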
3.2. Extract all datafiles
As an example, we output a CSV file for every dataset of Seebeck coefficient versus temperature in the datafile. The output files are named like 18560__Fig4__Ba8Al16Ga2Si26P2__S-T.csv, using the paperid, figurename, and sample composition, and are stored in the directory data/clathrate.
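A sketch of such an export, assuming the property names are exactly "Temperature" and "Seebeck coefficient" and that the tables have the column names used in the toy data below (all assumptions). The function name export_s_t is hypothetical; it filters rawdata by the two property IDs, attaches figurename and composition by merging, and writes one CSV per (figureid, sampleid) dataset into outdir, which defaults to data/clathrate. The demo call writes into a scratch directory instead.

```python
import os
import tempfile
import pandas as pd

# Hypothetical toy tables; column names, property names and values are assumptions.
df_property = pd.DataFrame({
    "propertyid": [2, 3, 4],
    "propertyname": ["Temperature", "Seebeck coefficient", "Electrical resistivity"],
})
df_figure = pd.DataFrame({"figureid": [10, 11], "figurename": ["Fig4", "Fig5"]})
df_sample = pd.DataFrame({"sampleid": [100, 101],
                          "composition": ["Ba8Ga16Ge30", "Ba8Al16Si30"]})
df_rawdata = pd.DataFrame({
    "paperid":      [1, 1, 1, 1, 1],
    "figureid":     [10, 10, 10, 11, 11],
    "sampleid":     [100, 100, 101, 100, 100],
    "propertyid_x": [2, 2, 2, 2, 2],
    "propertyid_y": [3, 3, 3, 4, 4],
    "x": [300.0, 400.0, 300.0, 300.0, 400.0],
    "y": [1.5e-4, 1.8e-4, 1.2e-4, 1.0e-5, 2.0e-5],
})


def export_s_t(rawdata, figure, sample, prop, outdir="data/clathrate"):
    """Write one CSV per Seebeck-vs-temperature dataset; return the file names."""
    pid = dict(zip(prop["propertyname"], prop["propertyid"]))
    st = rawdata[(rawdata["propertyid_x"] == pid["Temperature"])
                 & (rawdata["propertyid_y"] == pid["Seebeck coefficient"])]
    # Attach the figure name and sample composition used in the file names.
    st = st.merge(figure[["figureid", "figurename"]], on="figureid", how="left")
    st = st.merge(sample[["sampleid", "composition"]], on="sampleid", how="left")

    os.makedirs(outdir, exist_ok=True)
    written = []
    # One dataset per (figureid, sampleid) pair.
    for _, dataset in st.groupby(["figureid", "sampleid"]):
        paperid = dataset["paperid"].iloc[0]
        figurename = dataset["figurename"].iloc[0]
        composition = dataset["composition"].iloc[0]
        name = f"{paperid}__{figurename}__{composition}__S-T.csv"
        dataset[["x", "y"]].to_csv(os.path.join(outdir, name), index=False)
        written.append(name)
    return written


# Demo: write into a scratch directory instead of data/clathrate.
written = export_s_t(df_rawdata, df_figure, df_sample, df_property,
                     outdir=tempfile.mkdtemp())
print(written)
```

Note that if two samples in the same figure share a composition, their file names collide and the later file overwrites the earlier one; adding the sampleid to the name would avoid this.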