Workflow
Steps based on the metadata in JSON
Prepare keywords (done)
- Extract set of keywords ("keywords.csv")
- Add them to the Wikibase using Quick Statements
- Export keywords with QID and Label
- Match keywords and QIDs in the CSV file
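The keyword steps above could be scripted roughly as follows. This is a minimal sketch, not the script actually used: it assumes the JSON metadata is a list of article records with a "keywords" list each, and that the Wikibase export yields (QID, label) pairs. The function names and the QID in the example are hypothetical.

```python
def extract_keywords(articles):
    """Collect the distinct keyword strings, exactly as found, in sorted order."""
    keywords = set()
    for article in articles:
        # Assumption: each record stores its keywords under the key "keywords".
        for keyword in article.get("keywords", []):
            if keyword:
                keywords.add(keyword)
    return sorted(keywords)


def match_keywords(keywords, exported):
    """Map each keyword string to its QID, or None if no label matches exactly."""
    label_to_qid = {label: qid for qid, label in exported}
    return {keyword: label_to_qid.get(keyword) for keyword in keywords}


articles = [
    {"keywords": ["exile", "memory"]},
    {"keywords": ["exile"]},
    {},  # a record without keywords
]
keywords = extract_keywords(articles)
matches = match_keywords(keywords, [("Q100", "exile")])  # made-up QID
unmatched = [k for k, qid in matches.items() if qid is None]
print(keywords, unmatched)
```

Keywords left in `unmatched` are the ones that need manual matching later.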
Affiliations (done)
- Extract set of affiliations (don't merge near-identical strings!)
- Add affiliations to Wikibase
- Len = String
- Den = "An institution"
- Aen = String of label without "University", "of", "the", "The", "College", "at", double spaces
- P1 = Q14 (instance of institution)
- P20 = """exact string"""
- Export affiliations with QID, Label and P20 (for matching)
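The affiliation import table described above could be generated along these lines. This is a sketch under assumptions: the alias rule is read as dropping the listed words as whole, whitespace-separated tokens, and the output follows the QuickStatements CSV convention as understood here (an empty qid column creates a new item; string values are wrapped in literal double quotes, which CSV escaping turns into the triple quotes noted for P20). The function names are hypothetical.

```python
import csv
import io

# Words removed from the label to build the alias (Aen), as listed above.
DROPPED_WORDS = {"University", "of", "the", "The", "College", "at"}


def make_alias(label):
    """Drop the listed words; split()/join also collapses double spaces."""
    words = [word for word in label.split() if word not in DROPPED_WORDS]
    return " ".join(words)


def affiliations_to_qs_csv(labels):
    """Build one CSV row per affiliation string for a QuickStatements import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["qid", "Len", "Den", "Aen", "P1", "P20"])
    for label in labels:
        writer.writerow([
            "",                  # empty qid: assumed to create a new item
            label,               # Len = the affiliation string
            "An institution",    # Den
            make_alias(label),   # Aen
            "Q14",               # P1 = instance of institution
            f'"{label}"',        # P20 = exact string, quoted as a string value
        ])
    return buffer.getvalue()


print(affiliations_to_qs_csv(["Yale University"]))
```

Keeping the exact string in P20 is what allows the later export to be matched back to the original metadata without any normalization.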
Authors (done)
- Extract set of authors and record first names, last names, additional name elements and affiliations in a CSV file ("authors.csv")
- Match affiliation QIDs and affiliation strings in the CSV file
- Add authors with first names, last name, additional name elements, affiliation-QIDs
- Match author QIDs to author strings; this also gives us the author QIDs relevant to each article
- TODO: Apply manual fixes for authors who appear multiple times, sometimes with changing affiliations
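To prepare the manual fixes in the TODO above, candidates could be listed automatically. A minimal sketch, assuming rows from "authors.csv" have been read into dictionaries with the hypothetical column names "first_name", "last_name" and "affiliation_qid"; the QIDs in the example are made up.

```python
from collections import defaultdict


def authors_with_multiple_affiliations(rows):
    """Return authors (by last and first name) linked to more than one affiliation."""
    affiliations = defaultdict(set)
    for row in rows:
        key = (row["last_name"], row["first_name"])
        affiliations[key].add(row["affiliation_qid"])
    return {
        key: sorted(qids)
        for key, qids in affiliations.items()
        if len(qids) > 1
    }


rows = [
    {"first_name": "Jane", "last_name": "Doe", "affiliation_qid": "Q501"},
    {"first_name": "Jane", "last_name": "Doe", "affiliation_qid": "Q502"},
    {"first_name": "Max", "last_name": "Roe", "affiliation_qid": "Q501"},
]
print(authors_with_multiple_affiliations(rows))
```

Identical names do not guarantee identical persons, so the result is only a list of cases to review by hand.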
Add basic article information (done)
- Add articles with
- Len = title
- Den = "Article in STTCL"
- Aen = Article ID as String
- P1 = Q10 ("instance of" = Article)
- Verify that all articles are present: all 685 are there (or is one missing because of the duplicate Solzhenitsyn entry?)
- Export QIDs and Article IDs for further matching using query
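The verification step above could be checked mechanically once the QID and Article ID export is available. A sketch, assuming the export has been read into (QID, article ID) pairs; the function name and the sample values are hypothetical.

```python
from collections import Counter


def check_articles(expected_ids, exported):
    """Compare article IDs from the JSON metadata with the exported items.

    Returns the IDs missing from the Wikibase and the IDs that occur on
    more than one item.
    """
    exported_ids = [article_id for _qid, article_id in exported]
    missing = sorted(set(expected_ids) - set(exported_ids))
    counts = Counter(exported_ids)
    duplicated = sorted(aid for aid, n in counts.items() if n > 1)
    return missing, duplicated


expected = ["1001", "1002", "1003"]
exported = [("Q7001", "1001"), ("Q7002", "1002"), ("Q7003", "1002")]
print(check_articles(expected, exported))
```

A count that matches the expected total can still hide one missing and one duplicated article, which is why both lists are reported.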
Authors for articles (done)
- Based on a match between the QID of each article and the QIDs of its authors (see the author matching step above)
- qid,P16
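The "qid,P16" table could be produced as sketched below, with one row per article and author pair. The input structures (article ID to author strings, article ID to QID, author string to QID) and the function name are assumptions for illustration; all QIDs in the example are made up.

```python
import csv
import io


def article_author_csv(article_authors, article_qids, author_qids):
    """Write one 'qid,P16' row per author of each article.

    Returns the CSV text and a list of (article ID, author string) pairs
    for which no author QID was found.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["qid", "P16"])
    unmatched = []
    for article_id, names in article_authors.items():
        for name in names:
            author_qid = author_qids.get(name)
            if author_qid is None:
                unmatched.append((article_id, name))
                continue
            writer.writerow([article_qids[article_id], author_qid])
    return buffer.getvalue(), unmatched


text, unmatched = article_author_csv(
    {"1001": ["Doe, Jane", "Roe, Max"]},
    {"1001": "Q9001"},
    {"Doe, Jane": "Q9101"},
)
print(text)
print(unmatched)
```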
Add keywords to articles (done)
- Match keyword QID from Wikibase to keywords strings from JSON metadata
- P15 = Keywords (Items) – multiple statements per article, one for each keyword (varying numbers)
Some further article information (in progress)
- Build table with the following and import using QS
- P11 = Year (EDTF)
- P12 = DOI (URL)
Add abstracts to articles
- Use QID-ArticleID table to match articles in JSON using ID to Wikibase items via QID
- Add abstracts (PROBLEM: the field limit stays at 400 characters!)
- P2 = Abstract as String (more than 400 characters in 87% of the cases!)
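The share of abstracts affected by the limit can be computed as follows. A small sketch, assuming the abstracts are available as a list of strings; the function name is hypothetical and the sample data is synthetic.

```python
def share_over_limit(abstracts, limit=400):
    """Return the fraction of abstracts longer than `limit` characters."""
    if not abstracts:
        return 0.0
    over = sum(1 for abstract in abstracts if len(abstract) > limit)
    return over / len(abstracts)


sample = ["a" * 401, "b" * 400, "c" * 10, "d" * 500]
print(f"{share_over_limit(sample):.0%} of the sample exceed the limit")
```

An abstract of exactly 400 characters still fits, so the comparison is strict.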
Optional: Further article information
- Build table with the following and import using QS
- P4 = HTML (URL)
- P5 = PDF (URL)
- P14 = Volume (String)
- P10 = Issue (String)
- P9 = Number (String)
- P18 = Q1 (Journal = STTCL)
Some fixes
- Merge near-identical institution names? (see e.g.: Berkeley or Boulder)
- Fix keywords that could not be matched automatically, see links to: Item:Q8186
- Remove some items: Q7542 (Editor's note), Q8182 (Interview), Q7827 (poems?), Q7960, Q7961, Q7964, Q7965, Q7966, Q8067, Q8067, Q8105, Q8141
- Check some items with short abstracts, and remove as required: Q7707, Q7729, Q7828, Q7830, Q7833, Q7916, Q7999, Q8000, Q8001, Q8002, Q8003, Q8004, Q8005, Q8006, Q8007, Q8008
- Add 39 "special topic" articles as articles.
- Take care of missing affiliations / change to "independent scholar" as appropriate: https://tp3-sttcl.wikibase.cloud/wiki/Special:WhatLinksHere/Item:Q6732
Potential improvements
- Possibly: Improve keywords through corrections (e.g. "hisory", "198S") and merging (e.g. capitalization) – or keep as found?
- Add a "point in time" reference to institutional affiliations based on the publication date of the relevant article(s) to better represent the record (e.g. Avital Ronell) or to disambiguate multiple affiliations for one author
- Match institutions to some identifier (ROR, Wikidata ID, etc.) and then pull city, country, world region, geocoordinates
- Match authors to some identifier (ORCID, Wikidata, GND, etc.)
- Match keywords to some identifier (Wikidata, GND, Dewey, etc.)
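For the keyword merging mentioned above, candidates that differ only in capitalization could be listed before deciding whether to merge or keep them as found. A sketch with a hypothetical function name; it only reports groups and changes nothing.

```python
from collections import defaultdict


def case_variants(keywords):
    """Group keywords that are identical except for capitalization."""
    groups = defaultdict(set)
    for keyword in keywords:
        groups[keyword.casefold()].add(keyword)
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]


print(case_variants(["exile", "Exile", "memory"]))
```

Spelling errors such as "hisory" are not caught this way and would still need manual review.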
Further steps
- Add category annotations using Cradle form
- Add generated categories
- Add generated topics?