simplehtmldom 1.9 introduced new functions to recursively remove
nodes from the DOM. This allows removing elements without the need
to re-load the document by using $html->load($html->save()), which
is very inefficient.
Find more information about remove() at
https://simplehtmldom.sourceforge.io/docs/1.9/api/simple_html_dom_node/remove/
find('*') wasn't supported in older versions of simplehtmldom but it
is supported now. Thus, all custom implementations can be replaced
by the correct solution.
This fixes the following issue:
1. bridge sets unique ids for the items (ids get hashed)
2. items go to the cache
3. on next run items get loaded from cache
4. these items have different ids because they were hashed again
5. they show up twice in feed reader
- For consistency, functions should always return null on non-existing data.
- WordPressPluginUpdateBridge appears to have used its own cache instance in the past. Obviously not used anymore.
- Since $key can be anything, the cache implementation must ensure to assign the related data reliably; most commonly by serializing and hashing the key in an appropriate way.
- Even though the default path for storage is perfectly fine, some people may want to use a different location. This is an example how a cache implementation is responsible for its requirements.
This commit adds support for a new parameter which specifies the type
of cache to use for caching. It is specified in config.ini.php:
[cache]
type = "..."
Currently only one type of cache is supported (see /caches). All uses
of 'FileCache' were replaced by this configuration option.
Note: Caching currently depends on files and folders (due to FileCache).
Experience may vary depending on the selected cache type. For now always
check if FileCache is working before testing alternative types.
References #1000
'uid' represents the unique id for a feed item. This item is null by
default and can be set to any string value. The provided string value
is always hashed to sha1 to make it the same length in all cases.
References #977, #1005
Add transformation from legacy items to FeedItems, before transforming
items to the desired format. This allows using legacy bridges alongside
bridges that return FeedItems.
As discussed in #940, instead of throwing exceptions on invalid
parameters, add messages to the debug log instead
Add support for strings to setTimestamp(). If the provided timestamp
is a string, automatically try to parse it using strtotime().
This allows bridges to simply use `$item['timestamp'] = $timestamp;`
instead of `$item['timestamp'] = strtotime($timestamp);`
Support simple_html_dom_node as input paramter for setURI
Support simple_html_dom_node as input parameter for setContent
* core: Add bridge parameter auto detection
This adds a new 'detect' action which accepts a URL from which an
appropriate bridge is selected and relevant parameters are extracted.
The user is then automatically redirected to the selected bridge.
For example to get a feed from: https://twitter.com/search?q=%23rss-bridge
we could send a request to:
'/?action=detect&format=Atom&url=twitter.com/search%3Fq%3D%2523rss-bridge'
which would redirect to:
'/?action=display&q=%23rss-bridge&bridge=Twitter&format=Atom'.
This auto detection happens on a per-bridge basis, so a new function
'detectParameters' is added to BridgeInterface which bridges may implement.
It takes a URL for an argument and returns a list of parameters that were
extracted, or null if the URL isn't relevant for the bridge.
* [TwitterBridge] Add parameter auto detection
* [BridgeAbstract] Add generic parameter detection
This adds generic "paramater detection" for bridges that don't have any
parameters defined. If the queried URL matches the URI defined in the
bridge (ignoring https://, www. and trailing /) an emtpy list of parameters is
returned.
This commit adds a cache for 'getContents' to '/cache/server'. All
contents are cached by default (even in debug mode). If debug mode
is enabled, the cached data is overwritten on each request.
In normal mode RSS-Bridge adds the 'If-Modified-Since' header with
the timestamp from the previously cached data (if available) to the
request.
Find more information on 'If-Modified-Since' here:
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/If-Modified-Since
If the server responds with "304 Not Modified", the cached data is
returned.
If the server responds with "200 OK", the received data is written
to the cache (creates a new cache file if it doesn't exist yet).
No changes were made for all other response codes.
Servers that don't support the 'If-Modified-Since' header, will
respond with "200 OK".
For servers that respond with "304 Not Modified", the required band-
width will decrease and RSS-Bridge will responding faster.
Files in the cache are forcefully removed after 24 hours.
Notice: Only few servers actually do support 'If-Modified-Since'.
Thus, most bridges won't be affected by this change.
simple_html_dom currently doesnt support "->find('*')", which is a
known issue: https://sourceforge.net/p/simplehtmldom/bugs/157/
The solution implemented by RSS-Bridge is to find all nodes WITHOUT
a specific attribute. If the attribute is very unlikely to appear
in the DOM, this is essentially returning all nodes.
This is the meaning behind
"->find('*[!b38fd2b1fe7f4747d6b1c1254ccd055e]')"